ABSTRACT

The combination of Turbo codes and space-time block codes is
studied for use in CDMA systems. Each user’s data are first
encoded by a Turbo code. The Turbo coded data are next sent to a
space-time block encoder which employs a BPSK constellation. The
space-time encoder output symbols are transmitted through the
fading channel using multiple antennas. A multistage receiver is
proposed using non-linear MMSE estimation and a parallel
interference cancellation scheme. Simulations show that with
reasonable levels of multiple access interference ($\rho < 0.3$),
near single user performance is achieved. The receiver structure
is generalized to decode CDMA signals with space-time
convolutional coding and similar performance is observed.

1.  INTRODUCTION 

Space-time codes [1]-[4] use multiple transmit and receive
antennas to achieve diversity and coding gain for communication
over fading channels. High bandwidth efficiency is achieved, with
performance close to the theoretical outage capacity [1]. Turbo
codes [5] are a family of powerful channel codes, which have been
shown to achieve near Shannon capacity over additive white
Gaussian noise channels. Since their introduction, both
space-time codes and Turbo codes have received considerable
attention. In the CDMA2000 Radio Transmission Technology (RTT)
proposed for the third generation systems, both space-time codes
and Turbo codes have been adopted [6].

Although papers treating either space-time codes or Turbo codes
alone abound, jointly considering space-time codes and Turbo
codes in CDMA systems is a relatively new topic. In this paper,
we initiate a study of this topic, focusing on space-time block
codes [3] [4]. Because an optimum receiver for such a system is
impractical, our research develops the suboptimum low-complexity
receivers that will be needed.

This paper is organized as follows. Section 2 first sets up the
system configuration and develops the received signal model. A
brief review of space-time block codes is given in Section 3. The
structure of our multistage receiver is discussed in Section 4.
Section 5 presents simulation results. Conclusions are given in
Section 6.

2.  SYSTEM  CONFIGURATION  AND 
RECEIVED  SIGNAL  MODEL 

Fig. 2 depicts a $K$ user synchronous CDMA system with combined
Turbo coding and space-time block coding. There are $N$ transmit
antennas and $M$ receive antennas in the system. Suppose user $k$,
$k = 1, \ldots, K$, has a block of binary information bits
$\{d_k(i), i = 1, \ldots, L_1\}$ to transmit. These bits are first
encoded by a Turbo code with rate $R_1 = L_1/L_2$. The bits which
are produced by the Turbo encoder, denoted by
$\{\tilde{d}_k(i), i = 1, \ldots, L_2\}$, are passed to a
space-time block encoder. This space-time block code uses a
transmission matrix $\mathcal{G}_N$ [3] with a BPSK constellation,
generates $N$ output bits during each time slot, and has rate
$R_2 = L_2/L$. During time slot $l$, $N$ bits are transmitted,
which are denoted by $\{b_{nk}(l), n = 1, \ldots, N\}$, for
$l = 0, \ldots, L-1$. The bit $b_{nk}(l) \in \{-1, +1\}$ is spread
using a unique spreading waveform $s_k(t)$ and transmitted using
antenna $n$. For convenience we denote the vector of $n$th output
bits from all $K$ users as
$\mathbf{b}_n(l) = [b_{n1}(l), \ldots, b_{nK}(l)]^T$, and we note
that all of these bits are transmitted by antenna $n$ during time
slot $l$. We define the set of bits
$\{\mathbf{b}_n(l), l = 0, \ldots, L-1\}$ as one frame of data.

The fading coefficient for the path between transmit antenna $n$
and receive antenna $m$ is denoted by $\alpha_{nm}$. In our
research, we assume a flat quasi-static fading environment [3],
where the fading coefficients are constant during a frame and are
independent from one frame to another. Further, we assume for
simplicity that perfect estimates of all fading coefficients are
available at the receiver. The received signal at antenna $m$ is

$$r_m(t) = \sum_{n=1}^{N} \sum_{k=1}^{K} \sum_{l=0}^{L-1} \alpha_{nm} A_k b_{nk}(l) s_k(t - lT) + \eta_m(t) \qquad (1)$$

where $T$ is the bit period, $A_k$ is the transmitted signal
amplitude for user $k$, and $\eta_m(t)$ is the complex channel
noise at receive antenna $m$. The received signal $r_m(t)$ is
next passed through a matched filter bank, with each filter
matched to one user’s spreading waveform. Denote the matched
filter outputs at receive antenna $m$ for time slot $j$ by
$\mathbf{y}_m(j) = [y_{m1}(j), \ldots, y_{mK}(j)]^T$. The equation
describing $\mathbf{y}_m(j)$ can be represented in vector form as

$$\mathbf{y}_m(j) = \mathbf{R}\mathbf{A} \sum_{n=1}^{N} \alpha_{nm} \mathbf{b}_n(j) + \mathbf{n}_m(j), \quad m = 1, \ldots, M, \; j = 0, \ldots, L-1, \qquad (2)$$

where $\mathbf{R}$ is the $K \times K$ cross-correlation matrix of
the spreading codes, $\mathbf{A} = \mathrm{diag}(A_1, \ldots, A_K)$,
and $\mathbf{n}_m(j)$ is the $K \times 1$ complex noise vector
after matched filtering. Assuming the channel noise is Gaussian
with zero mean and autocorrelation function
$\sigma^2 \delta(\tau)$, $\mathbf{n}_m(j)$ has a multidimensional
Gaussian distribution $\mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{R})$.

3.  SPACE-TIME  BLOCK  CODES 

An extensive discussion of space-time block codes is given in
[3] [4]. Here we consider only the $N = 2$ antenna case. Extension
to $N > 2$ cases is straightforward. A BPSK space-time block code
with two transmit antennas is described by the transmission matrix

$$\mathcal{G}_2 = \begin{pmatrix} s_1 & s_2 \\ -s_2 & s_1 \end{pmatrix} \qquad (3)$$

The encoder works as follows. The block of $L_2$ Turbo coded bits
enters the encoder and is grouped into units of two bits. Each
group of two bits is mapped to a pair of BPSK symbols $s_1$ and
$s_2$. These symbols are transmitted during two consecutive time
slots. During the first time slot, $s_1$ and $s_2$ are transmitted
simultaneously from antennas one and two, respectively. During the
second time slot, $-s_2$ and $s_1$ are transmitted simultaneously
from antennas one and two, respectively. The code rate of
$\mathcal{G}_2$ is 1.

In [3] [4], the transmission matrix is designed so that the
columns are orthogonal to each other. This allows a simple
receiver structure using only linear processing. We illustrate
this using the code described in (3) as an example. Assuming there
are $M$ receive antennas, the received signals at antenna $m$
during the first and second time slots, denoted by $y_m(1)$ and
$y_m(2)$, are

$$y_m(1) = \alpha_{1m} s_1 + \alpha_{2m} s_2 + n_m(1)$$
$$y_m(2) = -\alpha_{1m} s_2 + \alpha_{2m} s_1 + n_m(2) \qquad (4)$$

where $n_m(1)$ and $n_m(2)$ are two iid complex Gaussian noise
samples with variance $\sigma^2$. The observations in (4) can be
combined to yield the improved quantities $\tilde{s}_1$ and
$\tilde{s}_2$ using

$$\tilde{s}_1 = \alpha_{1m}^* y_m(1) + \alpha_{2m} y_m^*(2) = (|\alpha_{1m}|^2 + |\alpha_{2m}|^2) s_1 + \alpha_{1m}^* n_m(1) + \alpha_{2m} n_m^*(2)$$
$$\tilde{s}_2 = \alpha_{2m}^* y_m(1) - \alpha_{1m} y_m^*(2) = (|\alpha_{1m}|^2 + |\alpha_{2m}|^2) s_2 + \alpha_{2m}^* n_m(1) - \alpha_{1m} n_m^*(2)$$

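Combining quantities obtained at each receive antenna yields

$$\hat{s}_1 = \sum_{m=1}^{M} \left( \alpha_{1m}^* y_m(1) + \alpha_{2m} y_m^*(2) \right) = C s_1 + n_1$$
$$\hat{s}_2 = \sum_{m=1}^{M} \left( \alpha_{2m}^* y_m(1) - \alpha_{1m} y_m^*(2) \right) = C s_2 + n_2 \qquad (5)$$

where

$$C = \sum_{m=1}^{M} \left( |\alpha_{1m}|^2 + |\alpha_{2m}|^2 \right). \qquad (6)$$

The Gaussian noise variables $n_1$ and $n_2$ have variance

$$\sigma_b^2 = \sigma^2 \sum_{m=1}^{M} \left( |\alpha_{1m}|^2 + |\alpha_{2m}|^2 \right). \qquad (7)$$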

It is easily seen from (5), (6) and (7) that after this simple
linear combining, the resulting signals are equivalent to those
obtained from using maximal ratio combining [7] techniques for
systems with 1 transmit antenna and $2M$ receive antennas. This
combining technique will be used in two places in our
low-complexity receiver as discussed in the next section.
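
The encoding and combining steps of this section can be checked numerically. The Python sketch below is illustrative only: the function names are ours, the conjugate placement follows the usual combining rule for a two-antenna orthogonal design with real (BPSK) symbols, and noise is omitted so that the combined outputs can be compared directly with $C s_1$ and $C s_2$.

```python
def encode_g2(s1, s2):
    """Return the two time slots of the two-antenna block code.

    Each slot is a pair (symbol on antenna 1, symbol on antenna 2).
    """
    return [(s1, s2), (-s2, s1)]


def receive(alpha_m, slots, noise=(0j, 0j)):
    """Observations y_m(1), y_m(2) at one receive antenna.

    alpha_m = (alpha_1m, alpha_2m) are the path gains to antenna m.
    """
    return [alpha_m[0] * a1 + alpha_m[1] * a2 + n
            for (a1, a2), n in zip(slots, noise)]


def combine(alphas, observations):
    """Linear combining over all receive antennas."""
    s1_hat = 0j
    s2_hat = 0j
    for (a1, a2), (y1, y2) in zip(alphas, observations):
        s1_hat += a1.conjugate() * y1 + a2 * y2.conjugate()
        s2_hat += a2.conjugate() * y1 - a1 * y2.conjugate()
    return s1_hat, s2_hat


def diversity_gain(alphas):
    """The constant C: sum of squared path-gain magnitudes."""
    return sum(abs(a1) ** 2 + abs(a2) ** 2 for a1, a2 in alphas)
```

With two receive antennas and arbitrary complex path gains, the noise-free combiner output equals the diversity gain times the transmitted symbol for every BPSK pair, which is the maximal-ratio-combining behaviour described above.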

4.  LOW-COMPLEXITY  MULTISTAGE 
RECEIVER 

The optimum receiver that minimizes the frame error rate should
construct a “super-trellis” for decoding. The super-trellis
combines the trellis of Turbo codes and the structure of the
multiuser channel and space-time block codes. Due to the
interleavers used in the Turbo codes, it is very hard to construct
such a super-trellis. In fact, “optimum decoding” for Turbo codes
alone is impossible in practice. This is why suboptimum iterative
decoding schemes are used to decode Turbo codes [5]. Thus instead
of trying to find an optimum receiver, which would obviously have
a prohibitively high complexity, our goal in this section is to
develop a low-complexity suboptimum receiver.

We suggest the multistage receiver structure depicted in Fig. 2.
The output of the matched filter bank is first passed to a
decorrelating detector [8], which, given perfect estimates,
attempts to eliminate the multiple access interference (MAI)
completely. The output of the decorrelating detector at receive
antenna $m$ and time slot $j$ is

$$\tilde{\mathbf{y}}_m(j) = (\mathbf{R}\mathbf{A})^{-1} \mathbf{y}_m(j) = \sum_{n=1}^{N} \alpha_{nm} \mathbf{b}_n(j) + \tilde{\mathbf{n}}_m(j) \qquad (8)$$

where we defined the noise vector
$\tilde{\mathbf{n}}_m(j) = (\mathbf{R}\mathbf{A})^{-1} \mathbf{n}_m(j)$,
which has a Gaussian distribution with covariance matrix

$$\tilde{\mathbf{R}} = \sigma^2 (\mathbf{A}\mathbf{R}\mathbf{A})^{-1}. \qquad (9)$$
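
The covariance in (9) follows in one line from the matched-filter noise statistics of Section 2. The derivation below is a sketch that assumes $\mathbf{R}$ is symmetric and $\mathbf{A}$ is real and diagonal; the tilde marks post-decorrelation quantities and is our notation.

```latex
\begin{aligned}
E\{\tilde{\mathbf{n}}_m(j)\,\tilde{\mathbf{n}}_m^H(j)\}
  &= (\mathbf{R}\mathbf{A})^{-1}\, E\{\mathbf{n}_m(j)\,\mathbf{n}_m^H(j)\}\, (\mathbf{R}\mathbf{A})^{-H} \\
  &= \mathbf{A}^{-1}\mathbf{R}^{-1}\,(\sigma^2 \mathbf{R})\,\mathbf{R}^{-1}\mathbf{A}^{-1} \\
  &= \sigma^2\, \mathbf{A}^{-1}\mathbf{R}^{-1}\mathbf{A}^{-1}
   = \sigma^2 (\mathbf{A}\mathbf{R}\mathbf{A})^{-1}.
\end{aligned}
```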

The elements from $\tilde{\mathbf{y}}_1(j), \ldots, \tilde{\mathbf{y}}_M(j)$
corresponding to the $k$th user, denoted by
$\tilde{y}_{1k}(j), \ldots, \tilde{y}_{Mk}(j)$, are combined using
the technique discussed in Section 3 to provide improved
observations for user $k$. These improved observations are sent to
a single user Turbo decoder to perform the first stage of
decoding. The Turbo decoder produces posterior probabilities for
user $k$’s transmitted bits. These posterior probabilities,
together with the diversity combined observations, are used by a
soft estimator to form soft estimates of user $k$’s transmitted
bits.

The soft estimator uses non-linear minimum mean square error
(MMSE) estimation [9] to form the soft estimates. From (5), it is
seen that the diversity combined observations for user $k$ can
always be represented in the form $y = Cb + n$, where $y$ is the
noisy observation, $b$ is the transmitted bit, $C$ is a known
constant and $n$ is a complex Gaussian noise sample with variance
denoted by $\sigma_b^2$. The soft estimate of $b$ is obtained by

$$E\{b \mid y\} = \frac{\Pr(b=+1)\, e^{2\mathrm{Re}(Cy^*)/\sigma_b^2} - \Pr(b=-1)\, e^{-2\mathrm{Re}(Cy^*)/\sigma_b^2}}{\Pr(b=+1)\, e^{2\mathrm{Re}(Cy^*)/\sigma_b^2} + \Pr(b=-1)\, e^{-2\mathrm{Re}(Cy^*)/\sigma_b^2}} \qquad (10)$$

where the prior probabilities $\Pr(b = \pm 1)$ can be updated
using the posterior probabilities obtained by the Turbo decoders.
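
For a binary symbol the conditional mean in (10) collapses to a hyperbolic tangent, which is the numerically convenient way to compute it. The Python sketch below assumes the standard form for BPSK in complex Gaussian noise of total variance $\sigma_b^2$, with $C$ real; the function and argument names are ours. The second function evaluates the ratio of exponentials directly so the two can be compared.

```python
import math


def soft_estimate(y, c, sigma_b2, p_plus=0.5):
    """E{b | y} for y = c*b + n, b in {-1, +1}, written as a tanh.

    y        : complex diversity-combined observation
    c        : known real gain
    sigma_b2 : variance of the complex noise sample n
    p_plus   : prior probability Pr(b = +1), strictly between 0 and 1
    """
    a = 2.0 * (c * y.conjugate()).real / sigma_b2
    prior_llr = math.log(p_plus / (1.0 - p_plus))
    return math.tanh(a + 0.5 * prior_llr)


def soft_estimate_direct(y, c, sigma_b2, p_plus=0.5):
    """Same quantity computed as the explicit ratio of exponentials."""
    a = 2.0 * (c * y.conjugate()).real / sigma_b2
    num = p_plus * math.exp(a) - (1.0 - p_plus) * math.exp(-a)
    den = p_plus * math.exp(a) + (1.0 - p_plus) * math.exp(-a)
    return num / den
```

The tanh form avoids overflow at high SNR, where the explicit exponentials can exceed floating-point range; the result always lies strictly between -1 and +1 and approaches a hard decision as the observation becomes reliable.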

The transmitted signals are reconstructed using the soft
estimates as if they were binary digits. Denote the reconstructed
encoder output for antenna $n$ and user $k$ during time slot $j$
as $\hat{b}_{nk}(j)$ and define
$\hat{\mathbf{b}}_n(j) = [\hat{b}_{n1}(j), \ldots, \hat{b}_{nK}(j)]^T$.
The reconstructed signals
$\{\hat{\mathbf{b}}_n(j), n = 1, \ldots, N, j = 0, \ldots, L-1\}$
are used in soft MAI cancellation to produce “cleaner” received
signals for each user. To cancel MAI for user $k$, we first define
a vector $\hat{\mathbf{b}}_n^{(k)}(j)$ equal to
$\hat{\mathbf{b}}_n(j)$ except that its $k$th element is zero. The
MAI-reduced observation for user $k$ at receive antenna $m$ is
obtained using

$$\hat{\mathbf{y}}_m^{(k)}(j) = \mathbf{y}_m(j) - \mathbf{R}\mathbf{A} \sum_{n=1}^{N} \alpha_{nm} \hat{\mathbf{b}}_n^{(k)}(j) \qquad (11)$$


When a perfect estimate of $\hat{\mathbf{b}}_n^{(k)}(j)$ is
available, $\hat{\mathbf{y}}_m^{(k)}(j)$ offers $K$ different
observations of the signal from user $k$, contaminated only by
channel noise. For simplicity, we use the $k$th element of
$\hat{\mathbf{y}}_m^{(k)}(j)$ for processing, which gives the
highest SNR for user $k$. The $k$th elements of
$\hat{\mathbf{y}}_m^{(k)}(j)$, $m = 1, \ldots, M$, at all receive
antennas are combined using the techniques discussed in Section 3.
The improved observations are passed to another set of Turbo
decoders to perform the second stage of decoding. These Turbo
decoders produce the final “hard” decisions on each user’s
transmitted bits.
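
The cancellation step in (11) can be sketched for a single receive antenna and time slot. The Python code below is a minimal illustration with hypothetical helper names; it uses plain lists instead of a matrix library and, for checking purposes, builds a noise-free matched filter output from the model in (2).

```python
def matvec(matrix, vector):
    """Matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def symmetric_correlation(num_users, rho):
    """Cross-correlation matrix with unit diagonal and common value rho."""
    return [[1.0 if i == j else rho for j in range(num_users)]
            for i in range(num_users)]


def matched_filter_output(corr, amps, alphas_m, bits):
    """Noise-free y_m(j) = R A sum_n alpha_nm b_n(j).

    alphas_m[n] is the path gain from transmit antenna n to antenna m,
    bits[n][u] is the bit of user u on transmit antenna n.
    """
    users = range(len(amps))
    mixed = [sum(a * b_n[u] for a, b_n in zip(alphas_m, bits)) for u in users]
    return matvec(corr, [amps[u] * mixed[u] for u in users])


def cancel_mai(y_m, corr, amps, alphas_m, soft_bits, k):
    """k-th element of the MAI-reduced observation for user k.

    The soft estimates of all users except k are re-spread through R A
    and subtracted from the matched filter output.
    """
    users = range(len(amps))
    others = [0 if u == k else
              sum(a * b_n[u] for a, b_n in zip(alphas_m, soft_bits))
              for u in users]
    interference = matvec(corr, [amps[u] * others[u] for u in users])
    return y_m[k] - interference[k]
```

With perfect estimates and no noise, the value returned for user $k$ is $A_k \sum_n \alpha_{nm} b_{nk}(j)$, i.e. the user's own signal with no residual interference, regardless of $\rho$ or of the other users' powers.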


5.  SIMULATION  RESULTS 

Monte Carlo simulations are carried out to study the performance
of the proposed multistage receiver. Consider a 4 user
synchronous CDMA system with 2 transmit antennas and 2 receive
antennas. Each user’s bits are first encoded by a rate 1/3 Turbo
code with constraint length $\nu = 5$ and generators 23, 35 (octal
form). The random interleaver chosen for the Turbo code has length
128. The block of Turbo coded data is encoded using a space-time
block code with the code matrix $\mathcal{G}_2$ from (3) and a
BPSK constellation. Next the output bits are spread using each
user’s spreading waveform and the results are transmitted using 2
antennas over the fading channel. The path gains are modeled as
samples of independent complex Gaussian random variables with
variance 0.5 per dimension (real or imaginary). Quasi-static
fading is assumed. For the CDMA channel, we use the symmetric
channel model where the cross-correlation between every pair of
users is the common value $\rho$. The SNR for user $k$ is defined
as

$$\mathrm{SNR}_k = \frac{N A_k^2}{\sigma^2 R_1 R_2} \qquad (12)$$

Fig. 3 gives the BER performance of the proposed multistage
receiver in Gaussian noise when all users have the same power
($\mathbf{A} = \mathbf{I}$). The BER performance for the first
stage and second stage decoding are both plotted, which we denote
by “S1” and “S2” on the graph. For comparison, we also give the
single user performance, which is the Turbo code performance for
the fading channel under consideration. The performance of the
space-time block code using $\mathcal{G}_2$ without the Turbo
coding is also shown. For $\rho = 0.1$, single user performance is
nearly achieved after just the first stage decoding. The second
stage decoding curve is indistinguishable from that of the single
user performance. For $\rho = 0.3$, the performance improvement
obtained by employing the second stage of decoding is obvious from
Fig. 3b. After the second stage decoding, single user performance
is approached. By combining a Turbo code with a space-time block
code, a performance gain of about 2.5 dB is achieved at
BER $= 10^{-4}$ compared to using a space-time block code only.

An iterative receiver structure can be easily constructed by
feeding back the posterior information obtained after the second
stage decoding to the soft estimators. We have carried out
simulations using this iterative structure, but results show that
the improvement over the second stage of decoding is marginal. In
Fig. 3b, we plot the BER performance for the second iteration of
the “iterative receiver” (denoted by “Ite 2”), which is almost
indistinguishable from the second stage decoding curve. Thus the
extra computations incurred by the iterative structure are not
justified.

Next we study the performance of our receiver in a near-far
situation where two users are 20 dB stronger than the other two
users; all other parameters remain the same as in Fig. 3. The BER
performance for the strong users and weak users is given in
Fig. 4a and 4b, respectively. The performance, for both the weak
and strong users, approaches single user performance after the
second stage decoding.

Finally, we point out that the received signal model in (2) is
also valid for a CDMA system with space-time convolutional coding
[1] replacing the combination of space-time block codes and Turbo
codes. An iterative receiver can be constructed using the parallel
interference cancellation scheme [10]. Fig. 1 gives the frame
error rate performance for the first two iterations of the
iterative receiver for a CDMA system with space-time convolutional
coding. It is seen that with 2 iterations, single user performance
is achieved. Another observation is that the performance
improvement obtained by employing the iterative structure is
marginal. This is consistent with our previous observations for
the space-time block coded system.

6.  CONCLUSIONS 

In this paper, we studied the application of Turbo codes and
space-time block codes in CDMA systems. A multistage receiver is
proposed using parallel interference cancellation schemes.
Simulation results show that with reasonable levels of MAI
($\rho < 0.3$), near single user performance can be achieved. The
receiver developed in this paper was generalized to decode CDMA
signals with space-time convolutional coding and similar
performance was observed.

Figure 1: Performance of the iterative multiuser receiver for
CDMA with space-time convolutional coding [10] with $K = 4$,
$\rho = 0.3$, 4-PSK S-T code with rate 2 b/s/Hz, 130 symbols per
frame, 2 transmit and 2 receive antennas, where MMSE is used in
the first stage decoding.

Figure 2: Structure of our $K$ user CDMA system (including our
multistage receiver) with combined Turbo coding and space-time
block coding, $N$ transmit antennas and $M$ receive antennas.

Figure 3: Performance of the multistage receiver for CDMA with
Turbo coding and space-time block coding with $K = 4$ users, 2
transmit and 2 receive antennas.

Figure 4: Performance of the multistage receiver for CDMA with
Turbo coding and space-time block coding under a near-far
situation with $K = 4$, $\rho = 0.3$, 2 transmit and 2 receive
antennas. Two users are 20 dB stronger than the other two users.

ABSTRACT

This paper presents an adaptive multi-user maximum a posteriori
(MAP) decoder for synchronous code division multiple access
(CDMA) signals on fading channels. The key idea is to interpret
this problem as an optimal filtering problem. An efficient
particle filtering method is then developed to solve this complex
estimation problem. Simulation results demonstrate the efficiency
of our method.

1  Introduction 

Code division multiple access (CDMA) systems have received much
attention in recent years [13]. For the case of a known channel
with additive Gaussian noise, the maximum likelihood (ML) optimal
receiver was presented by Verdú [16]. Lower-complexity linear
receivers have also been presented in this case. In the presence
of unknown fading channels, the estimation problem to be solved is
much more complex. MMSE linear receivers have also been presented
in this context. However, it turns out that the rate of adaptation
for these linear techniques is not sufficient to track fast-fading
channels and more sophisticated approaches are required. Recently,
more efficient methods have been proposed; see for example [5],
[6], where coupled estimators combining a Viterbi algorithm and an
MMSE predictor are presented.

In this paper we follow a Bayesian probabilistic approach. A
state-space model is included to model explicitly the
nonstationarity of the fading channel. This allows us to formulate
the problem of estimating a posteriori symbol probabilities as a
complex optimal filtering problem. Under assumptions detailed
later on, it is well known that exact computation of these
probabilities involves a prohibitive computational cost,
exponential in the (growing) number of observations. Thus one
needs to perform some approximations.

We present here a simulation-based method for solving this
problem. This so-called particle filtering method can be viewed as
a randomized adaptive grid approximation of the posterior
distribution. As will be shown later, the particles (values of the
grid) evolve randomly in time according to a simulation-based
rule. The weights of the particles are updated according to Bayes’
rule. The most striking advantage of these MC particle filters is
that the rate of convergence of the error towards zero is
independent of the state dimension. That is, the randomization
implicit in the particle filter gets around the curse of
dimensionality. Taking advantage of the increase of computational
power and the availability of parallel computers, several authors
have recently proposed such particle methods following the seminal
paper of Gordon et al. [11]; see [7], [8] for a summary of the
state of the art and [2], [14], [15] for other applications in
digital communications. It has been shown that these methods
outperform the standard suboptimal methods.

We propose in this paper an improved particle method where the
filtering distribution of interest is approximated by a Gaussian
mixture of a large number, say $N$, of components which evolve
stochastically over time and are driven by the observations.
Though it is rather computationally intensive, it can be easily
implemented on parallel processors.

The rest of the paper is organized as follows. In Section 2, we
state the model and the estimation objectives. In Section 3, we
describe particle filtering methods. Finally we demonstrate the
efficiency of our algorithm in Section 4.

2  System  Model  and  Estimation  Objectives 
2.1  System  model 

We follow here the presentation in [5], [6]. Consider a
synchronous CDMA system with a single antenna at the centralized
receiver. The system has $M$ users, each transmitting using a
known direct sequence (DS) spreading code with processing gain $G$
(i.e. $G$ chips per symbol). For user $m$, the spreading code is
represented by the $G \times 1$ vector
$\mathbf{s}_m = [s_{m,0}, \ldots, s_{m,G-1}]^T$. At time $t$, user
$m$ transmits a symbol $x_{m,t}$ of period $T = G T_c$, where
$T_c$ is the chip interval. Each chip $s_{m,c} x_{m,t}$ is
affected by the flat-fading channel $f_{m,k}$, represented at the
chip rate where $k = Gt + c$. Note that $t$ is used as an index at
the symbol rate, and $k$ is used as an index at the chip rate.

At the receiver, the incoming signal is sampled at the chip rate
to obtain $z_k$. Assuming a synchronous system, the received
samples are given by

$$z_k = \sum_{m=1}^{M} x_{m,\lfloor k/G \rfloor}\, s_{m,\mathrm{mod}(k,G)}\, f_{m,k} + w_k$$

for $k = 0, \ldots, GT - 1$. In vector-matrix notation

$$\mathbf{z}_t = \sum_{m=1}^{M} x_{m,t}\, \mathbf{S}_m \mathbf{f}_{m,t} + \mathbf{w}_t, \qquad (1)$$

for $t = 0, \ldots, T - 1$, where
$\mathbf{S}_m = \mathrm{diag}(\mathbf{s}_m)$,
$\mathbf{z}_t = [z_{Gt}, \ldots, z_{G(t+1)-1}]^T$, and
$\mathbf{w}_t = [w_{Gt}, \ldots, w_{G(t+1)-1}]^T$ is a vector of
zero mean i.i.d. complex Gaussian noise samples with variance
$\frac{1}{2} E[w_k w_k^*] = N_0/(2T_c)$. We assume that the fading
channels $\mathbf{f}_{m,t}$ satisfy the following state-space
models

$$\mathbf{f}_{m,t} = \mathbf{A} \mathbf{f}_{m,t-1} + \mathbf{B} \mathbf{v}_{m,t} \qquad (2)$$

where $\mathbf{f}_{m,0}$ is assumed distributed according to a
Gaussian distribution and the disturbance noise $\mathbf{v}_{m,t}$
is assumed zero mean i.i.d. Gaussian. We denote
$\mathbf{f}_t = [\mathbf{f}_{1,t}, \ldots, \mathbf{f}_{M,t}]$. The
initial states $\mathbf{f}_{m,0}$, the sequences
$\mathbf{v}_{m,t}$ and the observation noise $\mathbf{w}_t$ are
all assumed mutually independent at any time $t$. Finally, we
assume that the symbols $x_t$ are modeled as a first-order (finite
state-space) Markov chain. The finite state-space of the symbols
is denoted by $\mathcal{X}$.

2.2  Estimation  objectives 

Given the observations $z_{0:t} = (z_0, \ldots, z_t)$, all
Bayesian inference on $x_{0:t} = (x_0, \ldots, x_t)$ and
$f_{0:t} = (f_0, \ldots, f_t)$ is based on the posterior
distribution $p(x_{0:t}, f_{0:t} \mid z_{0:t})$. Here the channel
coefficients $f_t$ are regarded as nuisance parameters and
integrated out.

Our aim is to compute recursively in time $t$ the MMAP symbol
estimate defined as

$$\hat{x}_t^{\mathrm{MMAP}} = \arg\max_{x_t} \; p(x_t \mid z_{0:t}).$$

The joint distribution $p(x_{0:t} \mid z_{0:t})$ satisfies the
following recursion

$$p(x_{0:t+1} \mid z_{0:t+1}) = p(x_{0:t} \mid z_{0:t}) \times \frac{p(z_{t+1} \mid z_{0:t}, x_{0:t+1})\, p(x_{t+1} \mid x_t)}{p(z_{t+1} \mid z_{0:t})}.$$

The likelihood term $p(z_{t+1} \mid z_{0:t}, x_{0:t+1})$ can be
evaluated pointwise through the Kalman filter associated to the
path $x_{0:t+1}$, as the system (1)-(2) is linear Gaussian
conditional upon the symbol sequence. It is easily seen that,
given our assumptions, computing $p(x_{0:t} \mid z_{0:t})$ or
$p(x_t \mid z_{0:t})$ requires a computational cost exponential in
the (growing) number $t$ of observations.

It is thus necessary to develop an approximation scheme.
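
The recursion for the joint posterior is an application of Bayes' rule followed by the Markov property of the symbols. The steps below are a sketch of that argument under the independence assumptions of Section 2.1; the last line makes the source of the exponential cost explicit.

```latex
\begin{aligned}
p(x_{0:t+1} \mid z_{0:t+1})
  &= \frac{p(z_{t+1} \mid z_{0:t}, x_{0:t+1})\; p(x_{0:t+1} \mid z_{0:t})}{p(z_{t+1} \mid z_{0:t})}, \\
p(x_{0:t+1} \mid z_{0:t})
  &= p(x_{t+1} \mid x_{0:t}, z_{0:t})\; p(x_{0:t} \mid z_{0:t})
   = p(x_{t+1} \mid x_t)\; p(x_{0:t} \mid z_{0:t}), \\
p(x_t \mid z_{0:t})
  &= \sum_{x_{0:t-1} \in \mathcal{X}^{t}} p(x_{0:t} \mid z_{0:t})
  \quad \text{(a sum over } |\mathcal{X}|^{t} \text{ paths, each needing its own Kalman filter).}
\end{aligned}
```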

Efficient batch algorithms have been developed to solve related
estimation problems [9] but they are of limited interest in a
digital communications framework. Several “classical” suboptimal
algorithms have also been proposed to solve related problems in
the literature; see for example [1] for a standard textbook on the
subject. However, these approximation methods are notoriously
unreliable and faults are difficult to diagnose on-line.

3  Particle  Filtering 

In this paper, we present an original particle filtering method
to solve this optimal estimation problem.

3.1  Perfect  Monte  Carlo  sampling 

Assume it is possible to sample $N$ i.i.d. samples, called
particles, $\{x_{0:t}^{(i)} : i = 1, \ldots, N\}$ according to the
joint distribution $p(x_{0:t} \mid z_{0:t})$. Then an empirical
approximation of $p(x_{0:t} \mid z_{0:t})$ is given by

$$\hat{p}_N(x_{0:t} \mid z_{0:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_{0:t}^{(i)}}(x_{0:t}).$$

Consequently an approximation of its marginal
$p(x_t \mid z_{0:t})$ is given by

$$\hat{p}_N(x_t \mid z_{0:t}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_t^{(i)}}(x_t),$$

that is, for any $i \in \mathcal{X}$,

$$\hat{p}_N(x_t = i \mid z_{0:t}) = \frac{1}{N} \sum_{j=1}^{N} \delta_{x_t^{(j)}}(i) \qquad (3)$$

and

$$\hat{x}_t^{\mathrm{MMAP}} = \arg\max_{x_t \in \mathcal{X}} \; \hat{p}_N(x_t \mid z_{0:t}).$$

The estimate (3) is unbiased and, from the strong law of large
numbers (SLLN),
$\hat{p}_N(x_t = i \mid z_{0:t}) \to p(x_t = i \mid z_{0:t})$
almost surely as $N \to +\infty$. A central limit theorem (CLT)
holds too. The main advantage of Monte Carlo methods over other
numerical integration methods is that the rate of convergence of
$\hat{p}_N(x_t = i \mid z_{0:t})$ towards
$p(x_t = i \mid z_{0:t})$ is independent of the dimension $t$.
Unfortunately, it is not possible to sample directly from the
distribution $p(x_{0:t} \mid z_{0:t})$ at any $t$, and alternative
strategies need to be investigated.

3.2  Sequential  Bayesian  Importance  Sampling 

An alternative solution to estimate $p(x_{0:t} \mid z_{0:t})$
consists of using the importance sampling method. Suppose that $N$
i.i.d. samples $\{x_{0:t}^{(i)} : i = 1, \ldots, N\}$ can be
easily simulated according to an arbitrary importance distribution
$\pi(x_{0:t} \mid z_{0:t})$, such that
$p(x_{0:t} \mid z_{0:t}) > 0$ implies
$\pi(x_{0:t} \mid z_{0:t}) > 0$. Using this distribution a Monte
Carlo estimate of $p(x_t \mid z_{0:t})$ may be obtained as

$$\hat{p}_N(x_t = i \mid z_{0:t}) = \sum_{j=1}^{N} \tilde{w}_{0:t}^{(j)}\, \delta_{x_t^{(j)}}(i), \qquad (4)$$

where $\tilde{w}_{0:t}^{(j)} \propto w(x_{0:t}^{(j)})$
($\sum_{j=1}^{N} \tilde{w}_{0:t}^{(j)} = 1$) is the normalised
version of the importance weight $w(x_{0:t}^{(j)})$ defined as

$$w(x_{0:t}) \propto \frac{p(x_{0:t} \mid z_{0:t})}{\pi(x_{0:t} \mid z_{0:t})}.$$

According to the SLLN, $\hat{p}_N(x_t = i \mid z_{0:t})$ converges
almost surely towards $p(x_t = i \mid z_{0:t})$ as
$N \to +\infty$, and under additional assumptions a CLT also
holds.
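
The self-normalised estimate in (4) can be illustrated on a toy problem. The Python sketch below is not the paper's receiver: it replaces the trajectory posterior by a hypothetical target over a single binary symbol, known only up to a constant, and uses a uniform proposal. The function name and the dictionaries are ours.

```python
import random


def importance_estimate(target, proposal, n_samples, rng):
    """Self-normalised importance sampling over a finite alphabet.

    target   : dict symbol -> unnormalised target probability
    proposal : dict symbol -> proposal probability (must be > 0
               wherever the target is > 0)
    Returns a dict symbol -> estimated normalised target probability.
    """
    symbols = list(proposal)
    probs = [proposal[s] for s in symbols]
    samples = rng.choices(symbols, weights=probs, k=n_samples)

    # Importance weights, known only up to a normalising constant.
    weights = [target[x] / proposal[x] for x in samples]
    total = sum(weights)

    estimate = {s: 0.0 for s in symbols}
    for x, w in zip(samples, weights):
        estimate[x] += w / total  # add the normalised weight
    return estimate
```

Because the weights are normalised by their sum, the target only needs to be known up to a multiplicative constant, which is exactly the situation for the posterior in the recursion of Section 2.2.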

The method described up to now is a batch method. In order to
obtain the estimate of $p(x_{0:t} \mid z_{0:t})$ sequentially, one
should be able to propagate this estimate in time without
subsequently modifying the past simulated trajectories
$\{x_{0:t-1}^{(i)} : i = 1, \ldots, N\}$. This means that
$\pi(x_{0:t} \mid z_{0:t})$ should admit
$\pi(x_{0:t-1} \mid z_{0:t-1})$ as marginal distribution:

$$\pi(x_{0:t} \mid z_{0:t}) = \pi(x_{0:t-1} \mid z_{0:t-1})\, \pi(x_t \mid z_{0:t}, x_{0:t-1}),$$

and the importance weights $w(x_{0:t})$ can then be evaluated
recursively, i.e.

$$w(x_{0:t}) = w(x_{0:t-1}) \times w_t, \qquad (5)$$

where

$$w_t \propto \frac{p(z_t \mid z_{0:t-1}, x_{0:t})\, p(x_t \mid x_{t-1})}{\pi(x_t \mid z_{0:t}, x_{0:t-1})}.$$

There are an unlimited number of choices for the importance
distribution $\pi(x_{0:t} \mid z_{0:t})$, the only restriction
being that its support includes that of $p(x_{0:t} \mid z_{0:t})$.
A sensible selection criterion is to choose a proposal that
minimises the variance of the importance weights given $x_{0:t-1}$
and $z_{0:t}$. The importance distribution that satisfies this
condition is
$\pi(x_t \mid z_{0:t}, x_{0:t-1}) = p(x_t \mid z_{0:t}, x_{0:t-1})$,
and this “optimal” importance distribution is employed throughout
the paper (see [7] for details).
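
For a finite symbol alphabet the optimal importance distribution and its weight can be written out explicitly. The expressions below are a sketch that follows from Bayes' rule and the Markov assumption on the symbols; they are not quoted from the paper. Note that the incremental weight does not depend on the value of $x_t$ that is drawn, and that evaluating it requires one Kalman filter step per candidate symbol in $\mathcal{X}$.

```latex
\begin{aligned}
p(x_t \mid z_{0:t}, x_{0:t-1})
  &= \frac{p(z_t \mid z_{0:t-1}, x_{0:t-1}, x_t)\; p(x_t \mid x_{t-1})}
          {\sum_{x \in \mathcal{X}} p(z_t \mid z_{0:t-1}, x_{0:t-1}, x_t = x)\; p(x \mid x_{t-1})}, \\
w_t
  &\propto p(z_t \mid z_{0:t-1}, x_{0:t-1})
   = \sum_{x \in \mathcal{X}} p(z_t \mid z_{0:t-1}, x_{0:t-1}, x_t = x)\; p(x \mid x_{t-1}).
\end{aligned}
```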

3.3  Selection  step 

For importance distributions of the form specified by (5),
the variance of the importance weights can only increase
(stochastically) over time [7]. It is thus impossible to avoid
a degeneracy phenomenon. In practice, after a few iterations
of the algorithm, all but one of the normalised importance
weights are very close to zero, and a large computational
effort is devoted to updating trajectories whose
contribution to the final estimate is almost zero. To avoid
this, it is of crucial importance to include a selection step
in the algorithm, the purpose of which is to discard particles
with low normalised importance weights and multiply those
with high normalised importance weights. The weights of
the “surviving” particles are reset to $1/N$. A selection
procedure associates with each particle, say $\tilde{x}_{0:t}^{(i)}$, $i = 1, \ldots, N$,
a number of children $N_i \in \mathbb{N}$, such that $\sum_{i=1}^{N} N_i = N$, to
obtain $N$ new particles $\{x_{0:t}^{(i)} : i = 1, \ldots, N\}$. If $N_i = 0$
then $\tilde{x}_{0:t}^{(i)}$ is discarded, otherwise it has $N_i$ children at time
$t + 1$. In this paper, the selection step is done according
to a stratified sampling scheme [12], though other methods
such as sampling importance resampling (SIR) [11] may be
employed. The stratified sampling scheme proceeds as follows:
generate $N$ points equally spaced in the interval $[0, 1]$,
and associate with each particle $i$ a number of children $N_i$
equal to the number of points lying between the partial sums
of weights $q_{i-1}$ and $q_i$, where

$$q_i = \sum_{j=1}^{i} \tilde{w}_t^{(j)}.$$

This algorithm is such that $E[N_i] = N \tilde{w}_t^{(i)}$ and
$\mathrm{var}[N_i] = \{N \tilde{w}_t^{(i)}\}\,(1 - \{N \tilde{w}_t^{(i)}\})$ where,
for any $a$, $\lfloor a \rfloor$ is the integer part of $a$ and $\{a\} = a - \lfloor a \rfloor$.
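
The counting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_children` is hypothetical, and the single random offset shared by all $N$ grid points is an assumption (the text only states that the points are equally spaced).

```python
import random


def systematic_children(weights, rng=random):
    """Return the number of children N_i assigned to each particle.

    weights : importance weights (normalised inside the function).
    rng     : object with a random() method; the shared random offset
              u in [0, 1) is an assumption of this sketch.
    """
    n = len(weights)
    total = sum(weights)
    u = rng.random()
    counts = [0] * n
    cumulative = 0.0   # partial sum q_i of the normalised weights
    k = 0              # index of the next grid point (k + u) / n
    for i, w in enumerate(weights):
        cumulative += w / total
        if i == n - 1:
            cumulative = 1.0  # guard against floating-point round-off
        # Count the grid points lying between q_{i-1} and q_i.
        while k < n and (k + u) / n < cumulative:
            counts[i] += 1
            k += 1
    return counts
```

Each particle receives either the integer part of $N \tilde{w}_t^{(i)}$ or one more child, and the counts always sum to $N$, so the surviving particles can be given the weight $1/N$ as stated above.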


3.3.1  Algorithm 


Given at time $t - 1$, $N \in \mathbb{N}^*$ random samples $x_{0:t-1}^{(i)}$
($i = 1, \ldots, N$) distributed according to $p(x_{0:t-1} \mid z_{0:t-1})$, the
MC filter proceeds as follows at time $t$.


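Particle Filtering Algorithm

Sequential Importance Sampling step

• For $i = 1, \ldots, N$, sample $\tilde{x}_t^{(i)} \sim \pi(x_t \mid z_{0:t}, x_{0:t-1}^{(i)})$
and set $\tilde{x}_{0:t}^{(i)} = (x_{0:t-1}^{(i)}, \tilde{x}_t^{(i)})$.

• For $i = 1, \ldots, N$, evaluate the importance weights
up to a normalising constant:

$$w_t^{(i)} \propto \frac{p(z_t \mid z_{0:t-1}, \tilde{x}_{0:t}^{(i)})\; p(\tilde{x}_t^{(i)} \mid x_{t-1}^{(i)})}{\pi(\tilde{x}_t^{(i)} \mid z_{0:t}, x_{0:t-1}^{(i)})}$$

and normalise them: $\tilde{w}_t^{(i)} \propto w_t^{(i)}$, $\sum_{j=1}^{N} \tilde{w}_t^{(j)} = 1$.

Selection step

• Multiply/Discard particles $(\tilde{x}_{0:t}^{(i)};\ i = 1, \ldots, N)$ with
respect to high/low normalised importance weights
$\tilde{w}_t^{(i)}$ to obtain $N$ particles $(x_{0:t}^{(i)};\ i = 1, \ldots, N)$.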


Clearly, the computational complexity of the proposed
algorithm at each iteration is $O(N)$. Moreover, since the
optimal and prior importance distributions $\pi(x_t \mid z_{0:t}, x_{0:t-1})$
and the associated importance weights depend on $x_{0:t-1}$ via
a set of low-dimensional sufficient statistics, only these values
need to be kept in memory and, thus, the storage requirements
for the proposed algorithm are also $O(N)$ and
do not increase over time.

3.3.2  Convergence  Results 

The  following  proposition  is  a  straightforward  consequence 
of  Theorem  1  in  [4],  which  itself  is  an  extension  of  results 
in  [3]. 


Proposition 1 For all $t > 0$, there exists $c_t$ independent of
$N$ such that

$$E\left[\left(\hat{p}_N(x_t = i \mid z_{0:t}) - p(x_t = i \mid z_{0:t})\right)^2\right] \le \frac{c_t}{N}$$

The expectation operator is with respect to the randomness
introduced in the particle filtering method. Though the particles
are interacting, one observes that one keeps the “standard”
rate of convergence of Monte Carlo methods.

4  Simulation  Results 


We demonstrate the performance of our multi-user MAP decoder
for transmission of binary phase-shift keyed (BPSK) symbols
over fast fading CDMA channels. The simulation parameters
were as follows: $M = 3$, $G = 10$ and a flat fading
channel with fading rate $0.05/T$. We compared our results
with [6] and the case where the channel is assumed known
exactly. The results in terms of Bit Error Rate (BER) are
presented in Fig. 1. We notice that when the SNR is large,
our stochastic algorithm substantially outperforms that of
[6]. Their deterministic algorithm can indeed get trapped in
severe local maxima as the posterior distribution becomes more peaked.


Figure 1: Dotted line + (channel known), solid line (particle
filtering), dotted line x ([6])

ABSTRACT

In this paper we investigate a blind channel estimation
method for Multi-Carrier CDMA systems that uses
a subspace decomposition technique. This technique
exploits the orthogonality property between the noise
subspace and the received user codes to obtain a channel
identification algorithm. In order to analyze the
performance of this algorithm, we derived a theoretical
expression of the estimation MSE using a perturbation
approach. This expression is compared with the numerical
results of some computer simulations to illustrate
the validity of the analysis.

1.  INTRODUCTION 

Multi-Carrier (MC) transmission methods for Code Division
Multiple Access (CDMA) communication systems
have recently been proposed as an efficient technique
to combat multipath propagation and have gained
increased interest in recent years [1, 2]. In
these techniques each user is assigned a unique
identification code sequence and the transmitted signal is
split over different subcarriers. It is assumed that the
subcarrier bandwidth is smaller than the channel coherence
bandwidth and, therefore, each subcarrier experiences only flat
fading. As a consequence, MC-CDMA systems do not
suffer from Inter-Symbol Interference (ISI). However,
the effects of dispersive channels appear as random
distortions in the amplitude and phase of each subcarrier.
This causes a loss of orthogonality between user codes
and introduces Multiple Access Interference (MAI).

In order to implement a multiuser detector and to
reduce MAI it is necessary to characterize, implicitly or
explicitly, the channel parameters. In this paper we introduce
a new blind channel estimation technique that
is based on a subspace decomposition [3] and derive
a particular algorithm to identify the channel parameters.
We also obtain, using perturbation techniques, an
approximate expression of the estimation Mean Square
Error (MSE) achieved with the proposed algorithm.

This work has been supported by FEDER (grant 1FD97-0082).

The paper is organized as follows. Section 2 presents
the signal model of a synchronous MC-CDMA system.
Section 3 describes the subspace decomposition technique
and the resultant algorithm. In Section 4 we perform
the theoretical analysis of the estimation MSE.
Section 5 shows the results of several computer simulations
that illustrate the validity of the approximations
in the previous section and, finally, Section 6 is devoted
to the conclusions.


2.  SIGNAL  MODEL 

Let us consider a discrete-time baseband equivalent
model of a synchronous MC-CDMA system with $N$
users using $L$-chip signature codes. The $k$-th chip
corresponding to the $n$-th symbol transmitted by the $i$-th
user is given by

$$v_n^i(k) = s_n^i\, c_i(k), \qquad k = 0, \ldots, L-1, \qquad n = 0, 1, 2, \ldots \qquad (1)$$

where $s_n^i$ is the $n$-th symbol of the $i$-th user and $c_i(k)$ is
the $k$-th chip of the $i$-th user code. In
a MC-CDMA system the modulator computes the $L$-IDFT
(Inverse Discrete Fourier Transform) of (1) to
obtain the following multicarrier signal

$$V_n^i(m) = \mathrm{IDFT}[v_n^i(k)] = \frac{1}{L} \sum_{k=0}^{L-1} v_n^i(k)\, e^{j \frac{2\pi}{L} k m} \qquad (2)$$

This signal is transmitted through a dispersive channel
with an impulse response $h_i(m)$, $m = 0, \ldots, M-1$. At
the receiver the observed signal is a superposition of
the signals corresponding to the $N$ users plus additive
white Gaussian noise (AWGN). Therefore, the received
signal for the $n$-th symbol is the following

$$X_n(m) = \sum_{i=1}^{N} V_n^i(m) * h_i(m) + r_n(m) \qquad (3)$$

where $*$ denotes discrete convolution and $r_n(m)$ represents
a white noise sequence.

To recover the transmitted symbols, the receiver
applies an $L$-DFT (Discrete Fourier Transform) to the
received signal (3). Assuming perfect synchronization
and a sufficiently large guard time between symbols,
the resultant signal is

$$x_n(k) = \mathrm{DFT}[X_n(m)] = \sum_{i=1}^{N} v_n^i(k) H_i(k) + \tilde{r}_n(k) = \sum_{i=1}^{N} s_n^i\, c_i(k) H_i(k) + \tilde{r}_n(k), \qquad k = 0, \ldots, L-1 \qquad (4)$$

where $H_i(k)$ and $\tilde{r}_n(k)$ are the DFTs of $h_i(m)$ and
$r_n(m)$, respectively. Rewriting (4) in vector notation
we obtain

$$\mathbf{x}_n = [x_n(0), \ldots, x_n(L-1)]^T = \sum_{i=1}^{N} s_n^i \mathbf{C}_i \mathbf{H}_i + \mathbf{r}_n = \sum_{i=1}^{N} s_n^i \mathbf{C}_i \mathbf{F} \mathbf{h}_i + \mathbf{r}_n = \sum_{i=1}^{N} s_n^i \tilde{\mathbf{c}}_i + \mathbf{r}_n \qquad (5)$$

where $T$ denotes transposition, $\mathbf{C}_i$ is a diagonal matrix
whose elements are the $L$ chips of the code corresponding
to the $i$-th user, $\mathbf{H}_i = [H_i(0), \ldots, H_i(L-1)]^T$ and
$\mathbf{r}_n = [\tilde{r}_n(0), \ldots, \tilde{r}_n(L-1)]^T$. To obtain (5) we have
used the relationship $\mathbf{H}_i = \mathbf{F}\mathbf{h}_i$ where $\mathbf{F}$ is an $L \times M$
DFT matrix and $\mathbf{h}_i = [h_i(0), \ldots, h_i(M-1)]^T$. Note
that (5) is a CDMA signal where the code associated
with the $i$-th user is $\tilde{\mathbf{c}}_i = \mathbf{C}_i \mathbf{F} \mathbf{h}_i$.

Figure 1: Block diagram of the discrete-time baseband
model of a MC-CDMA system.


3. SUBSPACE DECOMPOSITION

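Assuming statistical independence between users and
noise, the autocorrelation matrix of the observations
vector (5) can be decomposed as

$$\mathbf{R} = E[\mathbf{x}_n \mathbf{x}_n^H] = \sum_{i=1}^{N} \tilde{\mathbf{c}}_i\, E[s_n^i (s_n^i)^*]\, \tilde{\mathbf{c}}_i^H + E[\mathbf{r}_n \mathbf{r}_n^H] = \sum_{i=1}^{N} \sigma_i^2\, \tilde{\mathbf{c}}_i \tilde{\mathbf{c}}_i^H + \sigma_r^2 \mathbf{I} \qquad (6)$$

where $E[\cdot]$ is the expectation operator, $*$ represents conjugate,
$H$ denotes conjugate transpose, $\mathbf{I}$ is the identity
matrix and $\sigma_i^2$ and $\sigma_r^2$ are the $i$-th user signal and noise
power, respectively.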
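
As a concrete illustration of the quantities entering (6), the following Python sketch builds the effective code $\mathbf{C}_i \mathbf{F} \mathbf{h}_i$ of one user and the resulting autocorrelation matrix. The function names are hypothetical, and the unnormalised DFT matrix with entries $e^{-j 2\pi k m / L}$ is an assumption about the scaling of $\mathbf{F}$.

```python
import cmath


def dft_matrix(L, M):
    """L x M matrix F with F[k][m] = exp(-j 2 pi k m / L) (scaling assumed)."""
    return [[cmath.exp(-2j * cmath.pi * k * m / L) for m in range(M)]
            for k in range(L)]


def effective_code(chips, h):
    """Return the effective code C F h of one user, with C = diag(chips)."""
    L, M = len(chips), len(h)
    F = dft_matrix(L, M)
    # H = F h is the channel frequency response on the L subcarriers.
    H = [sum(F[k][m] * h[m] for m in range(M)) for k in range(L)]
    return [chips[k] * H[k] for k in range(L)]


def autocorrelation(codes, powers, noise_power):
    """R = sum_i sigma_i^2 c_i c_i^H + sigma_r^2 I, following (6)."""
    L = len(codes[0])
    R = [[0j] * L for _ in range(L)]
    for c, p in zip(codes, powers):
        for a in range(L):
            for b in range(L):
                R[a][b] += p * c[a] * c[b].conjugate()
    for a in range(L):
        R[a][a] += noise_power
    return R
```

Each effective code is simply the chip sequence weighted subcarrier by subcarrier with the channel frequency response, which is why the user codes arrive “perturbed” at the receiver.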

Let us consider the eigendecomposition of (6). There
are $L$ eigenvalues that we sort as $\lambda_0 \ge \lambda_1 \ge \cdots \ge
\lambda_{L-1}$. It is well known that the eigenvectors associated
with the $N$ most significant eigenvalues ($\mathbf{u}_l$, $l =
0, \ldots, N-1$) span the signal subspace where the perturbed
user codes, $\tilde{\mathbf{c}}_i$, lie. The remaining $L - N$ eigenvectors
($\mathbf{u}_l$, $l = N, \ldots, L-1$) span the noise (orthogonal)
subspace and their associated eigenvalues are equal
to the noise power, i.e., $\lambda_N = \cdots = \lambda_{L-1} = \sigma_r^2$ [3].

As we have seen, the perturbed user codes lie in the
signal subspace and are orthogonal to the noise subspace.
This property can be used to state the following
system of equations for the $i$-th user

$$\tilde{\mathbf{c}}_i^H \mathbf{u}_l = 0, \qquad l = N, \ldots, L-1 \qquad (7)$$

Note that this system of equations has $M$ unknowns
and $L - N$ equations. It will be solvable if and only
if the number of equations is greater than or equal to the
number of unknowns, $M \le L - N$. This means that
the number of simultaneous users, $N$, is limited by the
number of carriers, $L$, and the channel length, $M$.
Nevertheless, it is interesting to note that the system
capacity can be increased without increasing the number
of carriers using codes with a length larger than the
spreading gain [4].

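In order to solve the system of equations (7), we can
consider the following equivalent system

$$\|\tilde{\mathbf{c}}_i^H \mathbf{u}_l\|^2 = \tilde{\mathbf{c}}_i^H \mathbf{u}_l \mathbf{u}_l^H \tilde{\mathbf{c}}_i = \mathbf{h}_i^H \mathbf{F}^H \mathbf{C}_i^H \mathbf{u}_l \mathbf{u}_l^H \mathbf{C}_i \mathbf{F} \mathbf{h}_i = 0 \qquad (8)$$

for $l = N, \ldots, L-1$. The solution to these equations
can be found by solving the following minimization
problem

$$\hat{\mathbf{h}}_i = \arg\min_{\|\mathbf{h}_i\|^2 = 1} \sum_{l=N}^{L-1} \mathbf{h}_i^H \mathbf{F}^H \mathbf{C}_i^H \mathbf{u}_l \mathbf{u}_l^H \mathbf{C}_i \mathbf{F} \mathbf{h}_i$$
$$\quad = \arg\min_{\|\mathbf{h}_i\|^2 = 1} \mathbf{h}_i^H \left[ \sum_{l=N}^{L-1} \mathbf{F}^H \mathbf{C}_i^H \mathbf{u}_l \mathbf{u}_l^H \mathbf{C}_i \mathbf{F} \right] \mathbf{h}_i$$
$$\quad = \arg\min_{\|\mathbf{h}_i\|^2 = 1} \mathbf{h}_i^H \left[ \mathbf{F}^H \mathbf{C}_i^H \mathbf{U} \mathbf{U}^H \mathbf{C}_i \mathbf{F} \right] \mathbf{h}_i$$
$$\quad = \arg\min_{\|\mathbf{h}_i\|^2 = 1} \mathbf{h}_i^H \mathbf{Q}_i \mathbf{h}_i \qquad (9)$$

where the solution $\hat{\mathbf{h}}_i$ is an estimate of the channel
impulse response vector, $\mathbf{U}$ is an $L \times (L - N)$ matrix
whose columns are the eigenvectors associated with
the noise subspace (i.e., $\mathbf{u}_l$, $l = N, \ldots, L-1$) and
$\mathbf{Q}_i = \mathbf{F}^H \mathbf{C}_i^H \mathbf{U} \mathbf{U}^H \mathbf{C}_i \mathbf{F}$. The solution can be obtained
by the least squares method and it corresponds to the
eigenvector of $\mathbf{Q}_i$ associated with its minimum eigenvalue
[5].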
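
The link between the constrained minimization (9) and the minimum eigenvector can be made explicit with a Lagrange multiplier. The short derivation below is a standard argument added for illustration, not a step reproduced from [5]; the multiplier $\lambda$ is notation introduced here.

```latex
\begin{aligned}
\mathcal{L}(\mathbf{h}_i, \lambda) &= \mathbf{h}_i^H \mathbf{Q}_i \mathbf{h}_i
  - \lambda \left( \mathbf{h}_i^H \mathbf{h}_i - 1 \right), \\
\frac{\partial \mathcal{L}}{\partial \mathbf{h}_i^*} = \mathbf{0}
  \;&\Rightarrow\; \mathbf{Q}_i \mathbf{h}_i = \lambda\, \mathbf{h}_i, \\
\mathbf{h}_i^H \mathbf{Q}_i \mathbf{h}_i &= \lambda\, \mathbf{h}_i^H \mathbf{h}_i = \lambda .
\end{aligned}
```

Every stationary point is therefore a unit-norm eigenvector of $\mathbf{Q}_i$ and the cost equals its eigenvalue, so the minimum is attained at the eigenvector with the smallest eigenvalue.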

In practice, we do not know a priori the autocorrelation
matrix (6). However, it can be estimated from
the sample matrix as

$$\hat{\mathbf{R}} = \frac{1}{N_s} \sum_{n=1}^{N_s} \mathbf{x}_n \mathbf{x}_n^H \qquad (10)$$

where $N_s$ is the number of received symbols used to
obtain the estimate. Note that $\hat{\mathbf{R}} \to \mathbf{R}$ as $N_s$ tends
to infinity and also its eigenvalues $\hat{\lambda}_l \to \lambda_l$ and
eigenvectors $\hat{\mathbf{u}}_l \to \mathbf{u}_l$.

Finally, when using second order statistics, the channel
impulse response can be obtained only up to a complex
constant. This constant has to be compensated in order
to analyze the algorithm performance. To
this end, we normalize the estimate of the impulse
response vector as $\hat{\mathbf{h}}_{i,\mathrm{normalized}} = \hat{\mathbf{h}}_i\, h_i(0) / \hat{h}_i(0)$, where $h_i(0)$
and $\hat{h}_i(0)$ are the first elements of the true and estimated
channel impulse response vectors, respectively.


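4. MEAN SQUARE ERROR ANALYSIS

In this section, we derive an analytical expression of the
estimation MSE. For simplicity, let us denote
$\mathbf{h}_i = \mathbf{h}$, $\mathbf{Q}_i = \mathbf{Q}$ and $\mathbf{C}_i = \mathbf{C}$. Our analysis is based on
a perturbation technique [7] that allows us to express
the perturbation in $\mathbf{h}$, $\Delta\mathbf{h}$, in terms of the perturbation
in $\mathbf{Q}$, $\Delta\mathbf{Q}$. Let us consider the following identities

$$\mathbf{Q}\mathbf{h} = \mathbf{0}, \qquad \hat{\mathbf{h}} = \mathbf{h} + \Delta\mathbf{h}, \qquad \hat{\mathbf{Q}} = \mathbf{Q} + \Delta\mathbf{Q} \qquad (11)$$

For a sufficiently large number of samples ($N_s \to \infty$),
$\hat{\mathbf{Q}} \to \mathbf{Q}$, $\hat{\mathbf{h}} \to \mathbf{h}$ and $\hat{\mathbf{Q}}\hat{\mathbf{h}}$ is approximately equal to the
zero vector, i.e.

$$\hat{\mathbf{Q}}\hat{\mathbf{h}} = (\mathbf{Q} + \Delta\mathbf{Q})(\mathbf{h} + \Delta\mathbf{h}) \approx \Delta\mathbf{Q}\,\mathbf{h} + \mathbf{Q}\,\Delta\mathbf{h} \approx \mathbf{0}$$

where we have neglected the second order term, $\Delta\mathbf{Q}\,\Delta\mathbf{h} \approx \mathbf{0}$.
Therefore,

$$\mathbf{Q}\,\Delta\mathbf{h} \approx -\Delta\mathbf{Q}\,\mathbf{h} \qquad (12)$$

and

$$\Delta\mathbf{h} \approx -\mathbf{Q}^{\dagger} \Delta\mathbf{Q}\,\mathbf{h} = -\mathbf{Q}^{\dagger} (\hat{\mathbf{Q}} - \mathbf{Q})\,\mathbf{h} = -\mathbf{Q}^{\dagger} \hat{\mathbf{Q}}\,\mathbf{h} \qquad (13)$$

where $\mathbf{Q}^{\dagger}$ denotes the left pseudo-inverse of $\mathbf{Q}$. The
$k$-th component of $\Delta\mathbf{h}$ is given by

$$\Delta h(k) \approx -\mathbf{q}_k^H \hat{\mathbf{Q}}\,\mathbf{h} = -\mathbf{q}_k^H (\mathbf{F}^H \mathbf{C}^H \hat{\mathbf{U}} \hat{\mathbf{U}}^H \mathbf{C} \mathbf{F})\,\mathbf{h}$$
$$\quad = -\sum_{l=N}^{L-1} \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H \hat{\mathbf{u}}_l \hat{\mathbf{u}}_l^H \mathbf{C} \mathbf{F} \mathbf{h} = -\sum_{l=N}^{L-1} \hat{\mathbf{u}}_l^H \mathbf{C} \mathbf{F} \mathbf{h}\, \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H \hat{\mathbf{u}}_l$$
$$\quad = -\mathrm{Trace}\{\hat{\mathbf{U}} \hat{\mathbf{U}}^H \mathbf{C} \mathbf{F} \mathbf{h}\, \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H\} \qquad (14)$$

where $\mathbf{q}_k$ is the $k$-th column of $(\mathbf{Q}^{\dagger})^H$. Based on the
results of [6] (page 1840, equation (4.11)), we obtain
the following identity

$$\hat{\mathbf{U}} \hat{\mathbf{U}}^H \mathbf{C} \mathbf{F} \mathbf{h} \approx -\mathbf{U} \mathbf{U}^H \Delta\mathbf{V}\, \mathbf{V}^H \mathbf{C} \mathbf{F} \mathbf{h} \qquad (15)$$

where $\mathbf{V}$ is an $L \times N$ matrix whose columns are the eigenvectors
associated with the signal subspace (i.e., $\mathbf{u}_l$, $l =
0, \ldots, N-1$) and $\Delta\mathbf{V} = \hat{\mathbf{V}} - \mathbf{V}$. Moreover, from
Appendix A of [6] (page 1844, equation (A.2))

$$\mathbf{U}^H \Delta\mathbf{V} \approx \mathbf{U}^H \hat{\mathbf{R}}\, \mathbf{V} \mathbf{\Lambda}^{-1} \qquad (16)$$

where $\mathbf{\Lambda} = \mathrm{diag}(\lambda_0 - \sigma_r^2, \ldots, \lambda_{N-1} - \sigma_r^2)$ and $\mathrm{diag}(\mathbf{a})$
is a diagonal matrix whose elements are the elements of
vector $\mathbf{a}$. To remove the effect of the unknown constant
that we have in the estimation of the channel vector,
we have to consider a normalization of the channel vector
estimate. Similarly to [7], we select the following
normalization

$$\Delta\mathbf{h}_{\mathrm{normalized}} = \left( \mathbf{I} - \frac{\mathbf{h}\, \mathbf{1}^T}{h(0)} \right) \Delta\mathbf{h} \qquad (17)$$

where $\mathbf{I}$ is the identity matrix and $\mathbf{1}^T = [1, 0, 0, \ldots]$.
This normalization can be included in (13) and now $\mathbf{q}_k$
will be the $k$-th column of the matrix $\left( (\mathbf{I} - \mathbf{h}\mathbf{1}^T / h(0))\, \mathbf{Q}^{\dagger} \right)^H$.

Combining (15) and (16) in (14), we obtain the following
expression

$$\Delta h(k) \approx \mathrm{Trace}\{\mathbf{U} \mathbf{U}^H \hat{\mathbf{R}}\, \mathbf{V} \mathbf{\Lambda}^{-1} \mathbf{V}^H \mathbf{C} \mathbf{F} \mathbf{h}\, \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H\}$$
$$\quad = \sum_{l=N}^{L-1} \left( \mathbf{u}_l^H \hat{\mathbf{R}}\, \mathbf{V} \mathbf{\Lambda}^{-1} \mathbf{V}^H \mathbf{C} \mathbf{F} \mathbf{h}\, \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H \mathbf{u}_l \right) = \sum_{l=N}^{L-1} \mathbf{u}_l^H \hat{\mathbf{R}}\, \mathbf{g}_{lk} \qquad (18)$$

where $\mathbf{g}_{lk} = \mathbf{V} \mathbf{\Lambda}^{-1} \mathbf{V}^H \mathbf{C} \mathbf{F} \mathbf{h}\, \mathbf{q}_k^H \mathbf{F}^H \mathbf{C}^H \mathbf{u}_l$.

Finally, to obtain the MSE of the channel estimation
algorithm, we have to explore the fourth order
statistics of binary and Gaussian random variables. In
Appendix A it is demonstrated that

$$E[\|\Delta\mathbf{h}\|^2] = \frac{\sigma_r^2}{N_s} \sum_{k=0}^{M-1} \left( \mathrm{Trace}\{\mathbf{U}^H \mathbf{U}\, \mathbf{G}_k^H \tilde{\mathbf{C}} \tilde{\mathbf{C}}^H \mathbf{G}_k\} + \sigma_r^2\, \mathrm{Trace}\{\mathbf{U}^H \mathbf{U}\, \mathbf{G}_k^H \mathbf{G}_k\} \right) \qquad (19)$$

where $\mathbf{G}_k = [\mathbf{g}_{Nk}, \ldots, \mathbf{g}_{(L-1)k}]$ and $\tilde{\mathbf{C}} = [\sigma_1 \tilde{\mathbf{c}}_1, \ldots, \sigma_N \tilde{\mathbf{c}}_N]$.


5. SIMULATIONS

In this section we compare the analytical expression
(19) with the MSE obtained from computer simulations
of the algorithm (9) to illustrate the validity of the
approximation carried out in the previous section.

Figure 2 examines the accuracy of the MSE analysis.
It shows the time evolution of the theoretical and
simulated MSE (averaged over 50 realizations). An
environment with $L = 12$ carriers, a channel length
$M = 4$ and 8 users received with an SNR of 12 dB was
considered. It can be seen that, even for a small number
of symbols, the theoretical expression fits the
simulated MSE.

Figure 3 shows the simulated and theoretical MSE
versus the Signal to Noise Ratio (SNR) of the received
users. The environment is the same as before and the
curves are obtained after $N_s = 200$ symbols. We can
see that both curves are very similar even for small
values of SNR.


Figure 3: Simulated and theoretical MSE vs. received
users SNR.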


6.  CONCLUSIONS 

A new blind channel identification method for Multi-Carrier
CDMA systems has been presented. The method
exploits the orthogonality between the signal and noise
subspaces of the incoming signal. The performance of the
method has also been investigated: using a perturbation
technique, we derived an approximate analytical
expression of the estimation MSE. Computer simulations
have confirmed the high accuracy of this analytical
approximation.


A.  APPENDIX 


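Taking into account that $\tilde{\mathbf{c}}_i^H \mathbf{u}_l = \mathbf{u}_l^H \tilde{\mathbf{c}}_i = 0$, it is
straightforward to obtain from (18) that

$$\Delta h(k) = \frac{1}{N_s} \sum_{l=N}^{L-1} \sum_{n=0}^{N_s-1} \mathbf{u}_l^H \mathbf{r}_n \left( \sum_{i=1}^{N} (s_n^i)^* \tilde{\mathbf{c}}_i^H + \mathbf{r}_n^H \right) \mathbf{g}_{lk} \qquad (20)$$

where $*$ represents conjugate. Therefore, the MSE is

$$E[\|\Delta\mathbf{h}\|^2] = \sum_{k=0}^{M-1} E[\Delta h(k)\, \Delta h^*(k)]$$
$$\quad = \sum_{k=0}^{M-1} \Bigg( \frac{1}{N_s^2} \sum_{l=N}^{L-1} \sum_{p=N}^{L-1} \sum_{n=0}^{N_s-1} \sum_{m=0}^{N_s-1} \sum_{i=1}^{N} \sum_{j=1}^{N} E[\mathbf{u}_l^H \mathbf{r}_n \mathbf{r}_m^H \mathbf{u}_p\, \mathbf{g}_{pk}^H \tilde{\mathbf{c}}_j s_m^j (s_n^i)^* \tilde{\mathbf{c}}_i^H \mathbf{g}_{lk}]$$
$$\qquad + \frac{1}{N_s^2} \sum_{l=N}^{L-1} \sum_{p=N}^{L-1} \sum_{n=0}^{N_s-1} \sum_{m=0}^{N_s-1} E[\mathbf{u}_l^H \mathbf{r}_n \mathbf{r}_n^H \mathbf{g}_{lk} \mathbf{g}_{pk}^H \mathbf{r}_m \mathbf{r}_m^H \mathbf{u}_p] \Bigg) \qquad (21)$$

where we have used the fact that the third order moments
of a Gaussian random variable are zero.

Considering statistical independence between users
and noise, and i.i.d. user symbols, the first expectation
in (21) is

$$E[\mathbf{u}_l^H \mathbf{r}_n \mathbf{r}_m^H \mathbf{u}_p\, \mathbf{g}_{pk}^H \tilde{\mathbf{c}}_j s_m^j (s_n^i)^* \tilde{\mathbf{c}}_i^H \mathbf{g}_{lk}] = \sigma_i^2 \sigma_r^2\, \mathbf{u}_l^H \mathbf{u}_p\, \mathbf{g}_{pk}^H \tilde{\mathbf{c}}_i \tilde{\mathbf{c}}_i^H \mathbf{g}_{lk}\, \delta(n - m)\, \delta(i - j) \qquad (22)$$

where $\delta(\cdot)$ is the Kronecker function.

The second expectation in (21) can be expressed as

$$E[\mathbf{u}_l^H \mathbf{r}_n \mathbf{r}_n^H \mathbf{g}_{lk} \mathbf{g}_{pk}^H \mathbf{r}_m \mathbf{r}_m^H \mathbf{u}_p] = \mathbf{u}_l^H E[\mathbf{r}_n \mathbf{r}_n^H]\, \mathbf{g}_{lk} \mathbf{g}_{pk}^H E[\mathbf{r}_m \mathbf{r}_m^H]\, \mathbf{u}_p + \mathbf{u}_l^H E[\mathbf{r}_n \mathbf{r}_m^H]\, \mathbf{u}_p\, \mathbf{g}_{pk}^H E[\mathbf{r}_m \mathbf{r}_n^H]\, \mathbf{g}_{lk}$$
$$\quad = \sigma_r^4\, \mathbf{u}_l^H \mathbf{u}_p\, \mathbf{g}_{pk}^H \mathbf{g}_{lk}\, \delta(n - m) \qquad (23)$$

where we have used the facts $\mathbf{u}_l^H \mathbf{g}_{lk} = 0$ and
$E[\theta_1 \theta_2^* \theta_3 \theta_4^*] = E[\theta_1 \theta_2^*] E[\theta_3 \theta_4^*] + E[\theta_1 \theta_4^*] E[\theta_3 \theta_2^*]$
when $\theta_i$, $i = 1, 2, 3, 4$, are four independent Gaussian variables [7].

Substituting (22) and (23) into (21) yields (19).

Figure 2: Time evolution of the simulated and theoretical
MSE.

ABSTRACT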

In this paper, a new blind adaptive multiuser detector,
termed the prediction least mean kurtosis (PLMK)
algorithm, is proposed for joint MAI and narrowband
interference (NBI) suppression in asynchronous CDMA
systems. This algorithm is based on higher-order
statistics rather than the second-order statistics used in the
LMS algorithm. Unlike the regular least mean kurtosis
(LMK) algorithm, it takes into consideration samples earlier than
those corresponding to the current bit. For comparison purposes,
we also apply the regular LMK algorithm to the case of
asynchronous CDMA systems. Simulation results show
that the blind adaptive multiuser detector with the PLMK
algorithm provides significantly better performance than
the one with the regular LMK algorithm.

1.  INTRODUCTION 

Blind adaptive multiuser detection has received significant
attention because it can be implemented without
training sequences in CDMA systems. During the past
several years, many researchers in this area have focused
their efforts on the least mean square (LMS) algorithm due
to its low complexity. To achieve better performance in
suppressing multiple-access interference (MAI) in
synchronous CDMA systems, Tang et al. [3](1) applied
instead the least mean kurtosis (LMK) algorithm. The
LMK algorithm is based on higher-order statistics
rather than the second-order statistics used in the LMS
algorithm.

In this paper, a new blind adaptive multiuser detector,
termed the prediction least mean kurtosis (PLMK) algorithm,
is proposed for joint MAI and narrowband interference
(NBI) suppression in asynchronous CDMA systems.
Unlike the regular LMK, it takes into consideration
samples earlier than those corresponding to the current bit. For
comparison purposes, we also apply the regular LMK
algorithm of [3] to the case of asynchronous CDMA
systems. Simulation results show that the blind adaptive
multiuser detector with the PLMK algorithm provides
significantly better performance than the one with the LMK
algorithm.

This research was partially supported by New Jersey
Center for Wireless Telecommunications.

(1) Note that in [3] only the synchronous case was considered.

2.  SYSTEM  MODEL 

We consider the low-pass equivalent model of an
asynchronous CDMA system. The received signal due to
the $k$th user is given by

$$r_k(t) = \sqrt{P_k} \sum_{i} b_k(i)\, s_k(t - iT - \tau_k) \qquad (1)$$

where $T$ is the bit interval and $b_k(i) \in \{-1, 1\}$ is the information
data of the $k$th user. $P_k$ and $\tau_k$ denote the power and
relative delay of the $k$th user, respectively. The spreading
waveform $s_k(t)$ is given by

$$s_k(t) = \sum_{n=1}^{N} a_k(n)\, \psi(t - nT_c) \qquad (2)$$

where $a_k(n) \in \{-1, 1\}$ is the $n$th element of the spreading
sequence for the $k$th user, $N$ is the processing gain and
$T_c = T/N$ is the chip duration. $\psi(t)$ is a normalized
rectangular pulse of width $T_c$, i.e., $\int_0^{T_c} \psi^2(t)\, dt = 1$.

The total received signal can be written as

$$r(t) = \sum_{k=1}^{K} r_k(t) + i(t) + n(t) \qquad (3)$$

where $K$ is the number of users, $i(t)$ is the NBI and $n(t)$
is the white Gaussian noise.

The received signal $r(t)$ is assumed to pass through a
chip-matched filter sampled at chip rate and synchronized
to chip time. The $l$th received signal sample at the output
of the chip-matched filter is

$$r(l) = \int_{lT_c}^{(l+1)T_c} r(t)\, \psi(t - lT_c)\, dt \qquad (4)$$

from which the $l$th NBI sample and the $l$th white
Gaussian noise sample at the output of the chip-matched
filter are $i(l) = \int_{lT_c}^{(l+1)T_c} i(t)\, \psi(t - lT_c)\, dt$ and
$n(l) = \int_{lT_c}^{(l+1)T_c} n(t)\, \psi(t - lT_c)\, dt$, respectively.

In this paper, we assume that the NBI is modeled as a
$p$th-order AR process, i.e.,

$$i(l) = \sum_{j=1}^{p} \alpha_j\, i(l - j) + e(l) \qquad (5)$$

where $e(l)$ is a white Gaussian process with variance $\varepsilon^2$.
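
The AR recursion (5) is easy to simulate. The sketch below is illustrative only: the function name `ar_interference`, the zero initial conditions and the use of Python's `random` interface for the driving noise are assumptions, not part of the original model description.

```python
import random


def ar_interference(coeffs, n_samples, noise_std, rng=random):
    """Generate n_samples of a p-th order AR process as in (5).

    coeffs    : AR coefficients [alpha_1, ..., alpha_p].
    noise_std : standard deviation of the white Gaussian input e(l).
    rng       : object providing gauss(mu, sigma).
    """
    p = len(coeffs)
    history = [0.0] * p  # i(l-1), ..., i(l-p); zero start is an assumption
    samples = []
    for _ in range(n_samples):
        e = rng.gauss(0.0, noise_std)
        value = sum(c * x for c, x in zip(coeffs, history)) + e
        samples.append(value)
        history = ([value] + history)[:p]  # shift in the newest sample
    return samples
```

With a single coefficient close to one, successive samples are strongly correlated, which is the property the prediction-based detector of the next section relies on.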

3. BLIND PREDICTION LMK ALGORITHM

Without loss of generality, we assume that the power and
the delay of the desired signal are, respectively, $P_1 = 1$
and $\tau_1 = 0$, and, for convenience, we define $\tau_k = d_k T_c$ where
$d_k$ is an integer between $0$ and $N - 1$. In [3], the LMK
algorithm is based on the received signal samples vector
$\mathbf{r}^T = [r(0), r(1), \ldots, r(N-1)]$. It is well known that the
current value of the NBI is predictable from its past values.
Therefore, we expect better performance by extending the
received signal samples vector into the interval
$[-MT_c, T]$ ($M > 0$), i.e.,

$$\mathbf{r}^T = [r(-M), r(-M+1), \ldots, r(-1), r(0), r(1), \ldots, r(N-1)],$$

which is termed the PLMK algorithm. We consider the case
of $M < N$ in this paper. For a given relative delay vector
$\mathbf{d} = [d_1, \ldots, d_K]^T$, we can obtain from (1)~(4)

$$\mathbf{r} = \sqrt{P_1}\,(b_1 \mathbf{a}_1 + b_1' \mathbf{a}_1') + \sum_{k=2}^{K} \sqrt{P_k}\,(b_k \mathbf{a}_k + b_k' \mathbf{a}_k' + b_k'' \mathbf{a}_k'') + \mathbf{i} + \mathbf{n} \qquad (6)$$

where for $-M \le l \le N - 1$ and $2 \le k \le K$

$$\mathbf{a}_1(l) = [a_1(l)]\, \chi_{(l \ge 0)} \qquad (7)$$

$$\mathbf{a}_1'(l) = [a_1(l + N)]\, \chi_{(l < 0)} \qquad (8)$$

$$\mathbf{a}_k(l) = [a_k(l - d_k)]\, \chi_{(d_k \le l < N)} \qquad (9)$$

$$\mathbf{a}_k'(l) = [a_k(l + N - d_k)]\, \chi_{(-N + d_k \le l < d_k)} \qquad (10)$$

$$\mathbf{a}_k''(l) = [a_k(l + 2N - d_k)]\, \chi_{(l < -N + d_k)} \qquad (11)$$

where $\chi_A$ is the indicator function of the set $A$, $b_k$ is the
current bit of the $k$th user, and $b_k'$ and $b_k''$ are the bits one
and two bit intervals earlier than the current bit of the $k$th user,
respectively.
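
The indicator-function definitions (7)-(11) amount to slicing each user's spreading sequence according to its delay. The following sketch makes this concrete; the function name is hypothetical and the chips are stored in a 0-based Python list, which is an indexing assumption of the example rather than a convention taken from the paper.

```python
def extended_code_vectors(code, d, M):
    """Return (a, a1, a2) over the window l = -M, ..., N-1.

    a  : contribution of the current bit        (eqs. (7), (9))
    a1 : contribution of the previous bit       (eqs. (8), (10))
    a2 : contribution of the bit before that    (eq. (11))

    code : list of +/-1 chips, indexed 0..N-1 (assumed 0-based)
    d    : integer chip delay, 0 <= d <= N-1
    M    : number of extra past samples, 0 <= M < N
    """
    N = len(code)
    a, a1, a2 = [], [], []
    for l in range(-M, N):
        a.append(code[l - d] if d <= l < N else 0)
        a1.append(code[l + N - d] if -N + d <= l < d else 0)
        a2.append(code[l + 2 * N - d] if l < -N + d else 0)
    return a, a1, a2
```

For the desired user ($d = 0$) the third vector is identically zero, which matches the two-term bracket for user 1 in (6); for a delayed user, every sample of the window is covered by exactly one of the three vectors.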

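From (6), we notice that there are $3(K - 1) + 2 = 3K - 1$
vectors $\{\sqrt{P_1}\,\mathbf{a}_1, \sqrt{P_1}\,\mathbf{a}_1'\}$ and $\{\sqrt{P_k}\,\mathbf{a}_k, \sqrt{P_k}\,\mathbf{a}_k', \sqrt{P_k}\,\mathbf{a}_k''\}$,
$k = 2, \ldots, K$. Depending on the relative delays of the
multiuser interferers, we have among these $L$
($2K \le L \le 3K - 1$) non-zero vectors. For the $L$ non-zero
vectors, we write Eqn. (6) in the form

$$\mathbf{r} = \sum_{l=1}^{L} b_l \mathbf{p}_l + \mathbf{i} + \mathbf{n} \qquad (12)$$

where the non-zero vector $\mathbf{p}_1$ is the desired signal vector
$\sqrt{P_1}\,\mathbf{a}_1$, and $b_1$ is the desired bit. The set of non-zero
vectors $\{\mathbf{p}_2, \ldots, \mathbf{p}_L\}$ consists of the intersymbol
interference (ISI) $\{\sqrt{P_1}\,\mathbf{a}_1'\}$ and the non-zero MAI vectors
of the set $\{\sqrt{P_k}\,\mathbf{a}_k, \sqrt{P_k}\,\mathbf{a}_k', \sqrt{P_k}\,\mathbf{a}_k''\}$, $k = 2, \ldots, K$.
$\{b_2, \ldots, b_L\}$ are the data coefficients corresponding to the
vectors $\{\mathbf{p}_2, \ldots, \mathbf{p}_L\}$, respectively. For example, $b_l = b_k'$
if $\mathbf{p}_l = \sqrt{P_k}\,\mathbf{a}_k'$, $2 \le l \le L$, $1 \le k \le K$.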

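We use the following cost function of [3] to suppress
interference without requiring a training sequence:

$$J_B(\mathbf{h}) = 3\left[E(\mathbf{r}^T \mathbf{h})^2\right]^2 - E(\mathbf{r}^T \mathbf{h})^4 \qquad (13)$$

Taking the gradient with respect to the vector $\mathbf{h}$, we have

$$\nabla J_B(\mathbf{h}) = 12\, E(\mathbf{r}^T \mathbf{h})^2\, E(\mathbf{r}^T \mathbf{h})\mathbf{r} - 4\, E(\mathbf{r}^T \mathbf{h})^3 \mathbf{r} \qquad (14)$$

The mean value $E(\mathbf{r}^T \mathbf{h})^2$ is estimated by the
recursive equation

$$G(n) = \beta\, G(n - 1) + (1 - \beta)\left[\mathbf{r}(n)^T \mathbf{h}(n)\right]^2 \qquad (15)$$

where $0 < \beta < 1$ is the forgetting factor.

Using this estimate, and replacing the remaining ensemble
averages by their instantaneous values (e.g. $E(\mathbf{r}^T \mathbf{h})$ by
$\mathbf{r}(n)^T \mathbf{h}(n)$), we get the following equation

$$\nabla J_B[\mathbf{h}(n)] = 4\left\{3\,G(n) - \left[\mathbf{r}(n)^T \mathbf{h}(n)\right]^2\right\} \mathbf{r}(n)^T \mathbf{h}(n)\, \mathbf{r}(n) \qquad (16)$$

Then the steepest descent adaptive weight-update algorithm,
the PLMK algorithm, can be characterized by

$$\mathbf{h}(n + 1) = \mathbf{h}(n) - \mu\left\{\nabla J_B[\mathbf{h}(n)]\right\} \qquad (17)$$

with $\nabla J_B[\mathbf{h}(n)]$ from (16) and $G(n)$ from (15). Since no
training sequence is needed, the PLMK
algorithm is blind.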
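
One iteration of the update defined by (15)-(17) can be written compactly as follows. This is a minimal sketch with plain Python lists; the function name `plmk_step` and its argument order are hypothetical, and no constraint or normalisation on the weight vector is applied because none is stated in the text above.

```python
def plmk_step(h, r, G_prev, beta, mu):
    """Perform one weight update following (15)-(17).

    h      : current weight vector h(n)
    r      : received sample vector r(n), same length as h
    G_prev : previous running estimate G(n-1) of E[(r^T h)^2]
    beta   : forgetting factor, 0 < beta < 1
    mu     : step size
    Returns (h(n+1), G(n)).
    """
    y = sum(ri * hi for ri, hi in zip(r, h))   # filter output r(n)^T h(n)
    G = beta * G_prev + (1.0 - beta) * y * y   # recursive estimate (15)
    scale = 4.0 * (3.0 * G - y * y) * y        # scalar factor of (16)
    h_next = [hi - mu * scale * ri for hi, ri in zip(h, r)]  # update (17)
    return h_next, G
```

Only the received vector and the current weights enter the update, which is what makes the detector blind; the per-iteration cost is linear in the vector length $N + M$.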

4.  SIMULATION  RESULTS 

Simulation results obtained to evaluate the
performance of the PLMK algorithm are depicted in Fig. 1.
For comparison, we add the results for the regular
LMK algorithm [3], but for the asynchronous case, which can
be obtained from PLMK with $M = 0$. In this simulation,
we use a three-user CDMA system employing Gold codes
of length 7. For calculating the averaged SIR at the $n$th
iteration, we use the expression given by [2]:

$$\mathrm{SIR}(n) = \frac{\sum_{j=1}^{J} \left[\mathbf{h}_j(n)^T \mathbf{p}_1\right]^2}{\sum_{j=1}^{J} \left\{\mathbf{h}_j(n)^T \left[\mathbf{r}_j(n) - b_{1,j}(n)\, \mathbf{p}_1\right]\right\}^2}$$

where $J$ is the number of times the simulations are
repeated and the subscript $j$ indexes the simulation run.
Each of the other CDMA users has power $P$
larger than the desired CDMA user power $P_1 = 1$. The
delay vector is set to $\mathbf{d} = [0, 1, 3, 6]^T$. The NBI is modeled
as a first-order AR process with $\alpha_1 = 0.99$ and a power
3 dB higher than the desired signal. The white noise
power is set to 0.1. We use $M = 3$, $P = 10$, $\beta = 0.4$,
$\mu = 6 \times 10^{-4}$ and $J = 500$. From Fig. 1, we can easily
see that the PLMK algorithm provides significantly better
performance than the regular LMK algorithm with almost
the same convergence rate.
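
The averaged SIR expression can be evaluated from stored simulation runs as sketched below. The function name and the data layout (one weight vector, one received vector and one desired bit per run, all taken at the same iteration $n$) are assumptions made for the example.

```python
def averaged_sir(h_runs, r_runs, b1_runs, p1):
    """Averaged output SIR at one iteration, over J simulation runs.

    h_runs  : list of weight vectors h_j(n), one per run
    r_runs  : list of received vectors r_j(n), one per run
    b1_runs : list of desired bits b_{1,j}(n), one per run
    p1      : desired signal vector p_1
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    signal = sum(dot(h, p1) ** 2 for h in h_runs)
    interference = 0.0
    for h, r, b in zip(h_runs, r_runs, b1_runs):
        # Remove the desired user's contribution before filtering.
        residual = [ri - b * pi for ri, pi in zip(r, p1)]
        interference += dot(h, residual) ** 2
    return signal / interference
```

The value is a power ratio; converting it to decibels with `10 * math.log10(...)` is a natural way to plot it against the iteration index.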


5.  CONCLUSIONS 

In this paper, we proposed a new blind adaptive
multiuser detector based on the prediction least mean
kurtosis (PLMK) algorithm for joint suppression of MAI
and NBI in asynchronous CDMA systems. For
comparison, we also applied the regular LMK
algorithm of [3] to the case of asynchronous CDMA
systems. Results show that the blind adaptive
multiuser detector with the PLMK algorithm provides
significantly better performance than the one with the
regular LMK algorithm.

Fig. 1 Averaged output SIR versus number of iterations
($N = 7$, $M = 3$, $K = 3$)

ABSTRACT

We investigate a “symbol-level” MMSE equalizer for the
CDMA downlink over a frequency-selective multipath channel,
meant to improve on the recently proposed “chip-level”
downlink equalizers. Indeed the symbol-level equalizer performs
better than the chip-level, but is computationally more
demanding. The symbol-level equalizer is optimal for “saturated
cells” where all Walsh-Hadamard channel codes are
in use and have equal power. It performs very close to optimal
even for relatively lightly loaded cells. We derive a
bound on the off-diagonals of the covariance matrix of the
transmitted data that helps explain why the equalizer works
when there are fewer active channel codes than the spreading
factor. Performance is evaluated through simulations to
obtain the average bit error rate (BER) over a class of channels
for two cases: no out-of-cell interference, and one equal
power base-station. The symbol- and chip-level equalizers
are compared to the conventional RAKE receiver.


1.  INTRODUCTION 

Chip-level downlink equalization is a good candidate for improving
capacity (in terms of users and/or data rate) in
3G cellular systems such as cdma2000 [1]. These equalizers
significantly cancel multi-user access interference (MAI),
the main performance limitation for the standard RAKE receiver.
The good qualities of the recently proposed “chip-level
equalizers” for CDMA downlink are that they need
knowledge only of the desired user’s spreading code (and
long-code), they change only as often as the channel so don’t
need to be recomputed every symbol, and the same equalizer
applies to all users from a given base-station. However, these
equalizers do not yield the optimal estimate of the transmitted
symbol.

The optimal equalizer is conditioned on all of the channel
codes in use and their powers, and also the base-station
dependent long code. Since these aren’t really random quantities,
it should be possible to improve on the performance
by using them. One option approaching the optimal one,
but still having the nice feature of only needing to know the
channel code(s) of the desired user, is derived here. We refer
to this as the “symbol-level” equalizer. This equalizer
changes every symbol, unlike the chip-level equalizer. We
find that this equalizer leads to a performance improvement
over the chip-level equalizer when all channel codes are in
use and are equal power (in which case the derived equalizer
is equal to the optimal symbol estimate). We also make
some arguments, and show simulation results, that show this
equalizer is applicable when there are fewer active channel
codes per cell.

In this paper we derive the symbol-level MMSE estimator
for the two base-station case. One base-station transmits
the desired user’s data, while the other base-station is considered
interference. Spatial diversity and/or oversampling
with respect to the chip rate are handled as multiple chip-spaced
channels. Our simulations assume spatial diversity is
provided by two antennas at the receiver which experience
independent fading, and oversample at twice the chip rate.

Some relevant papers on linear chip-level downlink equalizers
that restore orthogonality of the Walsh-Hadamard channel
codes and hence suppress MAI are [2, 3, 4, 5, 6, 7, 8]. Of
these, [4, 7, 8] address antenna arrays, while the others consider
a single antenna, possibly with oversampling. In Reference
[8] we compare one and two antenna receivers. The
interference from other base-stations is addressed in Ghauri
and Slock [4], Frank and Visotsky [3], and by Krauss and
Zoltowski in [7].

In this paper the channel and noise power are assumed
known (i.e., channel estimation error is neglected). Using
the exact channel in simulation and analysis leads to an informative
upper bound on the performance of these methods,
but must be understood as such. For adaptive versions
of linear chip equalizers for CDMA downlink see [3] and [6]
and some of the references in [5]. References [3, 4] present performance
analysis in the form of SINR expressions for the multiple
base-station case, for the chip-level equalizer. In [7] Krauss
and Zoltowski show that the SINR expression along with a
Gaussian assumption is a good predictor of uncoded BER
for BPSK symbols for the chip-level equalizers.

2.  DATA  AND  CHANNEL  MODEL 

The impulse response for the $i$th antenna channel, between
the $k$th base-station transmitter and the mobile-station
receiver, is

$$h_i^{(k)}(t) = \sum_{l=0}^{N_a-1} \beta_i^{(k)}[l]\, p_{rc}(t - \tau_l), \qquad i = 1, 2, \quad k = 1, 2 \tag{1}$$

where $\beta_i^{(k)}[l]$ is the complex coefficient of the path
arriving at delay $\tau_l$. $p_{rc}(t)$ is the composite chip
waveform (including both the transmit and receive low-pass
filters), which we assume has a raised-cosine spectrum. $N_a$
is the total number of delayed paths or "multipath arrivals,"
some of which may have zero or negligible power without loss
of generality.

The channel we consider for this work consists of $N_a = 17$
equally spaced paths 0.625 µs apart ($\tau_0 = 0$,
$\tau_1 = 0.625$ µs, ...); this yields a delay spread of at
most 10 µs, which is an upper bound for most channels
encountered in urban cellular systems. We model the class of
channels with 4 equal-power random coefficients with arrival
times picked randomly from the set
$\{\tau_0, \tau_1, \ldots, \tau_{16}\}$; the rest of the
coefficients are zero. For base-station 1, once the 4 arrival
times have been picked at random and then sorted, the first
and last arrival times are forced to be at 0 and the maximum
delay spread of 10 µs, respectively. Base-station 2's arrival
times are chosen in the same fashion and independently of
base-station 1's, but without forcing arrivals at 0 and 10 µs.
The coefficients are equal-power, complex-normal random
variables, independent of each other. The arrival times at
antennas 1 and 2 associated with a given base-station are the
same, but the coefficients are independent.
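The channel draw described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `draw_channel`, its parameters, and the normalization to unit total average power are hypothetical choices of this sketch, not something specified in the text.

```python
import numpy as np

def draw_channel(rng, n_paths=17, n_active=4, n_antennas=2, force_ends=True):
    """Draw sparse multipath coefficients, one row per receive antenna.

    Returns (idx, h): idx holds the active path indices (delay = idx * 0.625 us)
    and h has shape (n_antennas, n_paths) with zeros on inactive paths.
    """
    # Pick distinct arrival slots at random, then sort them.
    idx = np.sort(rng.choice(n_paths, size=n_active, replace=False))
    if force_ends:
        # Base-station 1: pin first and last arrivals to 0 and the max delay.
        idx[0], idx[-1] = 0, n_paths - 1
    h = np.zeros((n_antennas, n_paths), dtype=complex)
    # Equal-power complex-normal taps, independent across paths and antennas.
    # Assumption of this sketch: total average power normalized to one.
    shape = (n_antennas, n_active)
    taps = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    h[:, idx] = taps / np.sqrt(2 * n_active)
    return idx, h
```

Calling `draw_channel(rng)` models base-station 1, and `draw_channel(rng, force_ends=False)` models base-station 2; both antennas share the arrival times but receive independent coefficients.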

The "multi-user chip symbols" for base-station $k$,
$s^{(k)}[n]$, may be described as

$$s^{(k)}[n] = c_{bs}^{(k)}[n] \sum_{j=1}^{N_u^{(k)}} \sum_{m=0}^{N_s-1} \alpha_j^{(k)}\, b_j^{(k)}[m]\, c_j^{(k)}[n - N_c m] \tag{2}$$

where the various quantities are defined as follows:
$c_{bs}^{(k)}[n]$ is the base-station dependent long code;
$\alpha_j^{(k)}$ is the $j$th user's gain; $b_j^{(k)}[m]$ is the
$j$th user's bit/symbol sequence; $c_j^{(k)}[n]$,
$n = 0, 1, \ldots, N_c - 1$, is the $j$th user's channel (short)
code; $N_c$ is the length of each channel code (assumed the
same for each user); $N_u^{(k)}$ is the total number of active
users; $N_s$ is the number of bit/symbols transmitted during a
given time window. The signal received at the $i$th antenna
(after convolving with a matched filter impulse response
having a square-root raised cosine spectrum) from base-station
$k$ is

$$y_i^{(k)}(t) = \sum_n s^{(k)}[n]\, h_i^{(k)}(t - nT_c), \qquad i = 1, 2 \tag{3}$$

where $h_i^{(k)}(t)$ is as defined in Eqn. (1). The total
received signal at the mobile-station is simply the sum of the
contributions from the different base-stations plus noise:

$$y_i(t) = y_i^{(1)}(t) + y_i^{(2)}(t) + \eta_i(t), \qquad i = 1, 2. \tag{4}$$

$\eta_i(t)$ is a noise process assumed white and Gaussian prior
to coloration by the receiver chip-pulse matched filter.

For the first antenna, we oversample the signal $y_1(t)$ in
Eqn. (4) at twice the chip-rate to obtain $y_1[n] = y_1(nT_c)$
and $y_2[n] = y_1(T_c/2 + nT_c)$. These discrete-time signals
have corresponding impulse responses
$h_1^{(k)}[n] = h_1^{(k)}(t)|_{t=nT_c}$ and
$h_2^{(k)}[n] = h_1^{(k)}(t)|_{t=T_c/2+nT_c}$ for base-stations
$k = 1, 2$.

For the second antenna, we also oversample the signal
$y_2(t)$ in Eqn. (4) at twice the chip-rate to obtain
$y_3[n] = y_2(nT_c)$ and $y_4[n] = y_2(T_c/2 + nT_c)$. These
discrete-time signals have corresponding impulse responses
$h_3^{(k)}[n] = h_2^{(k)}(t)|_{t=nT_c}$ and
$h_4^{(k)}[n] = h_2^{(k)}(t)|_{t=T_c/2+nT_c}$ for base-stations
$k = 1, 2$.

Let $M$ denote the total number of chip-spaced channels
due to both receiver antenna diversity and/or oversampling.

3. CHIP-LEVEL EQUALIZER

The "chip-level" MMSE equalizer is shown in Figure 1 (two
antenna case with no oversampling). It estimates the
multi-user synchronous sum signal for either base-station 1 or
2, and then correlates with the desired user's channel code
times that base-station's long code. To derive the chip-level
MMSE equalizer, it is useful to define signal vectors and
channel matrices based on the equalizer length $N_g$. The
"recovered" chip signal will be
$\hat{s}^{(k)}[n - D] = \mathbf{g}^{(k)H}\mathbf{y}[n]$ for some
delay $D$, where $\mathbf{g}^{(k)}$ is the $MN_g \times 1$
chip-level equalizer for base-station $k$, $k = 1, 2$. The
equalizer coefficients $g_i^{(k)}[n]$ comprise the equalizer
vector

$$\mathbf{g}^{(k)} = \big[\mathbf{g}_1^{(k)T} \ \cdots \ \mathbf{g}_M^{(k)T}\big]^T \tag{5}$$

where

$$\mathbf{g}_i^{(k)} = \big[g_i^{(k)}[0],\ g_i^{(k)}[1],\ \ldots,\ g_i^{(k)}[N_g - 1]\big]^T, \qquad i = 1, \ldots, M. \tag{6}$$

The $MN_g \times 1$ vectorized received signal is given by

$$\mathbf{y}[n] = \mathbf{H}^{(1)}\mathbf{s}^{(1)}[n] + \mathbf{H}^{(2)}\mathbf{s}^{(2)}[n] + \boldsymbol{\eta}[n]. \tag{7}$$

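Here the chip-symbol vector is

$$\mathbf{s}^{(k)}[n] = \big[s^{(k)}[n],\ s^{(k)}[n-1],\ \ldots,\ s^{(k)}[n-(N_g+L-2)]\big]^T, \tag{8}$$

the stacked channel matrix is

$$\mathbf{H}^{(k)} = \big[\mathbf{H}_1^{(k)T} \ \cdots \ \mathbf{H}_M^{(k)T}\big]^T, \tag{9}$$

and $\mathbf{H}_i^{(k)}$ is the $N_g \times (L + N_g - 1)$
convolution matrix

$$\mathbf{H}_i^{(k)} = \begin{bmatrix}
h_i^{(k)}[0] & 0 & \cdots & 0 \\
h_i^{(k)}[1] & h_i^{(k)}[0] & \ddots & \vdots \\
\vdots & \vdots & \ddots & 0 \\
h_i^{(k)}[L-1] & h_i^{(k)}[L-2] & \cdots & h_i^{(k)}[0] \\
0 & h_i^{(k)}[L-1] & \cdots & h_i^{(k)}[1] \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & h_i^{(k)}[L-1]
\end{bmatrix}^T. \tag{10}$$

Equation (7) is more compactly written as

$$\mathbf{y}[n] = \mathcal{H}\mathbf{s}[n] + \boldsymbol{\eta}[n] \tag{11}$$

where

$$\mathcal{H} = \big[\mathbf{H}^{(1)} \ \vdots \ \mathbf{H}^{(2)}\big] \tag{12}$$

$$\mathbf{s}[n] = \big[\mathbf{s}^{(1)T}[n] \ \ \mathbf{s}^{(2)T}[n]\big]^T. \tag{13}$$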

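The MMSE criterion is

$$\min_{\mathbf{g}^{(k)}} E\big\{\,|\mathbf{g}^{(k)H}(\mathcal{H}\mathbf{s}[n] + \boldsymbol{\eta}[n]) - \boldsymbol{\delta}_D^T\mathbf{s}^{(k)}[n]|^2\,\big\} \tag{14}$$

where $\boldsymbol{\delta}_D$ is all zeroes except for unity in
the $(D+1)$th position (so that
$\boldsymbol{\delta}_D^T\mathbf{s}^{(k)}[n] = s^{(k)}[n - D]$).

We assume unit energy signals, $E\{|s^{(k)}[n]|^2\} = 1$, and
furthermore that the chip-level symbols $s^{(k)}[n]$ are
independent and identically distributed,
$E\{\mathbf{s}[n]\mathbf{s}^H[n]\} = \mathbf{I}$. This is the
case if the base-station dependent long codes,
$c_{bs}^{(k)}[n]$, are treated as iid sequences, a very good
assumption in practice. The equalizer which attains the
minimum is

$$\mathbf{g}^{(k)} = \big(\mathcal{H}\mathcal{H}^H + \mathbf{R}_{\eta\eta}\big)^{-1}\mathbf{H}^{(k)}\boldsymbol{\delta}_D. \tag{15}$$

The MMSE is

$$\mathrm{MMSE} = 1 - \boldsymbol{\delta}_D^T\mathbf{H}^{(k)H}\big(\mathcal{H}\mathcal{H}^H + \mathbf{R}_{\eta\eta}\big)^{-1}\mathbf{H}^{(k)}\boldsymbol{\delta}_D. \tag{16}$$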
Figure 1. Chip and Symbol MMSE Estimators for kth Base-Station, two antennas, no oversampling.


The MMSE equalizer is a function of the delay $D$. The
MMSE may be computed for each $D$, $0 \le D \le N_g + L - 2$,
with only one matrix inversion (which has to be done to form
$\mathbf{g}^{(k)}$ anyway). Once the $D$ yielding the smallest
MMSE is determined, the corresponding equalizer
$\mathbf{g}^{(k)}$ may be computed without further matrix
inversion or system solving.
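The delay search with a single inversion can be sketched as follows. This is a minimal NumPy illustration of Eqs. (15) and (16), not the authors' implementation; the function names `conv_matrix` and `chip_mmse` and their arguments are hypothetical.

```python
import numpy as np

def conv_matrix(h, n_g):
    """N_g x (L + N_g - 1) convolution matrix: row r holds h shifted right by r."""
    L = len(h)
    H = np.zeros((n_g, L + n_g - 1), dtype=complex)
    for r in range(n_g):
        H[r, r:r + L] = h
    return H

def chip_mmse(H_k, H_all, R_noise):
    """Return (best delay D, equalizer g, MMSE for every delay).

    H_k     : stacked channel matrix of the desired base-station
    H_all   : all base-station channel matrices placed side by side
    R_noise : noise covariance matrix
    """
    # The one and only matrix inversion.
    R_inv = np.linalg.inv(H_all @ H_all.conj().T + R_noise)
    G = R_inv @ H_k                      # column D is the equalizer for delay D
    # MMSE(D) = 1 - [H_k^H R^{-1} H_k]_{DD}, read off the diagonal.
    mmse = 1.0 - np.real(np.einsum('ij,ij->j', H_k.conj(), G))
    D = int(np.argmin(mmse))
    return D, G[:, D], mmse
```

Because every candidate equalizer is a column of `G`, choosing the best delay costs one matrix product after the inversion, which mirrors the remark that no further inversion or system solve is needed.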

4.  SYMBOL-LEVEL  EQUALIZER 

In  this  section  we  present  what  we  call  the  “symbol-level” 
MMSE  estimator.  This  estimator  depends  on  the  user  index 
and  symbol  index,  and  hence  varies  from  symbol  to  symbol. 
The  FIR  estimator  that  we  derive  here  is  a  simplified  version 
of  that  presented  in  [9]  where  in  our  case,  all  the  channels  and 
delays  from  a  given  base-station  are  the  same.  The  conclu¬ 
sions  reached  in  that  paper  apply  equally  well  here,  namely 
that  FIR  MMSE  equalization  always  performs  at  least  as  well 
as  the  “coherent  combiner”  (that  is,  the  RAKE  receiver). 
This  type  of  symbol-level  receiver  has  also  been  presented  in 
[10],  although  again  not  specifically  for  the  CDMA  downlink. 

The  symbol-level  equalizer  differs  from  the  chip-level 
equalizer  in  that  the  base  station  and  Walsh-Hadamard  codes 
do  not  appear  explicitly  in  the  block  diagram  (see  Figure  1). 
Instead, the codes become incorporated into the equalizer
itself. To derive the equalizer, we first define
$a_j^{(k)}[n]$ as the bit sequence $b_j^{(k)}[m]$ upsampled by
$N_c$: $a_j^{(k)}[n] = b_j^{(k)}[m]$ when $n = mN_c$ and
$a_j^{(k)}[n] = 0$ otherwise. We wish to estimate
$b_j^{(k)}[m]$ directly and we do this by finding

$$\min E\big\{\,|a_j^{(k)}[n - D] - \hat{a}_j^{(k)}[n - D]|^2\,\big\} \tag{17}$$

where the minimization is done only when $n - D = mN_c$.
As in the chip-level case,
$\hat{a}_j^{(k)}[n - D] = \mathbf{g}^{(k)H}\mathbf{y}[n]$ where
$\mathbf{y}[n]$ is given by Eq. (11). Setting $n = mN_c + D$,
the MSE is minimized yielding


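$$\mathbf{g}^{(k)}[m] = \big(\mathcal{H}\mathbf{R}_{ss}[mN_c + D]\mathcal{H}^H + \mathbf{R}_{\eta\eta}\big)^{-1}\mathcal{H}\mathbf{R}_{bs}[m] \tag{18}$$

where

$$\mathbf{R}_{ss}[n] = E\{\mathbf{s}[n]\mathbf{s}^H[n]\} \tag{19}$$

$$\mathbf{R}_{bs}[m] = E\{b_j^{(k)*}[m]\,\mathbf{s}[mN_c + D]\}. \tag{20}$$

We now proceed to derive expressions for $\mathbf{R}_{ss}[n]$
and $\mathbf{R}_{bs}[m]$. Using Eq. (13),

$$\mathbf{R}_{ss}[n] = \begin{bmatrix} \mathbf{R}^{(11)}[n] & \mathbf{R}^{(12)}[n] \\ \mathbf{R}^{(21)}[n] & \mathbf{R}^{(22)}[n] \end{bmatrix} \tag{21}$$

where
$\mathbf{R}^{(kk')}[n] = E\{\mathbf{s}^{(k)}[n]\mathbf{s}^{(k')H}[n]\}$.
We assume here that the desired user is only transmitted by
base station $k$. We also assume that the base station and
Walsh-Hadamard codes are deterministic and known so that the
only random elements in $\mathbf{s}[n]$ are the transmitted
bits. Then $E\{s^{(k)}[n]s^{(k')*}[m]\} = 0$ for $k \ne k'$ and
any $n$ and $m$, so
$\mathbf{R}^{(12)}[n] = \mathbf{R}^{(21)}[n] = \mathbf{0}$. The
$(i,j)$th element of $\mathbf{R}^{(kk)}[n]$ is
$S_{ij}^{(kk)}[n] = E\{s^{(k)}[n+1-i]\,s^{(k)*}[n+1-j]\}$. When
$i = j$, $S_{ij}^{(kk)}[n] = 1$. When $i \ne j$,

$$S_{ij}^{(kk)}[n] = \begin{cases} \dfrac{B_{ij}[n]\,W_{ij}^{(k)}[n]}{2N_u^{(k)}}, & \text{when } (n+1-i) \bmod N_c = (n+1-j) \bmod N_c \\[1ex] 0 & \text{otherwise} \end{cases} \tag{22}$$

where

$$B_{ij}[n] = c_{bs}^{(k)}[n+1-i]\,c_{bs}^{(k)*}[n+1-j] \in \{\pm 2, \pm 2j\} \quad \forall\, i, j \tag{23}$$

$$W_{ij}^{(k)}[n] = \sum_{p=1}^{N_u^{(k)}} c_p^{(k)}[(n+1-i) \bmod N_c]\; c_p^{(k)*}[(n+1-j) \bmod N_c]. \tag{24}$$

With fixed $m$ and $n$, note that
$\big[c_1^{(k)}[m], \ldots, c_{N_c}^{(k)}[m]\big]$ and
$\big[c_1^{(k)}[n], \ldots, c_{N_c}^{(k)}[n]\big]$ are two
different rows of the Hadamard matrix. The element-by-element
(Schur) product of these two rows is also a row of the
Hadamard matrix containing $(N_c/2)$ 1's and $(N_c/2)$ -1's. So

$$|W_{ij}^{(k)}[n]| \le \begin{cases} N_u^{(k)}, & N_u^{(k)} = 1, \ldots, N_c/2 \\ N_c - N_u^{(k)}, & N_u^{(k)} = N_c/2 + 1, \ldots, N_c. \end{cases} \tag{25}$$

Therefore, when $i \ne j$ and
$(n+1-i) \bmod N_c = (n+1-j) \bmod N_c$,

$$|S_{ij}^{(kk)}[n]| \le \begin{cases} 1, & N_u^{(k)} = 1, \ldots, N_c/2 \\ N_c/N_u^{(k)} - 1, & N_u^{(k)} = N_c/2 + 1, \ldots, N_c. \end{cases} \tag{26}$$

This bound is plotted as a function of $N_u^{(k)}$ in Fig. 2.

Figure 2. Bound on the potentially non-zero off-diagonal
elements of $\mathbf{R}_{ss}[n]$ [$N_c = 64$].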

Note that when $N_u^{(k)} = N_c$, $S_{ij}^{(kk)}[n] = 0$ for
all $i \ne j$, so $\mathbf{R}^{(kk)}[n] = \mathbf{I}$. If we
assume that the Walsh codes are chosen randomly when
$N_u^{(k)} < N_c$, it can be shown that
$W_{ij}^{(k)}[n]/N_u^{(k)}$ is a linear function of a
hypergeometric random variable. Its variance is
$(N_c - N_u^{(k)})/\big(N_u^{(k)}(N_c - 1)\big)$.

Therefore, those off-diagonal elements which are not zero
have zero mean and the variance shown in the plot in Fig. 2.
For nearly all values of $N_u^{(k)}$, the variance is clearly
quite small. So in all cases, we may well approximate
$\mathbf{R}_{ss}[n]$ by $\mathbf{I}$ in Eq. (18), yielding

$$\mathbf{g}^{(k)}[m] = \big(\mathcal{H}\mathcal{H}^H + \mathbf{R}_{\eta\eta}\big)^{-1}\mathcal{H}\mathbf{R}_{bs}[m]. \tag{27}$$

We will see through simulation that this approximation
works quite well when compared to the "exact" equalizer
constructed with a time-varying $\mathbf{R}_{ss}$.
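The zero mean and small variance of the off-diagonal terms can be checked exactly for a small code length by enumerating every choice of active Walsh codes. The sketch below is illustrative only: the helper names are hypothetical, and it assumes a Sylvester-ordered Hadamard matrix with codes drawn uniformly without replacement. Under that assumption the hypergeometric variance of $W/N_u$ works out to $(N_c - N_u)/(N_u(N_c - 1))$.

```python
from fractions import Fraction
from itertools import combinations

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

def offdiag_variance(n_c, n_u, chip_a, chip_b):
    """Exact variance of W/N_u over all equally likely sets of n_u codes.

    chip_a and chip_b are two different chip positions within a symbol.
    """
    H = hadamard(n_c)
    # Product of the two chips, one entry per Walsh code (row of H).
    prods = [H[p][chip_a] * H[p][chip_b] for p in range(n_c)]
    vals = [Fraction(sum(prods[p] for p in codes), n_u)
            for codes in combinations(range(n_c), n_u)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)
```

For example, `offdiag_variance(8, 3, 1, 5)` evaluates to `Fraction(5, 21)`, matching $(8-3)/(3 \cdot 7)$, and the variance drops to zero when all codes are active.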

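The $i$th element of $\mathbf{R}_{bs}[m]$ is (with
$n = mN_c + D$):

$$E\{b_j^{(k)*}[m]\,s^{(k)}[n+1-i]\} = \begin{cases} c_{bs}^{(k)}[n+1-i]\,c_j^{(k)}[D+1-i], & \text{for } 0 \le D+1-i \le N_c - 1 \\ 0 & \text{otherwise.} \end{cases} \tag{28}$$

With $D$ satisfying $N_c - 1 \le D \le L + N_g - 2$, the entire
Walsh code for the desired user appears in
$\mathbf{R}_{bs}[m]$ and

$$\mathbf{R}_{bs}[m] = \big[\,\mathbf{0}_{D+1-N_c}^T \ \ \mathbf{c}_j^T[m] \ \ \mathbf{0}_{L+N_g-2-D}^T\,\big]^T \tag{29}$$

where

$$\mathbf{c}_j[m] = \big[\,c_{bs}^{(k)}[(m+1)N_c - 1]\,c_j^{(k)}[N_c - 1],\ \ldots,\ c_{bs}^{(k)}[mN_c + 1]\,c_j^{(k)}[1],\ c_{bs}^{(k)}[mN_c]\,c_j^{(k)}[0]\,\big]^T. \tag{30}$$

While the equalizer varies from symbol to symbol due
to variation in both $\mathbf{R}_{ss}[n]$ and
$\mathbf{R}_{bs}[m]$, by approximating $\mathbf{R}_{ss}[n]$ by
$\mathbf{I}$, the variation is confined to
$\mathbf{R}_{bs}[m]$.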
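A minimal sketch of the simplified symbol-level equalizer follows: the matrix inverse is formed once per channel, and only the cross-correlation vector of Eq. (29) changes from symbol to symbol. The function names, the single base-station setting, and the unit-modulus long code are assumptions of this sketch rather than details taken from the derivation.

```python
import numpy as np

def r_bs(c_long, c_walsh, m, D, total_len):
    """Cross-correlation vector: leading zeros, reversed code product, trailing zeros."""
    n_c = len(c_walsh)
    assert n_c - 1 <= D <= total_len - 1
    # Long code times Walsh code over symbol m, last chip first.
    seg = (c_long[m * n_c:(m + 1) * n_c] * c_walsh)[::-1]
    out = np.zeros(total_len, dtype=complex)
    out[D + 1 - n_c:D + 1] = seg
    return out

def symbol_level_equalizers(H, R_noise, c_long, c_walsh, symbols, D):
    """g[m] = (H H^H + R_noise)^{-1} H R_bs[m], with R_ss approximated by I."""
    # Inverted once per channel realization, reused for every symbol.
    A = np.linalg.inv(H @ H.conj().T + R_noise) @ H
    return [A @ r_bs(c_long, c_walsh, m, D, H.shape[1]) for m in symbols]
```

The symbol estimate for symbol `m` is then `np.vdot(g, y)` with `y` the stacked received vector at time `n = m * n_c + D`, so the per-symbol cost is a matrix-vector product rather than a new inversion.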

5.  RAKE  RECEIVER 

The RAKE receiver is simply a multipath-incorporating
matched filter. In particular, the RAKE can be viewed as a
chip-spaced filter matched to the channel, followed by
correlation with the long code times channel code. In
practice these two operations are normally performed in the
reverse order; the two orders are interchangeable under a
short-time LTI assumption. The RAKE receiver is exactly
represented by the "Chip-Level" portion of Figure 1, if we let
$N_g = L$ and $g_i^{(k)}[n] = h_i^{(k)}[L - 1 - n]$,
$n = 0, \ldots, L - 1$, $i = 1, \ldots, M$.


6. SIMULATION RESULTS

A wideband CDMA forward link was simulated similar to
one of the options in the US cdma2000 proposal [1]. The
spreading factor is $N_c = 64$ chips per bit. Simulations
were performed for both "saturated cells," that is, all 64
possible channel codes active, as well as lightly loaded cells
with 8 channel codes active. The chip rate is 3.6864 MHz
($T_c = 0.27$ µs), 3 times that of IS-95. The data symbols
are BPSK which, for each user, are spread with a length 64
Walsh-Hadamard sequence. The signals for all the users are
of equal power and summed synchronously, and each
base-station has the same number of users. The sum signal is
scrambled with a multiplicative QPSK spreading sequence
("scrambling code") of length 32768 similar to the IS-95
standard.

The uncoded BER results are averaged over different
channels for varying SNRs. The channels were generated
according to the model presented in Section 2. "SNR" is defined
to be the ratio of the sum of the average powers of the
received signals from the desired base-station, to the average
noise power, after chip-matched filtering. "SNR per user
per symbol" is the SNR divided by the number of users and
multiplied by the spreading factor. For the chip-level MMSE,
the total delay of the signal, $D$, through both channel and
equalizer, was chosen to minimize the MSE of the equalizer.

We first present results for a receiver near the base-station
so that out-of-cell interference is negligible. Two receive
antennas are employed with no oversampling. Two equalizer
lengths were simulated: for chip-level, $N_g = 57$ and
114, while for symbol-level, the length is chosen $N_c - 1$
longer. Since the chip-level equalizer is followed by
correlation with the channel code times long code, its effective
length is $N_g + N_c - 1$; hence, a fair comparison between
the symbol-level and chip-level sets the symbol-level
equalizer longer by $N_c - 1$ chips. Figure 3 presents the results
for the fully loaded cell case, i.e., 64 equal power users were
simulated. The RAKE receiver is significantly degraded at
high SNR by the MAI, which is seen in the figure as a BER
floor for SNR greater than 10 dB. The chip- and symbol-level
equalizers perform much better than the RAKE. Increasing
the equalizer length improves performance for both chip-level
and symbol-level. Comparing the length 57 chip-level to 120
symbol-level, we observe little improvement in the symbol
level at low SNR with increasing improvement, up to 2-3
dB, at high SNR. Comparing length 114 chip-level to 177
symbol-level also shows an improvement that increases with
SNR, but less of an improvement than for the shorter
equalizers. Note that since all 64 channel codes are present and
have equal power, $\mathbf{R}_{ss} = \mathbf{I}$ and the
symbol-level MMSE estimate is optimal in the MSE sense.

In Figure 4, once again the out-of-cell interference is
assumed negligible. In this simulation only 8 equal power
channel codes are active, i.e., the cell is only lightly to
moderately loaded. In this simulation the RAKE receiver does much
better since it experiences less in-cell MAI than for 64 users.
For the range of SNR simulated, the chip-level equalizer does
only slightly better than the RAKE receiver. As for the
fully loaded cell, the symbol-level equalizer performs better
than the chip-level equalizer. For comparison, the "optimal"
symbol-level equalizer is shown, which involves a matrix
inverse for every symbol (as in Equation (18)); this equalizer is
only slightly better than the symbol-level equalizer presented
in this paper. This result justifies the assumption /
simplification that $\mathbf{R}_{ss}$ is proportional to
$\mathbf{I}$, even when $N_u < N_c$.

Figure 5 results from a simulation with two base-stations,
each with 64 equal power users. The 2nd base-station is
treated as interference and is received with the same power
as the 1st, desired user's base-station. Specifically,

$$\sum_{m=1}^{M} E\{|y_m^{(1)}[n]|^2\} = \sum_{m=1}^{M} E\{|y_m^{(2)}[n]|^2\}. \tag{31}$$

In addition to two independent antennas, two-times
oversampling is employed for a total of four chip-spaced channels.
The results are closely analogous to the single base-station
case: the symbol-level out-performs the chip-level,
increasingly so at high SNR. However, the improvement is more
dramatic, especially for the shorter lengths.
7. CONCLUSIONS

The symbol-level equalizer derived here performs better than
the chip-level equalizer, although at a greater computational
cost. In fact, our simulations have shown that even though the
equalizer is sub-optimal, its performance closely approaches
optimality. The approximation that the source covariance
is diagonal means that a matrix inverse is required only as
often as the channel changes (and not every symbol), and
hence the computational complexity is much smaller than
that of the optimal equalizer.

[Figure panel title: 1 base station, 2 antennas]

Figure 3. Fully loaded cell, all 64 channel codes in use.

Figure 4. Lightly loaded cell, 8 out of 64 active channel
codes.

Figure 5. One interfering base-station of equal power, 64
channel codes per cell.
ABSTRACT

In this paper, we propose transform domain array processing
schemes for DS-CDMA communications. Space-time
adaptive processing (STAP) is a useful means to combat
the multiuser interference (MUI) in CDMA systems. The
computational burden and slow convergence are two major
problems in implementing STAP. This paper proposes
optimum and sub-optimum transform domain arrays with
different feedback schemes for CDMA communications. The
transform domain arrays require fewer computations than
traditional implementation methods and offer improved
convergence performance, leading to an efficient system
implementation.

1.  INTRODUCTION 

Array processing in direct-sequence code division multiple
access (DS-CDMA) communications has recently attracted
considerable attention [1, 2, 3]. The use of joint space-time
adaptive processing (STAP), which includes the two-dimensional
RAKE (2-D RAKE) receiver, provides excellent
performance in suppressing the multiuser interference (MUI)
and inter-symbol interference (ISI), as well as combining the
multipath signals to achieve the RAKE diversity effect in
frequency-selective fading. In order to combine a sufficient
number of multipath rays to enhance the signal power and
reduce the ISI, a large number of weights are required in
the feedback loop. The complexity and convergence rate
problems remain the bottleneck in the implementation of
these systems [4].

In this paper, we propose a transform domain approach
to chip-level space-time adaptive processing for DS-CDMA
communications with different feedback schemes.
Chip-level space-time adaptive processing effectively mitigates
both MUI and ISI before despreading and, as such,
only a simple correlation and summation operation with the
desired user's code is required to follow. When a subband
array is applied to chip-rate STAP processing, the signal
decorrelation using orthogonal transforms and feedback
schemes greatly reduces the circuit size within each single
feedback loop, and subsequently improves the receiver
convergence performance [5, 6]. The discrete Fourier transform
(DFT), filter banks and wavelets are among the commonly
used orthogonal transforms for this purpose [7]. In this
paper, we consider the DFT as the example. Decimation
available in transform domain processing also makes it
possible to reduce the signal processing speed at each
transform domain bin [5, 6].

2.  SPACE-TIME  ADAPTIVE  PROCESSING 
FOR  CDMA 

We consider a base station using an antenna array of $N$
sensors with $P$ users. In CDMA systems, usually $P > N$. The
received signal vector at the array is expressed, in
discrete-time form sampled at the chip rate, as

$$\mathbf{x}(k) = \sum_{p=1}^{P} \sum_{l=-\infty}^{\infty} d_p(l)\,\mathbf{h}_p(k - l) + \mathbf{b}(k) \tag{1}$$

where $d_p(k)$ and $\mathbf{h}_p(k)$ are the chip-rate sequence
and the channel response vector of the $p$th user, and
$\mathbf{b}(k)$ is the additive noise vector.

In CDMA communications, each symbol is spread into
$L$ chips. Without loss of generality, we denote the signal
of the user of interest as $s_1(n)$, and the signals from other
users as $s_p(n)$, $p = 2, \ldots, P$. Aperiodic spreading
sequences are assumed. The chip length is $L = T/T_c$, where
$T$ and $T_c$ are, respectively, the symbol duration and chip
duration. We denote the spreading sequence for the $n$th
symbol of the $P$ users as $c_p(n, l)$, $p = 1, \ldots, P$,
$l = 1, \ldots, L$. Then,

$$d_p(k) = s_p(n)\,c_p(n, l - l_p) \tag{2}$$

where $k = nL + l$, and $l_p$ ($0 \le l_p < L$) is the chip
delay index that models the asynchronous system. We make the
following assumptions:

A1) The information symbols $s_p(n)$, $p = 1, 2, \ldots, P$,
are wide-sense stationary and i.i.d. with
$E[s_p(n)s_p^*(n)] = 1$.

A2) The spreading sequences $c_p(n, l)$, $p = 1, 2, \ldots, P$,
$l = 1, \ldots, L$, are assumed independent random sequences.

A3) All channels $\mathbf{h}_p(k)$, $p = 1, 2, \ldots, P$, are
linear time-invariant, and of a finite duration within
$[0, DT_c]$. That is, $\mathbf{h}_p(k) = \mathbf{0}$,
$p = 1, 2, \ldots, P$, for $k > D$ and $k < 0$.

A4) The noise vector $\mathbf{b}(k)$ is zero-mean, temporally
and spatially white with

$$E[\mathbf{b}(k)\mathbf{b}^T(k + l)] = \mathbf{0} \quad \text{for any } l$$

and

$$E[\mathbf{b}(k)\mathbf{b}^H(k + l)] = \sigma\mathbf{I}_N\,\delta(l),$$

where the superscripts $T$ and $H$ denote transpose and
conjugate transpose, respectively, $\mathbf{I}_N$ is the
$N \times N$ identity matrix, and $\delta(l)$ is the Kronecker
delta function.

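By stacking $M$ consecutive chips of $\mathbf{x}(k)$, we can
obtain

$$\underline{\mathbf{x}}(k) = \sum_{p=1}^{P} \mathcal{H}_p\,\underline{\mathbf{d}}_p(k) + \underline{\mathbf{b}}(k) = \mathcal{H}\,\underline{\mathbf{d}}(k) + \underline{\mathbf{b}}(k), \tag{3}$$

where

$$\underline{\mathbf{x}}(k) = \big[\mathbf{x}^T(k) \ \ \mathbf{x}^T(k-1) \ \cdots \ \mathbf{x}^T(k-M+1)\big]^T, \tag{4}$$

$$\underline{\mathbf{d}}_p(k) = \big[d_p(k) \ \ d_p(k-1) \ \cdots \ d_p(k-M+1)\big]^T, \tag{5}$$

$$\underline{\mathbf{d}}(k) = \big[\underline{\mathbf{d}}_1^T(k) \ \ \underline{\mathbf{d}}_2^T(k) \ \cdots \ \underline{\mathbf{d}}_P^T(k)\big]^T, \tag{6}$$

$$\mathcal{H}_p = \begin{bmatrix}
\mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D) & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D) & \ddots & \vdots \\
\vdots & \ddots & \ddots & & \ddots & \mathbf{0} \\
\mathbf{0} & \cdots & \mathbf{0} & \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D)
\end{bmatrix}, \tag{7}$$

$$\mathcal{H} = \big[\mathcal{H}_1 \ \ \mathcal{H}_2 \ \cdots \ \mathcal{H}_P\big], \tag{8}$$

and

$$\underline{\mathbf{b}}(k) = \big[\mathbf{b}^T(k) \ \ \mathbf{b}^T(k-1) \ \cdots \ \mathbf{b}^T(k-M+1)\big]^T. \tag{9}$$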

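Denote $\mathbf{w}$ as the weight vector of the STAP system
corresponding to $\underline{\mathbf{x}}(k)$; the output of the
STAP becomes

$$y(k) = \mathbf{w}^T\underline{\mathbf{x}}(k). \tag{10}$$

The optimum weight vector under the minimum mean square
error (MMSE) criterion

$$\min_{\mathbf{w}} E\,|y(k) - d_1(k - v)|^2 \tag{11}$$

is given by the Wiener-Hopf solution

$$\mathbf{w}_{opt} = \mathbf{R}^{-1}\mathbf{r}, \tag{12}$$

where $v \ge 0$ is a delay to minimize the MMSE,

$$\mathbf{R} = E[\underline{\mathbf{x}}^*(k)\,\underline{\mathbf{x}}^T(k)], \tag{13}$$

$$\mathbf{r} = E[\underline{\mathbf{x}}^*(k)\,d_1(k - v)], \tag{14}$$

and the superscript $*$ denotes complex conjugate. The
training signal is assumed to be an ideal replica of $d_1(k)$.

From the assumptions A1) - A4), (13) and (14) can be
expressed as

$$\mathbf{R} = \mathcal{H}^*\mathcal{H}^T + \sigma\mathbf{I}_{MN}, \tag{15}$$

$$\mathbf{r} = \mathcal{H}_1^*\,\mathbf{e}_v, \tag{16}$$

respectively, where

$$\mathbf{e}_v = \big[\underbrace{0 \ \cdots \ 0}_{v} \ \ 1 \ \ 0 \ \cdots \ 0\big]^T. \tag{17}$$

The MMSE is given by

$$\mathrm{MMSE} = E\,|\mathbf{w}_{opt}^T\underline{\mathbf{x}}(k) - d_1(k - v)|^2 = 1 - \mathbf{r}^H\mathbf{R}^{-1}\mathbf{r}. \tag{18}$$

Despreading the array output signal $y(k)$ by the signature
code of the desired signal, we obtain the symbol-rate
output signal for detection, expressed as

$$z(n) = \sum_{l=0}^{L-1} y(nL + l + v)\,c_1(n, l). \tag{19}$$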

3.  TRANSFORM  DOMAIN  ARRAYS  WITH 
DIFFERENT  FEEDBACK  SCHEMES 

3.1.  Centralized  Feedback  Scheme 

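Performing a transform of $\underline{\mathbf{x}}(n)$ by using
an orthogonal matrix $\mathbf{T}$, we obtain the received
signal vector at the transform domain as

$$\mathbf{x}_T(n) = \mathbf{T}\,\underline{\mathbf{x}}(n) \tag{20}$$

with

$$\mathbf{x}_T(k) = \big[(\mathbf{x}_T^{(1)}(k))^T \ \ (\mathbf{x}_T^{(2)}(k))^T \ \cdots \ (\mathbf{x}_T^{(M)}(k))^T\big]^T, \tag{21}$$

where $\mathbf{x}_T^{(m)}(n)$ is the signal vector at the $m$th
transform domain bin. Denote
$\mathbf{w}_T = \big[(\mathbf{w}_T^{(1)})^T \ (\mathbf{w}_T^{(2)})^T \ \cdots \ (\mathbf{w}_T^{(M)})^T\big]^T$
as the weight vector in the transform domain. Then the
output of the transform domain array system becomes

$$y_T(k) = \mathbf{w}_T^T\mathbf{x}_T(k) = \mathbf{w}_T^T\mathbf{T}\,\underline{\mathbf{x}}(k). \tag{22}$$

Again, using the MMSE criterion

$$\min_{\mathbf{w}_T} E\,|y_T(k) - d_1(k - v)|^2, \tag{23}$$

the optimum weight vector is given by

$$\mathbf{w}_{T,opt} = \mathbf{R}_T^{-1}\mathbf{r}_T = \mathbf{T}^*\mathbf{w}_{opt}, \tag{24}$$

where

$$\mathbf{R}_T = E[\mathbf{x}_T^*(k)\,\mathbf{x}_T^T(k)] = \mathbf{T}^*\mathbf{R}\mathbf{T}^T = (\mathbf{T}\mathcal{H})^*(\mathbf{T}\mathcal{H})^T + \sigma\mathbf{I}_{MN}, \tag{25}$$

$$\mathbf{r}_T = E[\mathbf{x}_T^*(k)\,d_1(k - v)] = \mathbf{T}^*\mathbf{r} = (\mathbf{T}\mathcal{H}_1)^*\,\mathbf{e}_v. \tag{26}$$

It is easy to verify that the transform domain array with
centralized feedback scheme provides the same steady-state
MMSE performance, as given by equation (18). The
centralized feedback scheme is depicted in Fig. 1.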
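The claim that the centralized scheme leaves the steady-state MMSE unchanged can be checked numerically. The sketch below is an illustration under stated assumptions: the random matrix `Hs` is a hypothetical stand-in for the stacked channel matrix, the reference column index 2 plays the role of the delay $v$, and a unitary DFT matrix is used as the transform.

```python
import numpy as np

def wiener(R, r):
    """Return (w_opt, MMSE) with w_opt = R^{-1} r and MMSE = 1 - r^H R^{-1} r."""
    w = np.linalg.solve(R, r)
    return w, float(1.0 - np.real(np.vdot(r, w)))

rng = np.random.default_rng(0)
n = 6                                            # stands in for MN
# Hypothetical stacked channel matrix (random, for illustration only).
Hs = rng.standard_normal((n, 9)) + 1j * rng.standard_normal((n, 9))
sigma = 0.1
R = Hs.conj() @ Hs.T + sigma * np.eye(n)         # Eq. (15)
r = Hs.conj()[:, 2]                              # column selected by e_v, v = 2

T = np.fft.fft(np.eye(n)) / np.sqrt(n)           # unitary DFT as the transform
R_T = T.conj() @ R @ T.T                         # Eq. (25)
r_T = T.conj() @ r                               # Eq. (26)

w, mmse = wiener(R, r)
w_T, mmse_T = wiener(R_T, r_T)
```

Under these assumptions `mmse` and `mmse_T` agree to numerical precision, and `w_T` matches `T.conj() @ w` as in Eq. (24); the transform reshapes the weights but cannot change the optimum error.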
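Fig. 1 Subband array with centralized feedback.

3.2. Localized Feedback Scheme

We note that the orthogonal transform can reduce the
correlation between different transform bins. DFT, filter banks,
and wavelets are commonly used methods for providing
orthogonal transforms. Here we consider the DFT as the
example. Denote

$$\mathbf{T}_M = \frac{1}{\sqrt{M}}\begin{bmatrix}
W_M^0 & W_M^0 & \cdots & W_M^0 \\
W_M^0 & W_M^1 & \cdots & W_M^{M-1} \\
\vdots & \vdots & \ddots & \vdots \\
W_M^0 & W_M^{M-1} & \cdots & W_M^{(M-1)^2}
\end{bmatrix} \tag{27}$$

as the $M \times M$ transform matrix at the output of each array
sensor, where

$$W_M = \exp\left(-j\frac{2\pi}{M}\right); \tag{28}$$

then the transform matrix $\mathbf{T}$ becomes

$$\mathbf{T} = \mathbf{P}_2(\mathbf{I}_N \otimes \mathbf{T}_M)\mathbf{P}_1, \tag{29}$$

where $\otimes$ denotes the Kronecker product. In (29),
$\mathbf{P}_1$ is a permutation matrix to change the order of
the vector $\underline{\mathbf{x}}(n)$ such that the $M$ samples
at each array sensor align together, and $\mathbf{P}_2$ is
another permutation matrix that allows the $N$ data of each
bin to align together.

$\mathbf{T}$ can be expressed in the form

$$\mathbf{T} = \big[(\mathbf{T}^{(1)})^T \ \ (\mathbf{T}^{(2)})^T \ \cdots \ (\mathbf{T}^{(M)})^T\big]^T, \tag{30}$$

where $\mathbf{T}^{(m)}$ is the $N \times NM$ submatrix of the
matrix $\mathbf{T}$ corresponding to the $m$th bin. Denote

$$\mathbf{x}_T^{(m)}(n) = \mathbf{T}^{(m)}\underline{\mathbf{x}}(n) \tag{31}$$

as the signal vector at the $m$th subband. When the signal
correlation between different transform bins is small,
we ignore the off-block-diagonal elements of the correlation
matrix $\mathbf{R}_T$, yielding an approximation by the
block-diagonal matrix

$$\mathbf{R}'_T = \begin{bmatrix}
\mathbf{R}_T^{(1)} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{R}_T^{(2)} & \ddots & \vdots \\
\vdots & \ddots & \ddots & \mathbf{0} \\
\mathbf{0} & \cdots & \mathbf{0} & \mathbf{R}_T^{(M)}
\end{bmatrix}, \tag{32}$$

where

$$\mathbf{R}_T^{(m)} = E\big[(\mathbf{x}_T^{(m)}(n))^*(\mathbf{x}_T^{(m)}(n))^T\big] = (\mathbf{T}^{(m)}\mathcal{H})^*(\mathbf{T}^{(m)}\mathcal{H})^T + \sigma\mathbf{I}_N \tag{33}$$

is the signal covariance matrix of $\mathbf{x}_T^{(m)}(n)$.
Using the property of block-diagonal matrix, we have

$$(\mathbf{R}'_T)^{-1} = \begin{bmatrix}
(\mathbf{R}_T^{(1)})^{-1} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & (\mathbf{R}_T^{(2)})^{-1} & \ddots & \vdots \\
\vdots & \ddots & \ddots & \mathbf{0} \\
\mathbf{0} & \cdots & \mathbf{0} & (\mathbf{R}_T^{(M)})^{-1}
\end{bmatrix}. \tag{34}$$

Therefore, the inversion computation of dimension
$NM \times NM$ becomes $M$ parallel groups of matrix inversions
of dimension $N \times N$; as such, the computations can be
greatly reduced. When recursive methods are used, it is
realized by using $M$ parallel control loops with $N$ weights
in each loop. The localized feedback scheme is shown in
Fig. 2.

Fig. 2 Subband array with localized feedback.

We use $d_1(k)$ as the reference signal at each transform
bin. In this case, the cross-correlation vector between the
received signal vector and the reference signal at the $m$th
transform bin becomes

$$\mathbf{r}_T^{(m)} = E\big[(\mathbf{x}_T^{(m)}(k))^*\,d_1(k - v)\big] = (\mathbf{T}^{(m)}\mathcal{H}_1)^*\,\mathbf{e}_v. \tag{35}$$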
In the localized feedback scheme, the weight vector at
each bin can be obtained from the $N \times N$ correlation
matrix $\mathbf{R}_T^{(m)}$ and the $N \times 1$ correlation
vector $\mathbf{r}_T^{(m)}$, which are determined only by the
data vector and reference signal at that bin, i.e.,

$$\mathbf{w}_T^{(m)} = (\mathbf{R}_T^{(m)})^{-1}\mathbf{r}_T^{(m)}. \tag{36}$$

Therefore, the centralized feedback transform domain array
can be approximated by a set of parallel independent
rank-reduced adaptive array processors at each bin, at the cost
of ignoring the correlation between signals at different bins.
Such a transform domain array with the localized feedback
scheme can be easily implemented by using a set of parallel
array processors, each with the number of weights equal to
$N$, instead of $NM$.

It is clear that

$$\mathbf{r}'_T = \big[(\mathbf{r}_T^{(1)})^T \ \ (\mathbf{r}_T^{(2)})^T \ \cdots \ (\mathbf{r}_T^{(M)})^T\big]^T = \mathbf{r}_T. \tag{37}$$

Therefore, the equivalent full-band weight vector of the
localized feedback transform domain array becomes

$$\mathbf{w}'_T = (\mathbf{R}'_T)^{-1}\mathbf{r}'_T = (\mathbf{R}'_T)^{-1}\mathbf{r}_T. \tag{38}$$

The corresponding MSE of the localized feedback scheme is
given by

$$\mathrm{MSE}_{LF} = 1 + \mathbf{r}_T^H(\mathbf{R}'_T)^{-1}\mathbf{R}_T(\mathbf{R}'_T)^{-1}\mathbf{r}_T - 2\,\mathrm{Re}\big[\mathbf{r}_T^H(\mathbf{R}'_T)^{-1}\mathbf{r}_T\big]. \tag{39}$$

Equation (39) implies that the localized feedback transform
domain array approach is suboptimal, and its performance
depends on the significance of the cross-correlation between
signals at different bins. It is clear from (25) and (39) that
the off-block-diagonal elements of matrix $\mathbf{R}_T$, and
subsequently the MSE performance of the localized feedback
subband array, depend on both the transform matrix
$\mathbf{T}$ and the channels $\mathcal{H}_p$,
$p = 1, 2, \ldots, P$.

Fig. 3 Subband array with partial feedback.
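The cost of dropping the off-block-diagonal terms can be illustrated with a small numerical sketch of Eq. (39). Everything concrete here is an assumption of the sketch: the random stand-in channel matrix `Hs`, the sizes `N` and `M`, the helper names, and the stacking order (tap-major, sensor-minor), under which a per-sensor DFT across taps is `kron(T_M, I_N)` with the permutations of Eq. (29) folded into the ordering.

```python
import numpy as np

def block_diag_part(R, block):
    """Keep only the diagonal blocks of size `block`; zero everything else."""
    out = np.zeros_like(R)
    for s in range(0, R.shape[0], block):
        out[s:s + block, s:s + block] = R[s:s + block, s:s + block]
    return out

def feedback_mse(R_T, r_T, block):
    """MSE of Eq. (39) for weights computed from the block-diagonal approximation."""
    w = np.linalg.solve(block_diag_part(R_T, block), r_T)
    return float(1.0 + np.real(np.vdot(w, R_T @ w)) - 2.0 * np.real(np.vdot(r_T, w)))

rng = np.random.default_rng(0)
N, M = 2, 4                                      # sensors, taps (bins)
# Hypothetical stacked channel matrix (random, for illustration only).
Hs = rng.standard_normal((N * M, 12)) + 1j * rng.standard_normal((N * M, 12))
R = Hs.conj() @ Hs.T + 0.1 * np.eye(N * M)
r = Hs.conj()[:, 3]

T_M = np.fft.fft(np.eye(M)) / np.sqrt(M)         # unitary M-point DFT
T = np.kron(T_M, np.eye(N))                      # bin m occupies rows m*N .. m*N+N-1
R_T, r_T = T.conj() @ R @ T.T, T.conj() @ r

mmse = float(1.0 - np.real(np.vdot(r, np.linalg.solve(R, r))))
mse_central = feedback_mse(R_T, r_T, N * M)      # one block: centralized feedback
mse_local = feedback_mse(R_T, r_T, N)            # N x N blocks: localized feedback
```

With a single block of size `N * M` the function reproduces the centralized MMSE, while `N x N` blocks give an MSE that can only be equal or larger; how much larger depends on how strongly the bins remain correlated for the given channel.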

3.3.  Partial  Feedback  Scheme 

In the previous subsection, we discussed the transform domain
array with localized feedback scheme as an approximation
of the transform domain array with centralized feedback
scheme. Such a localized feedback scheme reduces the
number of weights at each bin at the expense of reduced
performance, since the off-block-diagonal elements are
not considered in the weight estimation.

A subband array with partial feedback, which is shown
in Fig. 3, is also possible and provides more flexibility in
trading off the system complexity against the steady-state MSE
performance. As shown below, the partial feedback scheme
is a generalization of the centralized and localized feedback
schemes, which can be considered as two extreme and special
cases.

In the transform domain array with partial feedback
scheme, the total $M$ bins are divided into $K$ groups. The
number of bins in the $i$th group is $M_i$, $i = 1, 2, \ldots, K$, with
$M_1 + M_2 + \cdots + M_K = M$. In this paper, we consider the simple
case of $M_1 = M_2 = \cdots = M_K = M/K$.


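In this case, the signal covariance matrix $\mathbf{R}_T$ is approximated
by a new block-diagonal matrix $\mathbf{R}''_T$ with larger block
size $M_1 N$, expressed as

$$\mathbf{R}''_T = \begin{bmatrix} \mathbf{R}_T^{(G_1)} & & & \mathbf{0} \\ & \mathbf{R}_T^{(G_2)} & & \\ & & \ddots & \\ \mathbf{0} & & & \mathbf{R}_T^{(G_K)} \end{bmatrix} \tag{40}$$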


where $\mathbf{R}_T^{(G_i)}$ is of dimension $M_1 N \times M_1 N$. For $M_1 > 1$,
fewer off-block-diagonal elements are ignored in $\mathbf{R}''_T$
compared to $\mathbf{R}'_T$. Therefore, the partial feedback scheme
provides more accurate weight estimation, and subsequently
better MSE results, as compared with the localized feedback
scheme. Similar to the localized feedback case, the
weight vector in the partial feedback scheme is given by


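$$\mathbf{w}''_T = (\mathbf{R}''_T)^{-1} \mathbf{r}''_T = \begin{bmatrix} (\mathbf{R}_T^{(G_1)})^{-1} \mathbf{r}_T^{(G_1)} \\ (\mathbf{R}_T^{(G_2)})^{-1} \mathbf{r}_T^{(G_2)} \\ \vdots \\ (\mathbf{R}_T^{(G_K)})^{-1} \mathbf{r}_T^{(G_K)} \end{bmatrix} \tag{41}$$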


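where

$$\mathbf{r}_T^{(G_i)} = E\left[ \left( \mathbf{x}_T^{(G_i)}(k) \right)^* d_1(k - v) \right] \tag{42}$$

as $d_1(k - v)$ is used as the reference signal at each group,
and

$$\mathbf{x}_T^{(G_i)}(k) = \left[ \left( \mathbf{x}_T^{((i-1)M_1+1)}(k) \right)^T \; \cdots \; \left( \mathbf{x}_T^{(iM_1)}(k) \right)^T \right]^T. \tag{43}$$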
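Since

$$\mathbf{r}''_T = \left[ (\mathbf{r}_T^{(G_1)})^T \; (\mathbf{r}_T^{(G_2)})^T \; \cdots \; (\mathbf{r}_T^{(G_K)})^T \right]^T = \mathbf{r}_T, \tag{44}$$

the MSE of the partial feedback array is therefore

$$\mathrm{MSE}_{pf} = 1 + \mathbf{r}_T^H (\mathbf{R}''_T)^{-1} \mathbf{R}_T (\mathbf{R}''_T)^{-1} \mathbf{r}_T - 2\,\mathrm{Re}\left[ \mathbf{r}_T^H (\mathbf{R}''_T)^{-1} \mathbf{r}_T \right]. \tag{45}$$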

Note that the partial feedback scheme simplifies
to the centralized feedback scheme when $M_1 = M$. In this
case, $\mathbf{R}''_T$ becomes $\mathbf{R}_T$, and equation (45) becomes equation
(18). On the other hand, the localized feedback scheme is
obtained by setting $M_1 = 1$. In this case, $\mathbf{R}''_T$ becomes $\mathbf{R}'_T$,
and equation (45) becomes equation (39).
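The two limiting cases can be checked with a small sketch of the block-diagonal approximation in (40). The function name `block_diag_approx` and the 4 x 4 test matrix are hypothetical; `block` plays the role of the block size $M_1 N$ in matrix entries.

```python
def block_diag_approx(R, block):
    """Keep only the diagonal blocks of size `block`; zero everything else."""
    n = len(R)
    if n % block != 0:
        raise ValueError("matrix size must be a multiple of the block size")
    return [[R[i][j] if i // block == j // block else 0.0
             for j in range(n)]
            for i in range(n)]

# Assumed toy covariance with N = 1 sensor and M = 4 bins.
R_T = [[4.0, 1.0, 0.5, 0.2],
       [1.0, 3.0, 0.8, 0.4],
       [0.5, 0.8, 2.0, 0.6],
       [0.2, 0.4, 0.6, 1.0]]

centralized = block_diag_approx(R_T, 4)  # M_1 = M: nothing is discarded
partial = block_diag_approx(R_T, 2)      # M_1 = 2: two groups of two bins
localized = block_diag_approx(R_T, 1)    # M_1 = 1: only the diagonal remains
```

Choosing the group size then amounts to choosing how many cross-bin correlations are retained, which is the complexity versus MSE trade-off described above.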

4.  CONVERGENCE  PERFORMANCE 

In this section, we consider the convergence performance of
the transform domain arrays with centralized feedback and
localized feedback. The widely used least mean square
(LMS) algorithm is considered.

One of the key factors affecting the convergence performance
of the proposed transform domain arrays is the number
of controllable weights in the feedback system. In the
transform domain array with centralized feedback scheme,
the number of weights is $NM$, whereas in the transform
domain arrays with localized feedback and partial
feedback schemes, the number of weights in each independent
control loop is $N$ and $M_1 N$, respectively (although the
total number of weights over all bins remains $NM$).

It is known that the convergence rate of the LMS algorithm
depends on the eigenvalue spread, i.e., the ratio between
the maximum and minimum eigenvalues of the covariance
matrix [8]. Since the covariance matrix defined at a bin,
$\mathbf{R}_T^{(m)}$, $m = 1, \ldots, M$, or that defined at several bins,
$\mathbf{R}_T^{(G_i)}$, $i = 1, \ldots, K$, is a principal submatrix of $\mathbf{R}_T$,
it follows from the interlacing property [9] that the eigenvalue
spreads of $\mathbf{R}_T^{(m)}$ and $\mathbf{R}_T^{(G_i)}$ are no larger than that of
$\mathbf{R}_T$. Therefore, the transform domain
arrays with localized and partial feedback provide improved
convergence performance.

On the other hand, when comparing the STAP system
and the transform domain array with centralized feedback
scheme, since an orthonormal transform does not change
the eigenvalues, it is clear that the eigenvalue spreads of $\mathbf{R}$
and $\mathbf{R}_T$ are the same. Therefore, the STAP system and the
centralized feedback transform domain array offer the same
convergence performance [6]. However, if the signal powers
at different bins are different (due to, e.g., pulse shaping
filtering or frequency-selective channel characteristics), the
convergence performance can be improved by performing power
compensation at the different bins so that the eigenvalue
spread is reduced [10, 11, 12].
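The effect of power compensation on the eigenvalue spread can be sketched for a toy two-bin covariance. The normalization used here, dividing each bin by the square root of its power, is one common form of power compensation and is an assumption, as are the function names and numbers; the paper only cites [10, 11, 12] for the technique.

```python
import math

def eig_spread_2x2(a, b, d):
    """Eigenvalue spread of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return (mean + radius) / (mean - radius)

def power_normalize(a, b, d):
    """Scale each bin by 1/sqrt(power): R -> D^{-1/2} R D^{-1/2}."""
    return 1.0, b / math.sqrt(a * d), 1.0

# Assumed bin powers 4.0 and 0.25 with cross-correlation 0.4.
before = eig_spread_2x2(4.0, 0.4, 0.25)
after = eig_spread_2x2(*power_normalize(4.0, 0.4, 0.25))
print(before, after)
```

In this example the spread drops from roughly 19 to about 2.3, which is the mechanism by which equalizing bin powers speeds up LMS convergence.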

5.  CONCLUSION 

We have analyzed the performance of transform domain
arrays for DS-CDMA systems with different types of feedback
schemes, and derived the respective expressions of
the mean square error (MSE). For all proposed schemes,
the transformation is performed at the chip level before
despreading. It has been shown that transform domain
arrays with localized and partial feedback schemes are
generally suboptimal, and their MSE performance depends on
the transform matrix of the analysis filters as well as the
communication channel characteristics. Since the localized
feedback scheme reduces the number of weights in
each control loop, the convergence rate is usually improved,
which is of practical importance in implementing space-time
adaptive processing in fast fading environments. The
partial feedback scheme generalizes the other two proposed
schemes, namely, the centralized and localized feedback
systems. This scheme provides the flexibility to balance the
system complexity against the steady-state and convergence
performance.
ABSTRACT

Space-time adaptive processing (STAP) is an effective
technique for suppressing both the multiuser access
interference (MUAI) and the inter-symbol interference (ISI)
in wideband CDMA mobile communication systems.
However, its complexity is one of the key problems in
practical implementations. In this paper we propose
adaptive antenna techniques that realize low-complexity
space-time adaptive processing within a given spatial sector
by spatial-smoothing subarray beamforming sectorization.
The proposed technique achieves performance close to that of
the associated optimum element-space STAP system.

I.  INTRODUCTION 

In direct-sequence code-division multiple-access (DS-CDMA)
systems, adaptive antennas operating under the scheme of
space-time adaptive processing (STAP) [1, 2] are called
two-dimensional RAKE (2-D RAKE) receivers [3], and are
known to be effective in suppressing both the
multiuser access interference (MUAI) and the inter-symbol
interference (ISI). However, the prohibitive computational
complexity of STAP systems is one of the key problems in
practical implementations and restricts their
application to practical systems. To reduce this
complexity, optimal and sub-optimal approaches based on
parallel implementation and low-rank transformations have
been proposed [4-8].

Beamspace-based partially adaptive processing methods
are sub-optimal approaches widely used in array signal
processing, where reduced-dimension processing is
performed by employing a few beams to encompass the
significant components in the system [4, 9]. The sectorized
beamspace adaptive diversity combiner is one such
application, which is effective in combating multipath
fading in wireless communications [4]. References [5]
and [6] proposed two other approaches that involve
wideband beamforming and reduced-dimension
beamforming, respectively.

In this paper we propose novel low-complexity sectorized
adaptive antenna techniques which use spatial-smoothing
subarray beamformers to achieve effective beam
diversity as well as sufficient degrees of freedom (DOF's)
for MUAI suppression. In the proposed techniques, the full
field of view is divided into a number of spatial sectors,
wherein the sectorized STAP is performed individually. The
array is partitioned into a set of subarrays, each of which forms a
beam to cover the same specific sector of interest. In the
sector of interest, the number of MUAI's is greatly reduced
compared with the full field-of-view condition. The sectorized STAP
scheme combines the advantages of the reduced-rank
beamspace processing and the spatio-temporal processing
techniques. In comparison with conventional STAP
systems operating over the full field of view, the complexity
of the sectorized processing is greatly reduced, whereas the
performance loss relative to the optimum STAP system can
be kept small.

II.  ARRAY  SIGNAL  MODEL 

Consider a cellular CDMA base station using an antenna
array of $N$ ($N > 1$) elements with $P$ users. The $p$-th user's
baseband waveform of the transmitted signal is expressed as

$$s_p(t) = \sum_{m=-\infty}^{\infty} s_p(m)\, p_p(t - mT), \tag{1}$$

where $s_p(m)$ denotes the $m$-th information symbol of the
$p$-th user,

$$p_p(t) = \sum_{j=0}^{N_c - 1} c_p(j)\, \psi(t - jT_c), \quad 0 \le t \le T \tag{2}$$

represents the signature waveform of the $p$-th user,
$\{c_p(j)\}_{j=0}^{N_c-1}$ is the spreading code assigned to the $p$-th user,
$N_c$ is the number of chips per symbol, $\psi(t)$ is the
normalized chip waveform limited within $[0, T_c]$, and $T_c$ is
the chip interval. The spreading sequence can be periodic or
aperiodic, depending on the standard to be used. In this
paper, we consider the periodic case, i.e., the non-random
CDMA systems.

The received array signal vector $\mathbf{x}(t)$ is denoted as

$$\mathbf{x}(t) = \sum_{p=1}^{P} \sum_{l=1}^{L_p} \mathbf{a}(\theta_l^p)\, \xi_l^p\, s_p(t - \tau_l^p) + \mathbf{n}(t) = \sum_{p=1}^{P} \sum_{m=-\infty}^{\infty} \mathbf{g}_p(t - mT)\, s_p(m) + \mathbf{n}(t), \tag{3}$$

where

$$\mathbf{g}_p(t) = \sum_{l=1}^{L_p} \mathbf{a}(\theta_l^p)\, \xi_l^p\, p_p(t - \tau_l^p), \tag{4}$$

and $\{\theta_l^p, \tau_l^p, \xi_l^p\}$ express respectively the angle-of-arrival
(AOA), the time delay, and the propagation loss
corresponding to the $l$-th path of the $p$-th user. Moreover,
$\mathbf{a}(\theta)$ is the array steering vector corresponding to $\theta$;
$s_p(m)$ denotes the $m$-th information symbol of the $p$-th user,
$L_p$ is the total number of multipath rays of the $p$-th user,
$T = N_c T_c$ is the symbol duration, and $\mathbf{n}(t)$ is the array
noise vector.

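Defining

$$\mathbf{h}_p(t) = \sum_{l=1}^{L_p} \mathbf{a}(\theta_l^p)\, \xi_l^p\, \psi(t - \tau_l^p) \tag{5}$$

as the channel response of the $p$-th user, we can rewrite (3)
as

$$\mathbf{x}(t) = \sum_{p=1}^{P} \sum_{m=-\infty}^{\infty} \sum_{j=0}^{N_c-1} s_p(m)\, c_p(j)\, \mathbf{h}_p(t - jT_c - mT) + \mathbf{n}(t). \tag{6}$$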

We make the following assumptions:

A1) The information symbols $s_p(m)$, $p = 1, \ldots, P$, are
i.i.d., and satisfy $E\{s_p(m) s_q^*(n)\} = \delta_{pq}\delta_{mn}$, where $(\cdot)^*$
denotes complex conjugation and $\delta_{pq}$ denotes the
Kronecker delta function.

A2) The channels $\{\mathbf{h}_p(t),\ p = 1, \ldots, P\}$ are linear and
time-invariant with a finite duration within $[0, D_p T_c]$. Here,
we assume $D_p T_c > T$ for wideband CDMA channels.

A3) The noise vector is zero-mean, temporally and
spatially white with $E\{\mathbf{n}(t)\mathbf{n}^T(t)\} = \mathbf{0}$ and
$E\{\mathbf{n}(t)\mathbf{n}^H(t)\} = \sigma^2 \mathbf{I}$, where $(\cdot)^T$ and $(\cdot)^H$ denote
transpose and conjugate transpose, respectively, $\sigma^2$
expresses the noise power, and $\mathbf{I}$ is the identity matrix. The
noise vector is also assumed to be uncorrelated with the user
signals.

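Denote $\Delta = T_c / J$ as the sampling interval, where $J \ge 1$ is an
integer which expresses the oversampling factor. Thus,
sampling at $t = i\Delta + nT_c$, the discrete form of (6) becomes

$$\mathbf{x}(i\Delta + nT_c) = \sum_{p=1}^{P} \sum_{m=-\infty}^{\infty} \sum_{j=0}^{N_c-1} s_p(m)\, c_p(j)\, \mathbf{h}_p(i\Delta + nT_c - jT_c - mT) + \mathbf{n}(i\Delta + nT_c), \quad i = 0, \ldots, J-1. \tag{7}$$

By stacking $\mathbf{x}(i\Delta + nT_c)$, $i = 0, \ldots, J-1$, we have

$$\mathbf{x}(n) = \sum_{p=1}^{P} \sum_{d=0}^{D_p} \mu_p(n - d)\, \mathbf{h}_p(d) + \mathbf{n}(n), \tag{8}$$

where $\mu_p(n)$ is the chip-rate signal sequence of the $p$-th
user. In (8), we use the notation $\mathbf{a}(n) = [\mathbf{a}^T(nT_c), \ldots,
\mathbf{a}^T(nT_c + (J-1)\Delta)]^T$, where $\mathbf{a}$ denotes either $\mathbf{x}$, $\mathbf{h}$ or $\mathbf{n}$.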

III.  SYMBOL-LEVEL  PROCESSING 

1.  Chip-level  optimum  adaptive  processing 

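For the consecutive samples during the period of $M$ chips
($M > N_c$), we form the following vectors

$$\mathbf{X}(n) = [\mathbf{x}^T(n), \mathbf{x}^T(n-1), \ldots, \mathbf{x}^T(n-M+1)]^T, \tag{9}$$

$$\mathbf{S}_p(n) = [\mu_p(n), \mu_p(n-1), \ldots, \mu_p(n-M-D_p+1)]^T, \tag{10}$$

$$\mathbf{N}(n) = [\mathbf{n}^T(n), \mathbf{n}^T(n-1), \ldots, \mathbf{n}^T(n-M+1)]^T. \tag{11}$$

Define the following Sylvester convolution matrix of user $p$
by the impulse response of its vector channel,
$[\mathbf{h}_p^T(0), \mathbf{h}_p^T(1), \ldots, \mathbf{h}_p^T(D_p)]^T$, as

$$\mathbf{H}_p^{(M)} = \begin{bmatrix} \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) & \cdots & \mathbf{0} \\ \vdots & & \ddots & & \ddots & \vdots \\ \mathbf{0} & \cdots & \mathbf{0} & \mathbf{h}_p(0) & \cdots & \mathbf{h}_p(D_p) \end{bmatrix} \tag{12}$$

with the dimension of $MNJ \times (M + D_p)$, and (8) is extended to

$$\mathbf{X}(n) = \sum_{p=1}^{P} \mathbf{H}_p^{(M)} \mathbf{S}_p(n) + \mathbf{N}(n). \tag{13}$$

The output of the STAP under (13) is described as

$$y(n) = \mathbf{W}^T \mathbf{X}(n). \tag{14}$$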

Under  the  minimum  mean  square  error  (MMSE)  criterion 
$$\min E\left| \mu_{p_0}(n - v) - y(n) \right|^2, \tag{15}$$

where $\mu_{p_0}(n)$ is the training chip sequence of the user $p_0$,
which is considered as the desired user, and $v \ge 0$ is the
delay of the training signal selected to minimize the MSE.
The optimum weights are given by the Wiener-Hopf
equation as

$$\mathbf{W}_{\mathrm{opt,chip}} = \mathbf{R}_X^{-1}\, \mathbf{r}_{p_0}(v), \tag{16}$$

where

$$\mathbf{R}_X = E\left[ \mathbf{X}(n) \mathbf{X}^H(n) \right] \tag{17}$$

is the space-time correlation matrix, and

$$\mathbf{r}_{p_0}(v) = E\left[ \mu_{p_0}^*(n - v)\, \mathbf{X}(n) \right] \tag{18}$$

expresses  the  cross-correlation  vector  between  the  training 
signal  and  the  received  signal  vector.  It  is  seen  that  the 
complexity  of  the  chip-level  adaptive  filter  depends  on  the 
dimension  of  the  signal  vector,  i.e.,  the  dimension  of  the 
weight  vector  that  is  selected  based  on  the  length  of  the 
associated  channels. 

It is noted that in CDMA systems, the performance of
chip-level processing is limited by the number of
degrees of freedom (DOF's) provided by the employed
array and by the cyclostationarity of the users' signals. This
problem can be mitigated by symbol-level
processing, where the MUAI components become
quasi-random noise after despreading with the signature code of
the desired user.

2.  Symbol-level  optimum  adaptive  processing 

Symbol-level processing is so called because taps spaced by the
symbol duration are used in the space-time filter. Similar to the
oversampling-based subchannel formulation made in (7)
and (8), the subchannel-based signal vector after
despreading the received array signals with the signature
code of the desired user $p_0$ is denoted as

$$\mathbf{X}_c(mN_c) = \sum_{l=0}^{N_c-1} \mathbf{X}_s(mN_c + l)\, c_{p_0}(l), \tag{19}$$

where

$$\mathbf{X}_s(\beta) = [\mathbf{x}^T(\beta), \mathbf{x}^T(\beta - 1), \ldots, \mathbf{x}^T(\beta - N_c + 1)]^T. \tag{20}$$
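The despreading sum in (19) can be illustrated with a scalar, chip-synchronous, single-path toy case (one sensor, no noise). The length-7 codes, symbol values, and the function name `despread` below are assumptions chosen for illustration; the interfering code is a one-chip cyclic shift of the desired code.

```python
def despread(chips, code):
    """Correlate one symbol period of received chips with a signature code."""
    return sum(x * c for x, c in zip(chips, code))

# Assumed +/-1 spreading codes of length Nc = 7.
code_desired = [1, 1, 1, -1, -1, 1, -1]
code_other = [1, 1, -1, -1, 1, -1, 1]   # cyclic shift of code_desired

s_desired, s_other = 1, -1               # one symbol from each user

# Received chips: superposition of the two spread symbols.
received = [s_desired * a + s_other * b
            for a, b in zip(code_desired, code_other)]

output = despread(received, code_desired)
mu_ai = s_other * despread(code_other, code_desired)  # residual interference
print(output, mu_ai)
```

The desired symbol is scaled by the processing gain (7 here), while the interfering user contributes only its small cross-correlation with the desired code, which is why MUAI behaves like low-level noise after despreading when there is no near-far problem.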

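By stacking $K$ consecutive-symbol samples, we have the
space-time signal vector as

$$\mathbf{X}_c(m) = [\mathbf{X}_c^T(mN_c), \ldots, \mathbf{X}_c^T((m - K + 1)N_c)]^T. \tag{21}$$

Let $M = KN_c$. From (19)-(21), it is seen that $\mathbf{X}_c(m)$ has the
same form as (13). This implies

$$\mathbf{X}_c(m) = \sum_{p=1}^{P} \mathbf{H}_p^{(M)} \sum_{l=0}^{N_c-1} \mathbf{S}_p(mN_c + l)\, c_{p_0}(l) + \sum_{l=0}^{N_c-1} \mathbf{N}(mN_c + l)\, c_{p_0}(l). \tag{22}$$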


It is seen that $\sum_{l=0}^{N_c-1} \mathbf{S}_{p_0}(mN_c + l)\, c_{p_0}(l)$ has $KN_c + D_{p_0}$
components that are the consecutive samples of the single-path
despreading signal waveform plotted in Fig. 1, where
the peaks are the desired finger outputs. The peak
components of the vector $\sum_{l=0}^{N_c-1} \mathbf{S}_p(mN_c + l)\, c_{p_0}(l)$, $p \ne p_0$,
standing for the MUAI's should be suppressed because they
could lead to false fingers in situations where the near-far
problem exists. When there is no near-far problem, they
are considered as quasi-random noise. The symbol-level
adaptive processing can be performed based on (21), i.e.,


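$$y_c(m) = \mathbf{W}^T \mathbf{X}_c(m) = \sum_{l=0}^{K-1} \mathbf{w}_l^T \mathbf{X}_c((m - l)N_c), \tag{23}$$

where $\mathbf{W} = [\mathbf{w}_0^T, \mathbf{w}_1^T, \ldots, \mathbf{w}_{K-1}^T]^T$. Similar to the chip-level
processing, under the symbol-level MMSE criterion

$$\min E\left| s_{p_0}(m - v) - y_c(m) \right|^2, \tag{24}$$

the optimum weight vector is obtained as

$$\mathbf{W}_{\mathrm{opt,symbol}} = \mathbf{R}_c^{-1}\, \mathbf{r}_{p_0}(v), \tag{25}$$

where

$$\mathbf{R}_c = E\left[ \mathbf{X}_c(m) \mathbf{X}_c^H(m) \right], \tag{26}$$

$$\mathbf{r}_{p_0}(v) = E\left[ s_{p_0}^*(m - v)\, \mathbf{X}_c(m) \right], \tag{27}$$

$s_{p_0}(m)$ denotes the training symbol sequence of the $p_0$-th user,
and $v$ is selected in the same way as explained in (15).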

It  is  noted  that  the  above  filter  (25)  still  has  the  same 
complexity  as  that  given  in  (16). 


IV. SECTORIZED SPACE-TIME ADAPTIVE PROCESSING


1.  Lower-rank  beamspace  transformation 

Lower-rank  beamspace  transformation  is  known  to  be  an 
effective  way  to  reduce  the  complexity  of  an  array 
processing  system.  Unlike  the  scheme  of  the  conventional 
beamforming,  here  we  consider  the  smoothing  subarray 
beamforming  illustrated  in  Fig.  2. 

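Define $\mathbf{b} = [b_1, b_2, \ldots, b_{N-k}]^T$ as the beamformer vector,
which forms a beam to encompass the desired signal at each
of the $k+1$ subarrays ($k < N$). Then, the output signal vector
of the beamforming in Fig. 2 is denoted as

$$\mathbf{x}_b(t) = \mathbf{B}^T \mathbf{x}(t), \tag{28}$$

where $\mathbf{x}_b(t) = [x_{b1}(t), x_{b2}(t), \ldots, x_{b,k+1}(t)]^T$, and the
beamformer matrix $\mathbf{B}$ is expressed by

$$\mathbf{B} = \begin{bmatrix} b_1 & 0 & \cdots & 0 \\ \vdots & b_1 & \ddots & \vdots \\ b_{N-k} & \vdots & \ddots & 0 \\ 0 & b_{N-k} & & b_1 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & b_{N-k} \end{bmatrix}_{N \times (k+1)}. \tag{29}$$

2. Sectorized space-time adaptive processing

The sectorized space-time adaptive processing can be
performed in the same way as that described in Sections II
and III by replacing $\mathbf{x}(t)$ with $\mathbf{x}_b(t)$. Define

$$\mathbf{z}_p(t) = \mathbf{B}^T \mathbf{h}_p(t) \tag{30}$$

and

$$\mathbf{n}_b(t) = \mathbf{B}^T \mathbf{n}(t). \tag{31}$$

With the sampled beamformer output

$$\mathbf{x}_b(i\Delta + nT_c) = \mathbf{B}^T \mathbf{x}(i\Delta + nT_c), \quad i = 0, \ldots, J-1, \tag{32}$$

we have

$$\mathbf{x}_b(n) = \sum_{p=1}^{P} \sum_{d=0}^{D_p} \mu_p(n - d)\, \mathbf{z}_p(d) + \mathbf{n}_b(n), \tag{33}$$

where

$$\mathbf{x}_b(n) = [\mathbf{x}_b^T(nT_c), \ldots, \mathbf{x}_b^T(nT_c + (J-1)\Delta)]^T, \tag{34}$$

$$\mathbf{z}_p(n) = [\mathbf{z}_p^T(nT_c), \ldots, \mathbf{z}_p^T(nT_c + (J-1)\Delta)]^T, \tag{35}$$

$$\mathbf{n}_b(n) = [\mathbf{n}_b^T(nT_c), \ldots, \mathbf{n}_b^T(nT_c + (J-1)\Delta)]^T. \tag{36}$$

By stacking the $N_c$ consecutive samples, we have

$$\mathbf{X}_b(n) = [\mathbf{x}_b^T(n), \mathbf{x}_b^T(n-1), \ldots, \mathbf{x}_b^T(n - N_c + 1)]^T. \tag{37}$$

The symbol-level vector after despreading the output
signal vector $\mathbf{X}_b(n)$ can be denoted as

$$\mathbf{X}_{bc}(mN_c) = \sum_{l=0}^{N_c-1} \mathbf{X}_b(mN_c + l)\, c_{p_0}(l). \tag{38}$$

Similar to (23), the symbol-level sectorized space-time
adaptive processing can be performed as

$$y_{bc}(mN_c) = \mathbf{W}^T \mathbf{X}_{bc}(m) = \sum_{l=0}^{K-1} \mathbf{w}_l^T \mathbf{X}_{bc}((m - l)N_c), \tag{39}$$

where

$$\mathbf{X}_{bc}(m) = [\mathbf{X}_{bc}^T(mN_c), \ldots, \mathbf{X}_{bc}^T((m - K + 1)N_c)]^T. \tag{40}$$

Under the MMSE criterion

$$\min E\left| s_{p_0}(m - v) - y_{bc}(mN_c) \right|^2, \tag{41}$$

the optimum weight vector is obtained as

$$\mathbf{W}_{\mathrm{opt,sector}} = \mathbf{R}_{bc}^{-1}\, \mathbf{r}^{(b)}(v), \tag{42}$$

where

$$\mathbf{R}_{bc} = E\left[ \mathbf{X}_{bc}(m) \mathbf{X}_{bc}^H(m) \right], \tag{43}$$

$$\mathbf{r}^{(b)}(v) = E\left[ s_{p_0}^*(m - v)\, \mathbf{X}_{bc}(m) \right], \tag{44}$$

and $s_{p_0}(m)$ and $v$ have the same meaning as in (27).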

To further reduce the complexity, we can use only the
significant components above a threshold within the vector
$\mathbf{X}_{bc}(mN_c)$, as is commonly implemented. We refer to this as the
simplified scheme. The results of the simplified scheme are
included and compared in the computer simulations.

V.  COMPUTER  SIMULATIONS 

Computer simulations are performed to confirm the
effectiveness of the proposed techniques. In these
simulations, an eight-element uniform linear array with
half-wavelength spacing is used. The array is partitioned
into subarrays, and beams are formed at each subarray. For
example, the beamformer for a three-subarray partitioning
(six array sensors at each subarray) can be designed as

$$\mathbf{b} = [e^{-j1.25u}, e^{-j0.75u}, e^{-j0.25u}, e^{j0.25u}, e^{j0.75u}, e^{j1.25u}]^T$$

where $u = 2\pi \sin(\theta^0)$ and $\theta^0$ denotes the central angle of
the sector where the spatial rays of the desired user signals
are located. In the simulations, 18 CDMA users' signals are
considered to be present, where user 1 is considered as the
desired user. The code length of all the users is 127. Each
user has 6 multipath rays. It is assumed that the AOA's of
the paths are Gaussian distributed for each user, and that their
propagation loss and time delay obey the Rayleigh and the
exponential distributions, respectively. Detailed parameters
for the desired user are given in Table 1. The signal-to-noise
ratio (SNR) of the direct ray of user 1 is assumed to be
-10 dB, and the SNR's of the direct rays of the other users are
randomly chosen from -12.7 dB to -6.6 dB. Their
nominal AOA's are uniformly distributed. The central angle
of the given sector is assumed to be $\theta^0 = 12.3°$.
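A minimal sketch of the six-element subarray beamformer follows. It assumes half-wavelength spacing, a phase reference at the subarray center, and a steering vector whose phase progression is the conjugate of the beamformer entries; the function names are hypothetical and the 60° look direction is an arbitrary out-of-sector example.

```python
import cmath
import math

def centered_offsets(n_sub):
    """Element positions in wavelengths, referenced to the subarray center."""
    return [0.5 * (n - (n_sub - 1) / 2.0) for n in range(n_sub)]

def subarray_beamformer(theta0_deg, n_sub=6):
    """Beamformer entries exp(-j c u) with u = 2 pi sin(theta0)."""
    u = 2.0 * math.pi * math.sin(math.radians(theta0_deg))
    return [cmath.exp(-1j * c * u) for c in centered_offsets(n_sub)]

def steering_vector(theta_deg, n_sub=6):
    """Assumed steering vector exp(+j c u) for a plane wave from theta."""
    u = 2.0 * math.pi * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * c * u) for c in centered_offsets(n_sub)]

def beam_gain(b, a):
    """Magnitude of the beamformer output b^T a."""
    return abs(sum(bi * ai for bi, ai in zip(b, a)))

b = subarray_beamformer(12.3)
gain_in_sector = beam_gain(b, steering_vector(12.3))
gain_out_of_sector = beam_gain(b, steering_vector(60.0))
print(gain_in_sector, gain_out_of_sector)
```

With these assumptions the beam has full gain (equal to the number of subarray sensors) at the sector center and strongly attenuates arrivals far from it, which is how sectorization reduces the number of MUAI's seen by the adaptive processor.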

We selected $K = 2$, i.e., two taps for the symbol-level
space-time adaptive processing. The steady-state residual
error powers of the normal sectorized STAP and its
simplified scheme are plotted in Fig. 3, where
the number of subarrays is varied from one to four. In the
simplified scheme, the threshold is taken as 1.8 times the
standard deviation of the amplitudes of the components of the
signal vector $\mathbf{X}_{bc}(mN_c)$. The residual error power of the
element-space STAP is -25.36 dB, which is considered as
the bound of the sectorized processing and is also plotted in
Fig. 3. It is clear that the results of the three-beam and
four-beam sector STAP are close to the bound, whereas the
complexity and the computational burden are greatly
reduced, especially for the simplified scheme, with an
acceptable performance loss.

VI.  CONCLUSIONS 

We  have  proposed  sectorized  STAP  techniques  for 
CDMA  systems,  which  provide  effective  sub-optimal  low- 
complexity  implementation  of  a  STAP  system.  Simulation 
results  show  close  performance  to  the  optimal  element- 
space  STAP  system. 
Table 1 Parameters of the desired user

No.   AOA (deg)   Time delay   Propagation loss
1     12.3        0            0.045 + 0.998i
2     7           0.02         0.93 - 0.206i
3     11.1        0.32         (illegible)
4     14.2        1.33         (illegible)
5     11.6        1.81         0.355 - 0.264i
6     26.2        1.82         -0.264 + 0.034i

Fig. 1 Single-path despreading waveform

Fig. 2 Smoothing subarray beamforming

Fig. 3 Residual error power
ABSTRACT

Signals modulated by M-ary pulse amplitude modulation
(PAM) or M-ary quadrature amplitude modulation
(QAM) have structured constellations.
When the communication channel introduces inter-symbol
interference (ISI) at the receiver end, demodulation of
such signals can be performed by constant modulus
algorithm (CMA) based equalizers to cancel the
interference. However, the characteristics of the modulated signals
are only partially considered in the CMA cost
function. In this paper, more constraints are imposed on
the equalized signal to fully capture the property of
the modulated signal both in its phase and amplitude.
Observing that PAM signals are uniformly spaced on
the x-axis and QAM signals in a two-dimensional signal
space, the property of transmitted signals from each
category can be included in an equivalent deterministic
mathematical description, similar to the constant
modulus. This description is absorbed in our modified
cost function, resulting in a simultaneous minimization
of the dispersion relevant to the signal's phase and amplitude.
The performance of the equalizers based on these new
algorithms is compared with that of the CMA equalizer.

1.  INTRODUCTION 

In different wireless applications, different modulation
schemes are employed to meet specific resource or service
requirements. Each modulation exhibits its own
property. Signals modulated by M-ary pulse amplitude
modulation (PAM) or M-ary quadrature amplitude modulation
(QAM) have structured constellations.
PAM signals are uniformly spaced on the real axis
(x-axis), while QAM signals are uniformly distributed
in a 2-dimensional signal space. If such signals are
transmitted through a multipath channel, signal
demodulation requires an equalizer to mitigate the channel
distortion. The particular source characteristics often
facilitate the equalizer design. The constant modulus
algorithm (CMA) based equalizer is widely used
[7] and shows its unique capability in equalizing signals
with the constant modulus property [5]. It was first
proposed in [3]. Extensive studies on such equalizers
have followed [1], [2], [4]. The algorithm minimizes the
deviation of the modulus of the equalized signal from a
constant. Satisfactory performance can be achieved,
especially when the transmitted signal has the constant
modulus property.

The knowledge about the phase of the
modulated signal is disregarded in CMA. However, this
knowledge plays an equivalent role in many cases in
representing a signal. It can be expected that its incorporation
into the cost function will improve the equalization
performance. To equalize a dispersive channel
(which could be complex) with M-PAM transmitted signals,
the dispersion in the distance of the equalized
signal away from the x-axis should also be minimized
together with its modulus deviation. Similarly, when
M-QAM signals are transmitted, it is not sufficient to
consider only the amplitude of the equalized signal in
a 2-dimensional signal space, since the symbols are uniformly
distributed along two directions which are perpendicular
to each other and parallel to the two axes. Motivated
by the CMA, we will design new equalizers for
these two kinds of modulated signals by taking into
account their equally spaced property in our new cost
function. Similar to the CMA, stochastic
gradient descent methods are employed to update our
equalizers. The performance of the equalizers based
on these new algorithms is compared with that of the CMA
equalizer.

2.  PROBLEM  STATEMENT 

In wireless communications, the multipath channel
introduces inter-symbol interference (ISI) in the received
signal $x \in C^p$ [4]

$$x = Hs + w \tag{1}$$

where $s \in C^m$ is the complex source vector from either
an M-PAM or M-QAM constellation, $H \in C^{p \times m}$
is a complex channel matrix, $w \in C^p$ represents additive
white Gaussian noise (AWGN), and $x \in C^p$ is
the received signal vector. To detect the signal $s(l)$,
an equalizer $f \in C^p$ is designed. Its output $y$ can be
written as

$$y = f^H x = a^T s + f^H w \tag{2}$$

where superscripts $(\cdot)^T$, $(\cdot)^H$ stand for transpose and
Hermitian transpose respectively, and $a^T = f^H H$ is the composite
response of the channel and the equalizer. Perfect
equalization can be achieved in the absence of noise if
the equalizer can compensate the channel in such a way
that $a$ has only one non-zero element [6]

$$a = e^{j\theta} [0, \cdots, 0, 1, 0, \cdots, 0]^T \tag{3}$$

Therefore the output will be a delayed input with some
phase shift. ISI is completely eliminated in the absence
of noise. Different criteria can be used to seek perfect
equalization. In the CMA criterion, the dispersion of the
modulus of the equalizer output about a constant is minimized

$$J_c(f) = E\{(|y|^2 - r_0)^2\} \tag{4}$$

where "E" represents expectation and $r_0 = E\{|s|^4\} / E\{|s|^2\}$. The
algorithm is usually implemented by the stochastic gradient
descent method

$$f(k+1) = f(k) - \mu (|y(k)|^2 - r_0)\, y^*(k)\, x(k) \tag{5}$$

where $*$ represents conjugation. It is clear that the modulus
characteristic is captured and employed. However,
most modulated signals possess properties in both
amplitude and phase. The M-PAM or M-QAM signals
take discrete values from a set whose elements lie
uniformly on the x-axis or in a 2-dimensional signal space.
Motivated by the CMA criterion, we will derive a new cost
function to incorporate this information and develop a
corresponding algorithm to obtain the equalizer next.
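The stochastic gradient update in (5) can be sketched in a few lines. The function name `cma_step`, the step size, and the two-tap numbers are illustrative assumptions; the equalizer output follows $y = f^H x$ as in (2).

```python
def cma_step(f, x, mu, r0):
    """One CMA update: f <- f - mu * (|y|^2 - r0) * conj(y) * x, y = f^H x."""
    y = sum(fi.conjugate() * xi for fi, xi in zip(f, x))
    gain = mu * (abs(y) ** 2 - r0) * y.conjugate()
    return [fi - gain * xi for fi, xi in zip(f, x)], y

def output(f, x):
    """Equalizer output y = f^H x."""
    return sum(fi.conjugate() * xi for fi, xi in zip(f, x))

# Assumed two-tap equalizer and unit dispersion constant.
f0 = [1 + 0j, 0j]

# Case 1: output already on the unit circle, so the taps are unchanged.
f_same, y_same = cma_step(f0, [1j, 0.5 + 0j], mu=0.01, r0=1.0)

# Case 2: output modulus 2 is pulled toward the constant modulus.
x = [2 + 0j, 0j]
f_new, y_old = cma_step(f0, x, mu=0.01, r0=1.0)
y_new = output(f_new, x)
print(abs(y_old), abs(y_new))
```

Note that the update depends on $y$ only through its modulus error, so any phase rotation of the output leaves the CMA cost unchanged; this is the phase ambiguity that the proposed cost functions aim to remove.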

3.  PROPOSED  EQUALIZERS 

Let us first review the representations and properties
of PAM and QAM signals. The PAM signals are one
dimensional in the sense that they are real and uniformly
distributed on the real axis. The QAM signals
are complex and uniformly spaced in the directions of the real
axis and the imaginary axis. Due to this similarity, the
properties of QAM can be easily found once the properties
of PAM signals are explored. For a general discussion,
the multipath channel and the equalizer are
assumed to be complex for both cases. We start with
the equalization of PAM signals.

3.1.  PAM  signals 

M-ary PAM signals can be represented by the following

$$s_m = (2m - 1 - M)d, \quad m = 1, \cdots, M$$

where $m$ is a random integer index. Usually $M$ is an even
integer and can be written as $M = 2L$. These PAM
signals can also be expressed by

$$s_m = (2m - 1)d, \quad m = -L + 1, \cdots, L \tag{6}$$

if we define a new variable $m \leftarrow m - L$. We will adopt
this signal description later. In (6), $m$ can only take
integer values from $-L + 1$ to $L$, and it can be expressed by $s_m$
as $m = \frac{s_m + d}{2d}$. In the current context, this constraint
is equivalent to $\sin(m\pi) = 0$. Thus it requires

$$\sin\left( \frac{s_m + d}{2d} \pi \right) = \cos\left( \frac{s_m}{2d} \pi \right) = 0 \tag{7}$$
The transformation from (6) to (7) is essential in constructing
our cost function. The other property of $s_m$
is that it has phase equal to a multiple of $\pi$ because $s_m$
lies on the real axis. Therefore

$$\sin\phi = 0 \tag{8}$$

where $\phi$ is the phase of $s_m$. Taking into account the
complex equalized signal, we can combine (4), (7) and
(8) in one cost function

$$J_1(f) = E\left\{ (|y|^2 - r_0)^2 + \alpha_1 \cos^2\left( \frac{|y|}{2d} \pi \right) + \alpha_2 \sin^2(\phi) \right\} \tag{9}$$

where $\alpha_1$ and $\alpha_2$ are weighting factors, $y$ is the equalized
signal given by (2), and $\phi$ is its phase. In (9), $y$ and
$\phi$ are functions of our equalizer $f$. Therefore $J_1(f)$ is
a highly non-linear function of $f$ and difficult to minimize.
Similar to the CMA, we update the equalizer
according to the gradient descent method

$$f(k+1) = f(k) - \mu_1 \nabla J_1(f)\big|_{f = f(k)} \tag{10}$$

The derivative of $J_1(f)$ with respect to $f^H$ is required
in (10). It can be derived term by term from the RHS of
(9). The first term is directly from CMA. Its derivative
can be easily found to be

$$\left( E\{(|y|^2 - r_0)^2\} \right)'_f = 2E\{(|y|^2 - r_0)\, y^* x\} \tag{11}$$
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For  the  second  term,  its  derivative  can  be  computed 
once  derivatives  of  \y\  and  <j>  are  obtained.  If  we  ex¬ 
press  | y |  by  \/yy*,  then  the  derivative  of  |?/|  is  easily 
computed  to  be 

(M)/  =  (12) 


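For $\phi$, it can be expressed in terms of $f$ as

$$\phi = \arctan\frac{y - y^*}{j(y + y^*)} = \arctan\frac{f^H x - x^H f}{j(f^H x + x^H f)} \qquad (13)$$

Therefore the derivative of $\phi$ can be shown to be

$$(\phi)'_f = \frac{x^H f}{2j|y|^2}\,x = \frac{1}{2jy}\,x \qquad (14)$$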


According to (9), (11), (12) and (14), the gradient $\nabla J_1(f)$ is obtained as

$$\nabla J_1(f) = E\{\beta x\} \qquad (15)$$

where

$$\beta = 2(|y|^2 - r_0)\,y^* + a_2\,\frac{\sin(2\phi)}{2jy} - a_1\,\frac{\pi y^* \sin\left(\frac{|y|\pi}{d}\right)}{4d|y|}$$

Therefore the stochastic gradient algorithm for the equalizer follows

$$f(k+1) = f(k) - \mu_1 \beta x \qquad (16)$$
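The update (16) can be illustrated with a minimal sketch. It assumes the convention y = f^H x and obtains the a_1 term by applying the chain rule to the cosine term of (9); that coefficient, the function names, and the parameter values in any usage are assumptions of this sketch rather than part of the original algorithm description.

```python
import cmath
import math


def pam_cost(y, r0, d, a1, a2):
    """Instantaneous value of the quantity inside the expectation of (9)."""
    mag, phi = abs(y), cmath.phase(y)
    return ((mag ** 2 - r0) ** 2
            + a1 * math.cos(mag * math.pi / (2 * d)) ** 2
            + a2 * math.sin(phi) ** 2)


def gradient_factor(y, r0, d, a1, a2):
    """Factor beta such that the instantaneous gradient w.r.t. f^H is beta * x."""
    mag = abs(y)
    if mag == 0.0:
        return 0j  # |y| and the phase are not differentiable at the origin
    # Modulus (CMA) term, as in (11)
    cma = 2 * (mag ** 2 - r0) * y.conjugate()
    # Amplitude-grid term: chain rule on a1*cos^2(|y|*pi/(2d)) (assumed form)
    grid = -a1 * math.pi * y.conjugate() * math.sin(mag * math.pi / d) / (4 * d * mag)
    # Phase term: chain rule on a2*sin^2(phi) using (14)
    phase = a2 * math.sin(2 * cmath.phase(y)) / (2j * y)
    return cma + grid + phase


def equalizer_step(f, x, mu, r0, d, a1, a2):
    """One stochastic-gradient update f(k+1) = f(k) - mu*beta*x with y = f^H x."""
    y = sum(fi.conjugate() * xi for fi, xi in zip(f, x))
    beta = gradient_factor(y, r0, d, a1, a2)
    return [fi - mu * beta * xi for fi, xi in zip(f, x)], y
```

For a sufficiently small step size, one call to `equalizer_step` should not increase `pam_cost` for the same input vector, which gives a quick sanity check of the gradient expression.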

3.2.  QAM  signals 

There are some similarities between PAM and QAM signals. In the signal space, QAM signals can be depicted by $(s_x, s_y)$ where

$$s_x = (2m_x - 1)d_x, \quad m_x = -L_x, \ldots, L_x \qquad (17)$$

$$s_y = (2m_y - 1)d_y, \quad m_y = -L_y, \ldots, L_y \qquad (18)$$

This representation can be transformed into (see (7))

$$\cos\left(\frac{s_x}{2d_x}\pi\right) = 0, \quad \cos\left(\frac{s_y}{2d_y}\pi\right) = 0 \qquad (19)$$

Therefore we can build the following cost function

$$J_2(f) = E\left\{\cos^2\left(\frac{y_1}{2d_x}\pi\right) + \cos^2\left(\frac{y_2}{2d_y}\pi\right)\right\} \qquad (20)$$

with $y_1$ and $y_2$ the real and imaginary parts of $y$, respectively. The gradient descent recursion for the equalizer can be formulated as

$$f(k+1) = f(k) - \mu_2 \nabla J_2(f)\big|_{f=f(k)} \qquad (21)$$

To compute $\nabla J_2(f)$, we first evaluate the derivatives of $y_1$ and $y_2$. If they are expressed explicitly in terms of $f$,

$$y_1 = \frac{y + y^*}{2} = \frac{f^H x + x^H f}{2}$$

$$y_2 = \frac{y - y^*}{2j} = \frac{f^H x - x^H f}{2j}$$

then it is easy to show that their derivatives have the form

$$(y_1)'_f = \frac{x}{2}, \quad (y_2)'_f = \frac{x}{2j}$$

Based on these results and (20), $\nabla J_2(f)$ can be derived to be

$$\nabla J_2(f) = -E\{\eta x\} \qquad (22)$$

where

$$\eta = \frac{\pi \sin\left(\frac{y_1}{d_x}\pi\right)}{2d_x} + \frac{\pi \sin\left(\frac{y_2}{d_y}\pi\right)}{2jd_y}$$

In the case $d_x = d_y = 1$, $\eta$ simplifies to

$$\eta = \frac{\pi}{2}\left[\sin(y_1\pi) - j\sin(y_2\pi)\right]$$

Substituting (22) in (21) and using the instantaneous approximation, we can update the equalizer according to

$$f(k+1) = f(k) + \mu_2 \eta x \qquad (23)$$

The equalization method proposed for either a PAM source or a QAM source in this section explicitly considers the phase and modulus properties of the transmitted signals. As a result, superior performance is expected compared with the conventional CMA equalizer, which only captures the modulus property.

4. SIMULATIONS

In this section we provide some simulation examples to demonstrate the applicability of the proposed PAM and QAM equalization methods. We also compare them with the CMA algorithm [3] in terms of inter-symbol interference (ISI) and error probability. The ISI is used to illustrate the convergence property of the algorithm and is defined as

$$\mathrm{ISI} = \frac{\sum_i |a_i| - |a|_{\max}}{|a|_{\max}}$$

where $a^T = f^H H$ and $|a|_{\max}$ is the largest absolute value of all elements in $a$. Under perfect equalization, $a$ has only one nonzero component as in (4), and the ISI becomes zero. Therefore, a small ISI indicates proximity to the desired response. To gain more insight into the performance of the methods in the communications context, we also adopt error probability as the other measure. It is defined as the percentage of accumulated decoding errors among the total number of transmitted symbols up to the current iteration, and is obtained from multiple independent realizations with random input signals.
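As a small illustration of the ISI measure, the sketch below forms the combined channel-equalizer response by convolving the conjugated equalizer taps with the channel and then evaluates the ratio. It takes the measure with plain magnitudes, which is an assumption (squared magnitudes are also common), and the function names are hypothetical.

```python
def combined_response(f, h):
    """Combined response a of equalizer f and channel h, following a^T = f^H H."""
    a = [0j] * (len(f) + len(h) - 1)
    for i, fi in enumerate(f):
        for j, hj in enumerate(h):
            a[i + j] += fi.conjugate() * hj
    return a


def isi(a):
    """(sum of magnitudes minus the peak magnitude) divided by the peak magnitude."""
    mags = [abs(v) for v in a]
    peak = max(mags)
    return (sum(mags) - peak) / peak
```

A response with a single nonzero element gives an ISI of zero, which matches the perfect-equalization case described above.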

In the experiments, we consider an unknown non-minimum phase channel impulse response used in [6], with the first 4 coefficients [-0.400 0.840 0.336 0.134]. The equalizer has 12 taps and is initialized to $[0, \ldots, 0, 1, 0, \ldots, 0]^T$, i.e., all zeros except that the seventh element is 1. Each realization runs for 5000 iterations. A total of 50 independent realizations are performed to obtain the average results.

First, we compare the proposed PAM equalizer with the Godard approach [3] for a PAM source. The input signals take six equi-probable values: {±0.1, ±0.3, ±0.5}. The step size $\mu$ is set to 0.085, with weighting factors $a_1 = 0.005$ and $a_2 = 0$ (since a real channel is used). The first 500 iterations are used for initialization for both methods. The average ISI after 500 iterations is plotted in Fig. 1. The solid line represents the proposed PAM method and the dashed line represents CMA. It is observed that the ISI of the proposed PAM method converges to a level 15 dB lower than that of CMA while maintaining the same fast convergence. The error probability is shown in Fig. 2. In fact, based on our observation, the proposed method makes no errors after convergence (800 iterations), while CMA still accumulates some errors.

Our second experiment considers a QAM source with 4 equi-probable values {±1 ± j}. We also compare the proposed QAM scheme with the CMA algorithm [3]. The first 20 data points are used for initialization for both methods. The average ISI and error probability after 20 iterations are plotted in Fig. 3 and Fig. 4 respectively. Solid lines represent the proposed QAM equalization method and dashed lines represent CMA. It is seen that the ISI of the proposed QAM scheme converges faster than that of the standard CMA while achieving a much lower level after convergence. The error probability of the proposed method is also much lower than that of CMA. This is reflected in the difference between the constellation diagrams of the equalized outputs for all iterations from a randomly picked realization, as shown in Fig. 5 and Fig. 6. It is interesting to note that the equalized outputs of our equalizer have a much smaller variation than those of the CMA equalizer.
Figure 1: ISI of the proposed method and Godard's method with PAM sources.

Figure 2: Error probability of the proposed method and Godard's method with PAM sources.

Figure 3: ISI of the proposed method and Godard's method with QAM sources.

Figure 4: Error probability of the proposed method and Godard's method with QAM sources.

Figure 5: Equalized output of the proposed method with QAM sources.

Figure 6: Equalized output of Godard's method with QAM sources.
ABSTRACT

Substantial  power  efficiency  improvements  are  possi¬ 
ble  in  communication  systems  if  a  moderate  amount  of 
nonlinearity  is  permitted  at  the  transmitter  amplifier 
and  corrected  for  at  the  receiver.  The  Volterra  series 
is  a  suitable  model  for  many  power  amplifiers,  and  is 
readily  incorporated  into  communication  channel  mod¬ 
els.  Existing  fixed  point  equalization  algorithms  for 
Volterra  channels  place  restrictive  conditions  on  the  lo¬ 
cations  of  first-order  kernel  zeros.  We  show  that  multi¬ 
channel  and  block  based  precoding  linear  equalization 
techniques  can  be  combined  with  the  fixed  point  equal¬ 
izer  to  allow  for  exact  equalization  of  Volterra  systems 
with  mixed-phase  first-order  kernels. 

1.  INTRODUCTION 

The design of a communication system, from the data format to the transceivers, is composed of many parts.
Radio  frequency  power  amplifier  design  is  an  important 
component  of  cellular,  television,  radio,  and  data  trans¬ 
mission  systems.  In  amplifier  design  the  requirements 
of  power  efficiency  and  linearity  can  be  at  odds  with 
each  other,  with  the  result  being  that  power  efficiency 
is  sacrificed  in  order  to  meet  linearity  requirements  [2]. 

Substantial  efficiency  improvements  can  be  possible 
if  some  mild  nonlinearity  is  allowed  in  the  transmitter 
amplifier  and  corrected  for  at  the  receiver.  This  im¬ 
proved  efficiency  translates  to  lower  operating  costs, 
longer  battery  life,  and  smaller  size  devices.  A  penalty 
of  allowing  additional  nonlinearity  into  the  system  is 
that  the  equalizer  must  now  compensate  for  a  nonlin¬ 
ear  channel. 

In this paper we consider fixed point equalization of communication channels modeled by the Volterra series [3], [4], [8]. Fixed point equalization in this case refers to the contraction mapping theorem [3] (not integer arithmetic). The Volterra series is a useful nonlinear model for amplifiers [2], and is readily incorporated into the overall channel model as an extension of linear convolution.

This work was supported in part by NASA grant NGT-352334 and NSF grant MIP-9703312.

Drawbacks  of  traditional  fixed  point  equalization 
techniques  include  the  requirement  that  the  linear  com¬ 
ponent  of  the  channel  is  minimum-phase  (for  stable  ex¬ 
act  inverses)  [3]  or  its  zeros  are  not  near  the  unit  circle 
(for  approximate  inverses)  [4].  These  can  be  serious 
limitations  for  realistic  communication  channel  mod¬ 
els,  as  the  error  in  the  inversion  of  the  linear  channel 
component  is  iterated  on  by  the  fixed  point  algorithm. 

Recently,  multichannel  [7]  and  block  based  precod¬ 
ing  methods  [5]  have  become  popular  for  linear  channel 
equalization.  This  is  because  both  methods  convert  the 
ill-posed  single  channel  inversion  problem  into  a  well 
posed  problem  with  an  exact  (zero  forcing)  solution  in 
the  noise-free  case.  We  show  that  these  principles  can 
be  combined  with  the  fixed  point  equalizer,  for  zero 
forcing  equalization  of  nonlinear  channels  with  mixed- 
phase  first-order  kernels. 

2.  THE  VOLTERRA  SERIES 

For the discrete Jth-order Volterra system $H$, the input $x(n)$ is related to the output $y(n)$ by:

$$\begin{aligned}
y(n) &= H(x(n), \ldots, x(n-L)) \\
&= \sum_{j=1}^{J} H_j(x(n), \ldots, x(n-L_j)) \\
&= \sum_{j=1}^{J} \sum_{\tau_1=0}^{L_j} \cdots \sum_{\tau_j=\tau_{j-1}}^{L_j} h_j(\tau_1, \ldots, \tau_j) \prod_{i=1}^{j} x(n-\tau_i)
\end{aligned}$$

where $H_j$ is the jth-order operator of $H$, $h_j(\tau_1, \ldots, \tau_j)$ is the nonredundant region of the jth-order kernel, and $L = \max\{L_1, \ldots, L_J\}$. Notice that a first-order Volterra system (J = 1) is linear convolution (an FIR filter).
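To make the input/output relationship concrete, the following sketch evaluates one output sample of a Volterra system from kernels stored over their nonredundant (nondecreasing-lag) region. The dictionary layout, the function name, and the choice to treat samples before the start of the block as zero are assumptions made for illustration.

```python
from math import prod


def volterra_output(kernels, x, n):
    """Output y(n) of a Volterra system.

    kernels maps each order j to a dict {(tau_1, ..., tau_j): h_j(tau_1, ..., tau_j)}
    defined over the nonredundant region tau_1 <= ... <= tau_j.
    """
    y = 0
    for lags_to_coeff in kernels.values():
        for lags, coeff in lags_to_coeff.items():
            # Samples before the start of x are taken to be zero (assumption).
            y += coeff * prod(x[n - t] if n - t >= 0 else 0 for t in lags)
    return y
```

With only a first-order kernel, `volterra_output` reduces to an FIR filter output, in line with the remark that the J = 1 case is linear convolution.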

Throughout this paper the symbol $u(n)$ will be used to refer to the linear component of the Volterra system with additive noise $v(n)$:

$$u(n) = \sum_{\tau_1=0}^{L_1} h_1(\tau_1)\,x(n-\tau_1) + v(n).$$

For the channel input $x(n)$, output $y(n)$, noise $v(n)$, and linear portion of the output with noise $u(n)$, it will be assumed that these vectors are composed of a basic block of N symbols, and a subscript will indicate how many symbols before this basic block to include, e.g.:

$$x_L = [x(-L), \ldots, x(N-1)]^T.$$

An optional argument can be included to specify a subset of the vector:

$$x_L(a:b) = [x(a), \ldots, x(b)]^T.$$

If the d sample delay operator $z^{-d}$ is placed before the vector, then each element of the vector is delayed by d samples:

$$z^{-d}x_L = [x(-L-d), \ldots, x(N-1-d)]^T.$$

We define the Volterra series relationship between an input vector $x_L$ and output vector $y_0$ as:

$$y_0 = H(x_L).$$


As a shorthand notation to refer to the output of specific order operators, we define:

$$H_{a:b}(x_L) = \sum_{j=a}^{b} H_j(x_{L_j}).$$

It is often necessary to write the first-order operator corresponding to a finite impulse response (FIR) filter as a filtering matrix. For the length $Q+1$ vector $c = [c(Q), \ldots, c(0)]^T$, the $N \times (N+Q)$ filtering matrix $\mathcal{T}_N(c)$ is defined as:

$$\mathcal{T}_N(c) = \begin{bmatrix} c(Q) & \cdots & c(0) & & \\ & \ddots & & \ddots & \\ & & c(Q) & \cdots & c(0) \end{bmatrix}$$

Figure 1: A single channel Volterra system.

3. FIXED POINT EQUALIZATION

In this section we review the single channel fixed point equalizer based on the contraction mapping theorem. The basic idea underlying fixed point equalization of Volterra channels is setting up a fixed point equation for the input in terms of the known system kernels and system output, then solving for the input using the method of successive approximations [3]. Two assumptions are implicit:

Assumption (A1): The K + L previous input symbols $x(-K-L), \ldots, x(-1)$ have already been estimated.

Assumption (A2): The K previous output samples $y(-K), \ldots, y(-1)$ are available.

In the following derivation, even though $x_0$ will be on the left hand side of the equation and $x_{K+L}$ will be on the right hand side, there is still a fixed point equation in $x_0$ since $x_{K+L}$ can be formed directly from $x_0$.

The derivation of the fixed point equalizer is well known in the literature [3]. Here the derivation is performed using the notation of Section 2, which will emphasize the importance of the inversion of the linear component of the noisy channel output.

The input/output relationship for the single channel Volterra system in Fig. 1 with additive noise at the receiver is:

$$y_0 = H_N x_{L_1} + H_{2:J}(x_L) + v_0, \qquad (1)$$

where $H_N = \mathcal{T}_N(h_1)$. Rearranging the terms and applying the linear operator $G_s$ with memory K to both sides yields:

$$G_s(H_{K+N}\,x_{K+L_1} + v_K) = G_s(y_K) - G_s H_{2:J}(x_{K+L}). \qquad (2)$$

Notice that each of the vectors and matrices from (1) to (2) has been extended by K samples in the past (available from (A1) and (A2)) since the operator $G_s$ has memory K. To set up the desired fixed point equation, it is necessary to make the left hand side of (2) $z^{-d}x_0$. Define the single channel error term as

$$\varepsilon_s = z^{-d}x_0 - G_s(H_{K+N}\,x_{K+L_1} + v_K).$$

It is common to choose $G_s$ corresponding to a causal Kth order FIR filter

$$g_s = [g_s(K), \ldots, g_s(0)]^T,$$

designed according to the minimum mean-square error (MMSE) criterion. For the MMSE equalizer it is necessary to make the following assumption:
Assumption (A3): The input $x(n)$ and the noise $v(n)$ are mutually uncorrelated, stationary random processes with known covariance matrices:

$$R_{xx} = E[x_{K+L_1}(n-K-L_1 : n)\; x^*_{K+L_1}(n-K-L_1 : n)],$$

$$R_{vv} = E[v_K(n-K : n)\; v^*_K(n-K : n)],$$

respectively.

If (A1) - (A3) are satisfied, then the equalizer $g_s$ can be solved for as:

$$g_s = R_{uu}^{-1}\, r_{xu}, \qquad (3)$$

where $R_{uu}$ and $r_{xu}$ are defined as

$$R_{uu} = H_{K+1} R_{xx} H^*_{K+1} + R_{vv}$$

$$r_{xu} = H^*_{K+1} E[x(n-d)\, x^*_{K+L_1}(n-K-L_1 : n)].$$

Substitution of the operator $G_s$ associated with the filter $g_s$ into (2) results in the fixed point equation:

$$z^{-d}x_0 = G_s(y_K) - G_s H_{2:J}(x_{K+L}) + \varepsilon_s.$$

Assuming that $\varepsilon_s$ is small, it is ignored and the approximate fixed point equation is solved:

$$z^{-d}x_0 = G_s(y_K) - G_s H_{2:J}(x_{K+L}).$$

For the case of d = 0, $x_{K+L}$ can be determined from $z^{-d}x_0$ and (A1). However, when d > 0, it is not possible to determine the last d elements of $x_{K+L}$, namely $x(N-d), \ldots, x(N-1)$. To obtain proper estimates of these last d symbols in $z^{-d}x_0$, they could be the first symbols estimated in the next block of data.

A  drawback  of  the  fixed  point  equalizer  is  the  er¬ 
ror  introduced  into  the  fixed  point  equation  associated 
with the inverse of the first-order kernel. The error depends on the length K, delay d, and the location of the zeros of $H_1$. The fixed point equalizers in the following
two  sections  eliminate  this  source  of  error,  and  allow 
for  zero  forcing  equalization  of  the  linear  component 
(along  with  the  nonlinear  component)  of  the  channel 
in  the  noise-free  case. 
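The method of successive approximations can be sketched on a toy real-valued channel. This is a simplified illustration under several assumptions that differ from the setup above: the linear kernel is minimum-phase so an exact causal inverse (computed by back-substitution) stands in for the FIR MMSE equalizer of (3), the nonlinear part is a single hypothetical third-order term c3·x(n)²·x(n-1), the channel is noise-free, and symbols before the block are zero.

```python
def nonlinear_part(x, c3):
    """Toy third-order term c3 * x(n)^2 * x(n-1) (hypothetical kernel)."""
    return [c3 * x[n] ** 2 * (x[n - 1] if n >= 1 else 0.0) for n in range(len(x))]


def linear_part(x, h1):
    """FIR filtering of x by the first-order kernel h1, zero initial state."""
    return [sum(h1[t] * x[n - t] for t in range(len(h1)) if n - t >= 0)
            for n in range(len(x))]


def channel(x, h1, c3):
    """Noise-free Volterra channel: linear part plus the toy nonlinear part."""
    return [u + w for u, w in zip(linear_part(x, h1), nonlinear_part(x, c3))]


def inverse_linear(u, h1):
    """Exact causal inverse of a minimum-phase h1 by back-substitution."""
    x = []
    for n in range(len(u)):
        acc = u[n] - sum(h1[t] * x[n - t] for t in range(1, len(h1)) if n - t >= 0)
        x.append(acc / h1[0])
    return x


def fixed_point_equalize(y, h1, c3, iterations=5):
    """Iterate x <- G(y - H_nonlinear(x)), starting from the linear inverse G(y)."""
    x = inverse_linear(y, h1)
    for _ in range(iterations):
        residual = [yn - wn for yn, wn in zip(y, nonlinear_part(x, c3))]
        x = inverse_linear(residual, h1)
    return x
```

When the nonlinear term is weak relative to the linear one, each pass shrinks the error left by the purely linear inverse, which is the contraction behavior the fixed point equalizer relies on.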

4.  MULTICHANNEL  FIXED  POINT 
EQUALIZER 

The availability of multiple observations per symbol period at the receiver has become more common in many communication systems. Using a superscript $(s)$ to denote the channel, the following assumption is made:

Assumption (A4): There are no common zeros across all of the linear components $\{H_1^{(s)}\}_{s=1}^{S}$ of the channels.

Figure 2: A single-input/multiple-output Volterra system.


It  is  well  known  that  for  multiple  linear  channels, 
FIR  zero  forcing  equalization  is  possible  if  (A4)  is  sat¬ 
isfied  [7].  In  this  section  it  is  shown  that  these  linear 
multichannel  equalization  techniques  can  be  combined 
with the fixed point equalizer, to allow for zero forcing equalization of Volterra channels with mixed-phase first-order kernels using as few as two channels.

Consider the multichannel Volterra system shown in Fig. 2 and again assume that (A1) and (A2) are satisfied. For the sth channel write:

$$y_0^{(s)} = H_N^{(s)} x_{L_1} + H_{2:J}^{(s)}(x_L) + v_0^{(s)},$$

where $H_N^{(s)} = \mathcal{T}_N(h_1^{(s)})$. Rearranging terms and applying the linear operator $G_m^{(s)}$ with memory K to both sides yields:

$$G_m^{(s)}(H_{K+N}^{(s)} x_{K+L_1} + v_K^{(s)}) = G_m^{(s)}(y_K^{(s)}) - G_m^{(s)} H_{2:J}^{(s)}(x_{K+L}). \qquad (4)$$

Because (4) holds for each channel s, it is possible to sum the results for all S channels and write:

$$\sum_{s=1}^{S} G_m^{(s)}(H_{K+N}^{(s)} x_{K+L_1} + v_K^{(s)}) = \sum_{s=1}^{S} G_m^{(s)}(y_K^{(s)}) - \sum_{s=1}^{S} G_m^{(s)} H_{2:J}^{(s)}(x_{K+L}). \qquad (5)$$

If it is possible to make the left hand side of (5) $x_0$, then the result will be the desired fixed point equation. Define the error term as

$$\varepsilon_m = x_0 - \sum_{s=1}^{S} G_m^{(s)}(H_{K+N}^{(s)} x_{K+L_1} + v_K^{(s)}).$$

If (A4) is satisfied, then in the noise-free case a Kth-order FIR zero forcing solution exists such that

$$\sum_{s=1}^{S} G_m^{(s)}(H_{K+N}^{(s)} x_{K+L_1}) = x_0,$$

provided that $S(K+1) > K + L + 1$ [7]. Define the multichannel filtering matrix $H_{m,K+1}$ and the multichannel Kth-order equalizer $g_m$ corresponding to $\{G_m^{(s)}\}_{s=1}^{S}$ as:

$$H_{m,K+1} = \left[H_{K+1}^{(1),T}, \ldots, H_{K+1}^{(S),T}\right]^T, \qquad g_m = \left[g_m^{(1),T}, \ldots, g_m^{(S),T}\right]^T.$$

The zero forcing equalizer can be recovered as [7]:

$$g_m = (H_{m,K+1}^{T})^{\dagger}\, e_{K+L_1+1}, \qquad (6)$$

where $e_{K+L_1+1}$ is a $(K+L_1+1) \times 1$ vector with a one in the $(K+L_1+1)$th position and zeros elsewhere.

Substitution of the operator $G_m^{(s)}$ associated with the filter $g_m^{(s)}$ designed according to (6) into (5) results in the fixed point equation:

$$x_0 = \sum_{s=1}^{S} G_m^{(s)}(y_K^{(s)}) - \sum_{s=1}^{S} G_m^{(s)} H_{2:J}^{(s)}(x_{K+L}) + \varepsilon_m.$$

Figure 3: A single channel Volterra system with precoder.

5.  BLOCK  BASED  PRECODING  FIXED 
POINT  EQUALIZATION 

As  an  alternative  to  using  multiple  channels  at  the 
receiver  to  improve  the  single  channel  inversion  prob¬ 
lem,  structured  redundancy  could  be  introduced  at  the 
transmitter.  By  block  precoding  at  the  transmitter 
and  block  equalization  at  the  receiver,  FIR  zero  forcing 
equalization  of  single  channel  systems  is  possible  irre¬ 
spective  of  the  location  of  channel  zeros  [5].  As  in  the 
multichannel  case,  these  properties  can  be  extended  to 
fixed  point  equalization  of  Volterra  channels. 

Consider the block-based transmission scheme of Fig. 3. At the transmitter, data symbols $w(n)$ are collected into a block of length M:

$$w = [w(0), \ldots, w(M-1)]^T,$$

and mapped by the precoder $F_p$ to the length N block of channel inputs $x_0$. If the precoder is linear, then it can be represented by the $N \times M$ matrix $F_p$. The precoder structure is chosen to satisfy the following two assumptions [5]:


Assumption  (A5):  The  lengths  L,  M,  and  N  satisfy 
N  =  L  +  M. 

Assumption  (A6):  rank(Fp)  =  M,  and  the  last  L 
rows  of  Fp  are  zero. 

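As a result of (A6), $F_p$ can be decomposed as

$$F_p = \begin{bmatrix} \bar{F}_p \\ 0_{L \times M} \end{bmatrix}$$

where the $M \times M$ matrix $\bar{F}_p$ is nonsingular. Using (A6) it is possible to write:

$$x_L = \begin{bmatrix} 0_{L \times 1} \\ \bar{F}_p w \\ 0_{L \times 1} \end{bmatrix}.$$

The N row filtering matrix for the first-order kernel $H_N = \mathcal{T}_N(h_1)$ can be decomposed as

$$H_N = [\tilde{H}_N \;\; \bar{H}_N \;\; \hat{H}_N],$$

where $\tilde{H}_N$ is $N \times L$, $\bar{H}_N$ is $N \times M$, and $\hat{H}_N$ is $N \times L$. Using these definitions, the input/output relationship for the block-based system with precoding can be written as

$$y_0 = \bar{H}_N \bar{F}_p w + H_{2:J}\left(\begin{bmatrix} 0_{L \times 1} \\ \bar{F}_p w \\ 0_{L \times 1} \end{bmatrix}\right) + v_0.$$

Rearranging terms and applying the linear operator $G_p$ to both sides yields:

$$G_p(\bar{H}_N \bar{F}_p w + v_0) = G_p(y_0) - G_p H_{2:J}\left(\begin{bmatrix} 0_{L \times 1} \\ \bar{F}_p w \\ 0_{L \times 1} \end{bmatrix}\right). \qquad (7)$$

If the left hand side of (7) were $w$, then the desired fixed point equation would result. Define the error term as

$$\varepsilon_p = w - G_p(\bar{H}_N \bar{F}_p w + v_0). \qquad (8)$$

If (A5) and (A6) are satisfied, then in the noise-free case, a zero forcing solution $G_p$ (with matrix form $G_p$) to (8) exists such that [5]:

$$G_p \bar{H}_N \bar{F}_p w = w.$$

The zero forcing equalizer can be recovered as [5]:

$$G_p = \bar{F}_p^{-1} \bar{H}_N^{\dagger}. \qquad (9)$$

Substitution of the operator $G_p$ associated with the matrix $G_p$ designed according to (9) into (7) results in the fixed point equation:

$$w = G_p(y_0) - G_p H_{2:J}\left(\begin{bmatrix} 0_{L \times 1} \\ \bar{F}_p w \\ 0_{L \times 1} \end{bmatrix}\right) + \varepsilon_p.$$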
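The block zero forcing step for the linear component can be sketched with NumPy. The sketch pads each precoded block with L zeros, so the received block depends only on the tall middle part of the filtering matrix, which has full column rank for any nonzero channel and can therefore be inverted with a pseudo-inverse. The function names, the row layout of the filtering matrix, and the example channel are assumptions for illustration, and the nonlinear term and noise are omitted.

```python
import numpy as np


def filtering_matrix(c, n_rows):
    """N x (N+Q) Toeplitz filtering matrix; each row holds [c(Q), ..., c(0)]."""
    c = np.asarray(c, dtype=complex)
    q = len(c) - 1
    t = np.zeros((n_rows, n_rows + q), dtype=complex)
    for i in range(n_rows):
        t[i, i:i + q + 1] = c[::-1]
    return t


def precoded_zero_forcing(h1, f_bar):
    """Return (G_p, H_N) for the nonsingular M x M precoder block f_bar."""
    m = f_bar.shape[0]
    l = len(h1) - 1
    n = l + m                        # block length N = L + M
    h_n = filtering_matrix(h1, n)    # N x (N + L)
    h_mid = h_n[:, l:l + m]          # N x M block acting on the data symbols
    g_p = np.linalg.inv(f_bar) @ np.linalg.pinv(h_mid)
    return g_p, h_n


def channel_input(w, f_bar, l):
    """Extended input [0_L; f_bar w; 0_L] seen by the filtering matrix."""
    return np.concatenate([np.zeros(l), f_bar @ w, np.zeros(l)])
```

In the noise-free linear case, applying `g_p` to `h_n @ channel_input(w, f_bar, l)` returns the data block `w` regardless of where the channel zeros lie.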
6. SIMULATIONS

We considered a third-order baseband Volterra system with $L_1 = 5$ and $L_3 = 2$, whose complex kernel coefficients' real and imaginary parts were chosen randomly from [-0.5, 0.5], with the third-order kernel scaled by 0.03 such that the nonlinear to linear power ratio is -23 dB. A 16-QAM input was used, and additive white Gaussian noise was present at the channel output. For each data point we generated 100 blocks of N = 100 symbols for 100 different channels.

For the multichannel fixed point simulations we used S = 4 channels and the linear component of the equalizer designed according to (6) with order K = 8. The single channel fixed point simulations (with and without precoding) used the first of the multichannel fixed point simulations' channels. The standard single channel fixed point equalizer's linear component was designed according to (3) with K = 32 and d = 16. The linear component of the single channel fixed point equalizer with precoding was designed according to (9), with a data block length of M = N - L = 95 and precoder $F = I_{M \times M}$. For each of the fixed point equalizers, 5 iterations of their respective fixed point equation were performed.

For our performance metric, we calculate the signal to interference ratio (SIR), defined in terms of the MSE of the equalizer output:

$$\mathrm{SIR} = -10 \log_{10} \mathrm{MSE} \;\; (\mathrm{dB}),$$

vs. SNR. The SIR allows us to assess the ability of the
equalizers  to  cope  with  both  the  noise  and  the  nonlin¬ 
earity.  Fig.  4  compares  the  output  of  each  of  the  fixed 
point  equalizers,  along  with  the  corresponding  outputs 
of  the  linear  components  of  the  equalizers. 

7.  CONCLUSIONS 

In  this  paper  we  showed  that  multichannel  and  block 
based  precoding  linear  channel  equalization  techniques 
can  be  combined  with  the  fixed  point  method  for  zero 
forcing  equalization  of  Volterra  channels  with  mixed- 
phase  first-order  kernels.  Since  the  fixed  point  equalizer 
takes  the  form  of  a  nonlinear  correction  added  to  a 
linear  inverse,  it  is  a  practical  addition  to  existing  linear 
channel  equalization  schemes. 
ABSTRACT

A joint propagation parameter estimation method for MultiCarrier systems is proposed. The main difference between Single Carrier and MultiCarrier models is outlined and handled in the derivation of the algorithm. The method uses a subspace-based 2-D ESPRIT-like approach, exploiting the frequency shift invariance of the system as well as the ULA geometry to provide closed-form estimation. The basic performance of the algorithm is illustrated through simulations and compared with the Cramer-Rao bound.

1.  INTRODUCTION 

In  several  wireless  systems,  the  transmitted  signals  are  sub¬ 
ject  to  the  effects  of  multipath  channels,  caused  by  the 
remote  terrestrial  objects  and  the  inhomogeneities  in  the 
physical  medium.  Estimation  of  the  multipath  propagation 
parameters  from  measurements  at  a  multisensor  antenna, 
provides  a  better  channel  characterization  for  subsequent 
processing.  These  parameters  include,  among  others,  the 
Direction  Of  Arrival  (DOA)  and  Time  Difference  Of  Arrival 
(TDOA)  of  each  path.  In  MultiCarrier  Modulation  (MCM) 
systems such as Digital Terrestrial Television Broadcasting (DTTB) and Digital Audio Broadcasting (DAB), the
transmitted  signals  are  subject  to  the  effects  of  a  multipath 
channel,  in  the  same  way  as  are  Single  Carrier  Modulation 
(SCM)  systems. 

Herein,  we  investigate  the  possibility  of  performing  closed- 
form  Joint  Angle  and  Delay  Estimation  (JADE)  for  a  MCM 
system  in  a  single  batch,  in  a  way  similar  to  JADE  for  SCM 
systems,  by  exploiting  the  frequency  diversity  of  the  sys¬ 
tem,  together  with  a  known  array  geometry.  The  system 
consists  of  a  single  source  and  a  single  antenna  array.  A 
channel  model  is  derived  to  outline  the  frequency  shift  in¬ 
variance  associated  with  the  system.  The  model  exploits 
the  stationarity  of  the  parameters  over  the  coherence  time 
of  the  channel.  It  also  takes  into  account  the  fact  that  the 
unknown  complex  fadings  differ  from  one  carrier  to  another. 
Both  the  uniform  carrier  spacing  and  a  known  array  geome¬ 
try  allow  closed-form  estimation  of  the  propagation  param¬ 
eters.  More  particularly,  if  the  antenna  is  Uniform  Linear 
(ULA),  or  has  an  ESPRIT  doublet  structure,  JADE  can  be 
achieved using a 2D ESPRIT-like technique. The Cramer-Rao Bound on the variance of the estimated parameters is also derived from the obtained model.


2.  DATA  MODEL 

The principle of a Multicarrier transmitter is depicted in Figure 1. The concept is to transform serial data into parallel lower rate inputs that are modulated by orthogonal carriers. The history and applications of MCM are reported in [1], [2] and the references therein and are not stated here for conciseness. Assuming a single MCM source emitting over C carriers, the lowpass equivalent transmitted signal is given by

$$x(t) = \sum_{c=1}^{C} \sum_{k=-\infty}^{\infty} s_c[k]\, g(t - kT)\, e^{j2\pi \frac{c}{T} t} \qquad (1)$$

where

• $s_c[k]$ is the k-th symbol conveyed by carrier c,

• $\{s_c[k]\}$, c = 1, ..., C are independent from one carrier to another and identically distributed,

• $g(t)$ is the pulse-shape function,

• $T$ is the symbol duration, and

• $1/T$ is the frequency spacing between two successive carriers.

Figure 1: Block diagram of the MCM transmitter.


In the following, the channel is fading and time varying. However, it is regarded as stationary within its coherence time. Assuming C carriers and perfect carrier phase and sampling time recovery, the complex envelope of the lowpass received signal at an M-element antenna array at time t can be written as

$$y(t) = \sum_{c=1}^{C} \sum_{k=-\infty}^{\infty} s_c[k]\, h_c(t - kT)\, e^{j2\pi \frac{c}{T} t} + z(t) \qquad (2)$$

where $h_c(t) = [\,h_{c,1}(t) \;\; h_{c,2}(t) \;\; \ldots \;\; h_{c,M}(t)\,]^T$ is the transmission channel associated with the c-th carrier, $s_c[k]$ is the k-th symbol of duration T conveyed by carrier c, and $z(t)$ is the additive white Gaussian noise. The coherence time of the channel is assumed to range over K symbol periods. The channel $h_c(t)$ can be modeled as [3]

$$h_c(t) = \sum_{q=1}^{Q} a(\theta_q)\, \beta_c(q)\, g(t - \tau_q)\, e^{-j2\pi \frac{c}{T} \tau_q} \qquad (3)$$

where Q is the number of paths, $\theta_q$ and $\tau_q$ are the q-th angle of arrival and time delay respectively, and $\beta_c(q)$ is the complex attenuation, which varies from carrier to carrier. $a(\theta_q)$ is the $(M \times 1)$ vector of the array response to the q-th path, with q = 1, ..., Q, and $g(t)$ is the finite support modulation pulse-shape function. We assume that the array outputs are received in parallel over each carrier after demodulation. The channel length is LT. We collect K data samples on each carrier. Using some trivial manipulations, this can be expressed in an $(M \times K)$-dimensional matrix form as

$$Y_c = H_c S_c, \quad c = 1, \ldots, C \qquad (4)$$

If the Toeplitz matrix of data symbols $S_c$, c = 1, ..., C, is known from training and K > M, an estimate of the channel samples matrix $H_c$ in (3) can be obtained for c = 1, ..., C, using least squares. Blind estimation of the channel samples [4, 5] is also possible in case $S_c$ is not known in advance. The estimated channel can be given as

$$\hat{H}_c = H_c + N_c \qquad (5)$$

where $N_c$ is the estimation noise matrix.
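The training-based least-squares step can be sketched in a few lines of NumPy. The sketch assumes the known symbol matrix has full row rank, so that multiplying the received block by its pseudo-inverse recovers the channel matrix exactly in the noise-free case; the function name and the example matrices in any usage are illustrative assumptions.

```python
import numpy as np


def estimate_channel(y_c, s_c):
    """Least-squares estimate of H_c from Y_c = H_c S_c, with S_c known.

    S_c must have full row rank; the estimate is Y_c S_c^H (S_c S_c^H)^{-1},
    computed here through the pseudo-inverse.
    """
    return y_c @ np.linalg.pinv(s_c)
```

With noisy observations the same call returns the channel plus an estimation error term, which plays the role of the noise matrix in (5).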

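Omitting the estimation noise, one can easily show that for each carrier, the terms $e^{-j2\pi \frac{c}{T} \tau_q}$, q = 1, ..., Q in equation (5) can be factored out, resulting in

$$H_c := A_c\, \mathrm{diag}[e_c(\tau)]\, G, \quad c = 1, \ldots, C \qquad (6)$$

where the (i, j)-th element of G is defined as

$$G_{i,j} = g((j-1)T - \tau_i), \quad i = 1, \ldots, Q \text{ and } j = 1, \ldots, L$$

and

$$A_c(\theta) = [\,\beta_1(c)\,a(\theta_1) \;\; \beta_2(c)\,a(\theta_2) \;\; \ldots \;\; \beta_Q(c)\,a(\theta_Q)\,] \qquad (7)$$

with

$$\theta = [\,\theta_1 \;\; \theta_2 \;\; \ldots \;\; \theta_Q\,]^T \qquad (8)$$

and

$$\tau = [\,\tau_1 \;\; \tau_2 \;\; \ldots \;\; \tau_Q\,]^T \qquad (9)$$

If we stack all the matrices $H_c$ corresponding to all the C carriers, we obtain a large $(MC \times L)$-dimensional matrix $\mathcal{H}$ whose structure is given by

$$\mathcal{H} = \begin{bmatrix} H_1 \\ H_2 \\ \vdots \\ H_C \end{bmatrix} := U(\theta, \tau)\, G \qquad (10)$$

where

$$U(\theta, \tau) = \begin{bmatrix} A_1(\theta)\, \mathrm{diag}[e_1(\tau)] \\ A_2(\theta)\, \mathrm{diag}[e_2(\tau)] \\ \vdots \\ A_C(\theta)\, \mathrm{diag}[e_C(\tau)] \end{bmatrix} \qquad (11)$$

Finally, we include the channel estimation noise matrix $\mathcal{N}$, which is appropriately defined in accordance with (5) and (10). Therefore, the model in (10) becomes

$$\hat{\mathcal{H}} = U(\theta, \tau)\, G + \mathcal{N} \qquad (12)$$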

If  we  consider  that  the  delay  spread  of  the  channel  is 
Tm  (expressed  in  terms  of  the  symbol  period  T),  then  the 
coherence  bandwidth  of  the  channel  is  roughly  the  inverse 
of  Tm,  i.e., 


The frequency separation between carriers in the MultiCarrier system is given by $\Delta f = \frac{1}{T}$. All the carriers that lie
within  a  frequency  interval  equal  to  the  channel  coherence 
bandwidth  can  be  seen  as  identically  attenuated.  There¬ 
fore,  it  is  reasonable  to  assume  that  the  number  of  carriers 
being  attenuated  equally  is  n  =  =  [^J,  where  [-J 

denotes  the  integer  part.  Under  this  condition,  the  number 
of  ij- carrier  sets  that  share  the  same  attenuation  coefficients 
is  obviously  m  =  nC. 

If we consider only the first $m\mu$ carriers in the derivation
of the MC-JADE model (10) ($m\mu$ is at most equal to $C$),
we obtain a reduced MC-JADE model of dimension $(Mm\mu \times L)$
satisfying the following factorization

$\hat{\mathcal{H}}_{m\mu} = U_{m\mu}(\theta, \tau)\,G + \mathcal{N}_{m\mu} := \begin{bmatrix} F_1(\tau) \circ A_1(\theta) \\ F_2(\tau) \circ A_2(\theta) \\ \vdots \\ F_m(\tau) \circ A_m(\theta) \end{bmatrix} G + \mathcal{N}_{m\mu}$  (13)

where


and 


$A_i(\theta) = [\,\beta_1(i)\,a(\theta_1) \;\; \beta_2(i)\,a(\theta_2) \;\; \cdots \;\; \beta_Q(i)\,a(\theta_Q)\,]$

01  02  •  •  •  0q  1 

01+1  0J+1  ...  0‘g+1 

=  .  .  .  . 

^-i  ^-i . 

and 

0,  =  (14) 

$\circ$ denotes the Khatri-Rao product, i.e., the columnwise
Kronecker product.
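As an illustration of the Khatri-Rao (columnwise Kronecker) product used in (13), the following sketch builds one block $F \circ A$ with NumPy. The function name and the matrix sizes are hypothetical and chosen only for the example.

```python
import numpy as np

def khatri_rao(F, A):
    """Columnwise Kronecker product of F (p x Q) and A (M x Q), size (p*M x Q)."""
    if F.shape[1] != A.shape[1]:
        raise ValueError("F and A must have the same number of columns")
    # Entry (i*M + j, q) of the result equals F[i, q] * A[j, q].
    return np.einsum("iq,jq->ijq", F, A).reshape(F.shape[0] * A.shape[0], -1)

rng = np.random.default_rng(1)
F = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
block = khatri_rao(F, A)  # shape (6, 2)
```

Each column of `block` is the ordinary Kronecker product of the corresponding columns of `F` and `A`, which is why stacking $m$ such blocks yields a matrix with $Mm\mu$ rows and $Q$ columns.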

3.  THE  ALGORITHM:  MC-JADE-ESPRIT 

If the array is a Uniform Linear Array (ULA) or has an ESPRIT
doublet structure, then the angles and delays can be
estimated jointly in closed form using an ESPRIT-like method.
For the ULA geometry, the steering vector $a(\theta_q)$ is
given by

a(*,)  =  [l  0,  ...  ]T  (15) 

where 

0,  =  6^  .me,  (16) 

and $\Delta$ is the array sensor spacing in wavelengths.

With the parameter definitions (16) and (14), it is more
appropriate to rewrite (13) as

$\hat{\mathcal{H}}_{m\mu} = U(\theta, \phi)\,G + \mathcal{N}_{m\mu}$  (17)

where

$\theta = [\,\theta_1 \;\; \theta_2 \;\; \cdots \;\; \theta_Q\,]^T$  (18)

and

$\phi = [\,\phi_1 \;\; \phi_2 \;\; \cdots \;\; \phi_Q\,]^T$  (19)

Estimating the channel subspace and its dimension
is equivalent to finding a basis $E$ of the column span of the
data matrix $\hat{\mathcal{H}}_{m\mu}$, and estimating the parameters
$\theta$ and $\phi$ reduces to jointly diagonalizing two matrices
formed from $E$, where

{  /;  =  jve  <20> 

and 

{ <2i> 

where 

f  J ip  =  I m/i  ®  Im— 1  0(M— 1,1)  (  (22) 

(  J  tp  =  I  m/i  ®  0(M_1,1)  IjVf— 1 


{J^  =  Im®  Im(m-I)  0(Af  (^1  — 1)),  Af) 

are the appropriate selection matrices (see [6], [7] and [8] for
details of JADE), $\otimes$ denotes the Kronecker product, $I_i$ is
an $i$-dimensional identity matrix and $0_{i,j}$ is an
$(i \times j)$-dimensional matrix of zero elements.

Details of the joint diagonalization are provided in [7]
and the references therein. The correct pairing between the
$\theta$'s and the $\phi$'s is guaranteed by the fact that the
matrices share common eigenvectors.
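The shift-invariance idea behind the ESPRIT-like step can be illustrated on a deliberately simplified case: angles only, a single carrier, no noise. The sketch below is not the joint MC-JADE-ESPRIT algorithm. The phase convention $e^{j 2\pi \Delta \sin\theta}$, the array size M = 6 and all function names are assumptions made for the example.

```python
import numpy as np

def ula_steering(theta_deg, M, spacing=0.5):
    """ULA steering vector [1, p, ..., p^(M-1)], p = exp(j*2*pi*spacing*sin(theta))."""
    p = np.exp(2j * np.pi * spacing * np.sin(np.deg2rad(theta_deg)))
    return p ** np.arange(M)

def esprit_angles(X, Q, spacing=0.5):
    """Estimate Q angles (degrees) from data X whose columns lie in span(A)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    E = U[:, :Q]                              # basis of the column span
    # Selection of the first and last M-1 rows (shift invariance of the ULA).
    Psi = np.linalg.pinv(E[:-1]) @ E[1:]
    p = np.linalg.eigvals(Psi)                # eigenvalues carry the phase factors
    sin_theta = np.angle(p) / (2 * np.pi * spacing)
    return np.sort(np.rad2deg(np.arcsin(sin_theta)))

M, Q, N = 6, 3, 50                            # hypothetical sizes
true_angles = np.array([-15.0, 0.0, 25.0])
A = np.stack([ula_steering(a, M) for a in true_angles], axis=1)
rng = np.random.default_rng(2)
B = rng.standard_normal((Q, N)) + 1j * rng.standard_normal((Q, N))
estimated = esprit_angles(A @ B, Q)
```

In the joint problem, a second pair of selection matrices plays the same role for the delay parameters, and the two resulting matrices are diagonalized jointly so that angles and delays are paired automatically.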

If  the  pulse-shape  function  is  assumed  to  be  known,  the 
complex  attenuation  coefficients  can  be  linearly  estimated 
using  least-squares,  by  processing  the  channel  samples  over 
each  carrier  separately. 

4.  IDENTIFIABILITY 

Parameter identifiability requires the $(Mm\mu \times L)$-dimensional
data matrix $\hat{\mathcal{H}}_{m\mu}$ to have rank $Q$, with
$Q < Mm\mu$ and $Q < L$. This means that $U_{m\mu}(\theta, \tau)$
must have strictly more rows than columns and be of full column
rank, and $G$ must have more columns than rows and be of full row
rank. The full rank condition on $G$, together with the channel
factorization (6), implies that all the delays must be distinct. If
two paths have the same TDOA, the rank of $\mathcal{H}$ becomes
$Q - 1$ and the corresponding angles cannot be identified
correctly. In this case, "spatial smoothing" [7] can provide
the solution [6], [7] by performing data extension of the
channel over each carrier in such a way as to keep the rank of
$\hat{\mathcal{H}}_{m\mu}$ equal to the number of paths $Q$. To allow
selection of the received data (13), there must be at least a pair
of sensors, i.e., $M \geq 2$, and the coherence bandwidth to
carrier frequency-spacing ratio must be at least 2:1, i.e.,
$\mu \geq 2$. The last requirement can be satisfied
by appropriately increasing the number of carriers.

5.  SIMULATIONS 

The  following  simulation  results  illustrate  performance  of 
MC-JADE-ESPRIT.  In  all  the  experiments,  the  estimation 
Mean  Square  Error  (MSE)  is  averaged  over  500  Monte  Carlo 
runs  of  the  algorithm  and  compared  against  the  Cramer- 
Rao  Bound  (CRB)  which  is  derived  for  the  model  (13)  in  the 
Appendix.  In  the  figures  corresponding  to  the  experiments, 
the  MSE  is  plotted  using  a  full  line  whereas  the  CRB  is 
shown  by  a  dotted  line. 

5.1.  Basic  performance  of  MC-JADE-ESPRIT 

We consider an antenna of M = 2 elements, spaced at half a
wavelength. The number of paths is Q = 3, with parameters
$\theta = [\,-15^\circ \;\; 0^\circ \;\; 25^\circ\,]^T$ and
$\tau = [\,0 \;\; 0.078 \;\; 0.234\,]^T\,T$, and the path fadings are
generated from a complex zero-mean Gaussian distribution with
variances [0.4 0.3 0.3].

The channel length is half the symbol period T, which is
normalized to T = 1. The pulse-shape function is a raised
cosine with a roll-off factor of 0.25. The number of carriers is
C = 64, with $\mu = 8$. The joint diagonalization method employed
is method "Q", as it is referred to in [7]. Fig. 2 shows the effect
of the noise power on the MSE of the estimated DOAs and TDOAs.
At high noise powers, the estimation is strongly sensitive to
the channel estimation noise and is erroneous. As the noise
power decreases, the gap to the CRB is about 2 to 3 dB.

5.2.  Comparison  with  SI-JADE 

For the same setting, we plot the CRB of the parameter
estimation over the first carrier, using SI-JADE [7],
against the noise power. The stacking parameter as defined
in [7] is taken to be $m_1 = 5$. The CRB of SI-JADE is plotted
in Fig. 2 using a dashed line. Here, for low estimation
noise powers, the parameter MSE of MC-JADE-ESPRIT
is smaller than the CRB of SI-JADE. The greater estimation
precision of MC-JADE-ESPRIT is mainly due to the
larger amount of information involved.


[Figure 2 panels: "Angle estimation" and "Delay estimation"; horizontal axis: 1/noise (dB).]

Figure 2: Basic performance of MC-JADE-ESPRIT.

5.3. Resolution of the Algorithm

We set the number of paths to Q = 2, with the estimation
noise power fixed at -20 dB. All the other parameters
are kept the same. As expected, Fig. 3 shows that the
estimation accuracy improves with well-separated angles;
otherwise, the estimation depends on the noise power. The effect
of delay spacing on the angle and delay estimation is shown
in Fig. 4. For small delay spacing, ambiguity occurs and
the full rank condition on the pulse-shape function matrix
is no longer satisfied, yielding an erroneous estimation.
Here, no spatial smoothing is applied. For well-separated
delays, the estimation is seen to depend only on the
noise power.

Figure 3: Spatial resolution of MC-JADE-ESPRIT.

Figure 4: Temporal resolution of MC-JADE-ESPRIT. [Horizontal axis: delay spacing (T).]

6. CONCLUSION

An advantage of the algorithm is that it takes into account
the available frequency diversity provided by the multiple
carriers and processes the data in a single batch. However,
estimation of the channel impulse response is a prerequisite to
the application of the algorithm, which makes its performance
suboptimal and sensitive to the estimation noise.


Appendix 

The  Cramer  Rao  Bound 

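The CRB for the joint problem (13) can be derived as follows.

Let us define the parameter vector as

$\omega := [\,\sigma^2 \;\; g^T(1) \;\; \cdots \;\; g^T(L) \;\; \eta^T\,]^T$

where

$\eta := [\,\Re\{\beta^T(1)\} \;\; \Im\{\beta^T(1)\} \;\; \cdots \;\; \Re\{\beta^T(m)\} \;\; \Im\{\beta^T(m)\} \;\; \theta^T \;\; \tau^T\,]^T$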

and $\Re\{\cdot\}$ and $\Im\{\cdot\}$ denote the real and imaginary
parts, respectively. In our case, the vectors $g(i)$, $i = 1, \ldots, L$,
which are the columns of the matrix $G$ in (13), are deterministic but
unknown. The data are the channel estimates $\hat{\mathcal{H}}_{m\mu}$.
These data are corrupted by the estimation noise

$\mathcal{N}_{m\mu} = [\,n(1) \;\; n(2) \;\; \cdots \;\; n(L)\,]$

where $n(i)$, $i = 1, \ldots, L$, are complex, stationary, zero-mean
Gaussian random processes that are temporally uncorrelated.
It follows that the data $\hat{\mathcal{H}}_{m\mu}$ are also
uncorrelated Gaussian random processes. The likelihood function
of the data is

$\mathcal{L} = \frac{1}{(2\pi)^{Mm\mu L}\,(\sigma^2/2)^{Mm\mu L}} \exp\left\{ -\frac{1}{\sigma^2} \sum_{i=1}^{L} n^*(i)\,n(i) \right\}$  (24)

and the corresponding log-likelihood function is


Finally, the CRB matrix for the parameters of interest,
$\mathrm{CRB}(\theta, \tau)$, is the $2Q$-dimensional bottom-right
corner partition of $\mathrm{CRB}(\eta)$, and the bounds are found
by taking its diagonal elements.


$\Lambda = \ln \mathcal{L} = \mathrm{const} - Mm\mu L \ln \sigma^2 - \frac{1}{\sigma^2} \sum_{i=1}^{L} n^*(i)\,n(i)$  (25)

where $*$ denotes the complex conjugate transpose. The derivatives
of the log-likelihood function $\Lambda$ with respect to the unknown
parameters can be obtained using the results of [9], [6], [10],
as


dA 

d{<) 

dA 

5(g(0) 

dA 

dr) 


MmpL  1  v~'  ....  ... 

-2— +  -r2^n  Wn(0 

=  -i-R[U*n(0] 

aH 

=  ^X>{<?(0D*n(«)} 


[D^  De  DT]  (Mmp  x  2 (m  +  1  )Q) 

[D»{/9(1)>  D0{/3(1)>  •••  D3?{/3(m)}  Da{/9(m)}] 

au  eu  1 

a<»{/3(.) i }  •••  J 

au  au  1 

SS{/3(i)ll  ao{/3(.),}  J 

r  au  atr  i 

L  a«i  ae,  J 

r  au  au  i 

L  St,  Or,  J 

l2(m+l)  0g(») 


with  U  =  U(0,  r),  and 

D 
Da 

DRW)} 

D0{/3(.)} 

De 
Dr 

0(0 

Using  results  of  [9], [6],  we  get 


M  mpL 

5R[U*U]  SU 

aH 

4-»[u*d0(O] 

L 

J-^»[5*(.)D'De(i)] 
The Fisher Information Matrix (FIM) for the parameters is
given by $E(\omega\omega^T)$, where
$\omega := [\,\sigma^2 \;\; g^T(1) \;\; \cdots \;\; g^T(L) \;\; \eta^T\,]$,
and the inverse of the CRB matrix for the parameters, after
some manipulations, is given by

$\mathrm{CRB}^{-1}(\eta)$
ABSTRACT

This paper deals with the analysis of modulated signals
in an NDA (Non Data Aided) context. Assuming that an OFDM
signal has been detected, our goal is to estimate the
bandwidth and the number of sub-carriers of this signal.
First, we propose an algorithm based on wavelet decomposition
to estimate the bandwidth: the bandwidth is correctly
estimated in 100 % of the cases, with an error lower than 8 %,
down to SNR = 3 dB. Second, we apply the MUSIC algorithm with a
decision criterion to obtain the number of sub-carriers: the
number of carriers can be estimated with an error lower than or
equal to 9 % in 100 % of the cases down to SNR = 10 dB.


1. INTRODUCTION


Spectrum survey requires the estimation of the parameters
of the received signals. This problem has already
been studied in the case of single-carrier modulations,
and now has to cope with new modulation types such as
OFDM (Orthogonal Frequency Division Multiplexing),
which are increasingly used (DAB, ADSL, ...). In [2],
we proposed a method to detect OFDM signals versus
linear single-carrier modulated signals. The problem we
now want to solve is the estimation of two main parameters
of such a signal: the bandwidth and the number of
sub-carriers. Using the fact that the power spectral density
(PSD) of an OFDM signal has a rectangular shape,
we propose to apply a wavelet decomposition to detect
the breaking points at the beginning and at the end of the band.
Then, we try to determine the number of sub-carriers.
Since this number is unknown, AR modeling is impossible.
Therefore, the MUSIC algorithm with a decision
criterion seems well suited to solve this problem.
In section 2 we give the problem statement. Section 3 is
dedicated to the bandwidth estimation of OFDM signals,
with its performance. In section 4 we give a method to obtain
the number of sub-carriers. Section 5 concludes the
paper.


2. PROBLEM STATEMENT


OFDM is a multiplexing of single-carrier signals, and can
therefore be expressed as a sum of single-carrier modulated signals:


xm  (l) 


e2i*(/0+n-A f)t  Tt) 


(1) 

where $\{c_{n,k}\}$ is the symbol sequence, which is assumed to
be centered and i.i.d., $N_p$ is the number of sub-carriers,
$\Delta f$ the frequency offset between carriers, $g(t)$ the pulse
function and $P$ the power of the signal. $T_s = T_u + T_g$, where
$T_u$ is the "useful time" during which information is sent, $T_g$
is the guard interval and $T_s$ the duration of the complete OFDM
symbol. We suppose here that the guard interval is empty.
Due to the multiplexing of many single-carrier signals,
the spectrum of the OFDM signal is nearly rectangular
(Fig. 1). We assume that we receive the complex signal
$r(t) = x(t) + b(t)$, where $x(t)$ is the OFDM baseband signal (with
possible frequency and time offsets) and $b(t)$ is a complex
white Gaussian noise.




Figure  1:  Spectrum  amplitude  of  OFDM  signal  with  32 
carriers. 


3.  BANDWIDTH  ESTIMATION 


3.1. Continuous wavelet decomposition (CWT)


From a signal point of view, wavelets consist of a linear
decomposition of a signal on a given waveform translated
in time and dilated or compressed in time [1]. In the
frequency domain, wavelet analysis is closely related to
filtering the data through a bank of filters having a constant
quality factor. The continuous wavelet transform
(CWT) maps a one-dimensional analog signal
$s(t)$ to a set of wavelet coefficients which vary
continuously over time $b$ and scale $a$:

$W(a, b) = a^{-1/2} \int_{-\infty}^{+\infty} \psi^*\!\left(\frac{t - b}{a}\right) s(t)\, dt$

where  W(a,b)  signifies  “Wavelet  Transform”.  ip(t)  is 
the  wavelet  used  in  the  decomposition.  Equivalently  the 
CWT  can  be  expressed  as: 

$W(a, b) = a^{1/2} \int_{-\infty}^{+\infty} \Psi^*(a\nu)\, S(\nu)\, e^{2i\pi\nu b}\, d\nu$

with $\Psi(\nu)$ and $S(\nu)$ the Fourier transforms of $\psi(t)$ and
$s(t)$, respectively. Wavelets must satisfy some restrictions [1],
the most important of which are integrability and
square integrability. This condition implies
that if $\Psi(\nu)$ is a smooth function in the neighborhood of
the frequency origin, then $\Psi(0) = 0$, which means that
$\psi(t)$ has no DC component. Other assumptions about
wavelets can be made for convenience. One such requirement
is that $\Psi(\nu) = 0$ for $\nu < 0$. It is also convenient
to assume that $\Psi(\nu)$ is real for $\nu > 0$. The wavelet
functions $\psi\!\left(\frac{t - b}{a}\right)$ are used to band-pass
filter the signal. This
can  be  seen  as  a  kind  of  time-varying  spectral  analysis 
in  which  scale  a  plays  the  role  of  a  local  frequency.  As 
a  increases,  wavelets  are  stretched  and  analyze  low  fre¬ 
quencies,  while  for  small  a,  contracted  wavelets  analyze 
high frequencies. The parameter $b$, varying in time, controls
the desired temporal location. The scalar product
measures the signal $s(t)$ in the space spanned by all the
dilated or contracted versions of the single function $\psi$.
For the analysis, the dilation parameter
$a$ is given an initial large value (e.g. 1.0) and is then
decreased in regular increments to examine the signal in
more detail. Equivalently, the wavelet filter function
considers successive narrow sections of the
signal spectrum $S(\nu)$. Since spectral properties are
frequently better displayed on a logarithmic frequency scale,
it is convenient to write $a = 2^{-u}$. With this notation,
integer increments in $u$ result in octave increments of $a$.
Note that a small $a$ (i.e. large $u$) corresponds to high
frequencies. A small $u$ corresponds to an analysis of the
large-scale features of $s(t)$, and as $u$ increases, finer
details of the signal come into focus. The function $\psi(t)$
is the basic unshifted and undilated wavelet. It may be
chosen to suit the needs of the application [5]. For example,
in our case, $\psi(t) = e^{-t^2/2 + jmt}$ is the Morlet wavelet.
An important property of this basic wavelet is that it is
concentrated in both the time and frequency domains. This means
that the time-bandwidth product is as small as possible. To
satisfy $\Psi(0) = 0$, one must add a correction term, but if
$m > 5$, this correction term is negligibly small and can
be omitted. One problem of practical interest for
engineers is the detection of abnormal features. Generally, we
have to use a discretization procedure since we consider
digital data. This discretization procedure consists of a
high-resolution digitization of the generating wavelet in
the time domain, truncated on its sides in order to have
a finite extent. Then, the wavelet coefficients $C_{j,k}$ of the
time-frequency decomposition are obtained by a time-domain
correlation of the interpolated digitized
wavelets $\psi_{j,k}$ with the discrete signal $s(n)$ for different
values of the dilation factor $2^j$ and of the time shift $k$.
This approach presents some drawbacks, such as the edge
effects due to the correlation of a finite-duration signal
with a truncated infinite wavelet and the numerical
approximations due to truncation.
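The discretization procedure described above (a truncated, sampled wavelet correlated with the discrete signal at each scale) can be sketched as follows. This is a minimal illustration under stated assumptions: the truncation at four scale lengths, the value m = 5, the test tone and the function names are all choices made for the example, and the small correction term that enforces a zero DC component is omitted.

```python
import numpy as np

def morlet(t, m=5.0):
    """Morlet wavelet exp(-t^2/2 + j*m*t), without the small DC correction term."""
    return np.exp(-t**2 / 2.0) * np.exp(1j * m * t)

def cwt_morlet(s, scales, m=5.0):
    """W(a, b) = a^(-1/2) * sum_t conj(psi((t - b)/a)) * s(t) for each scale a."""
    out = np.empty((len(scales), len(s)), dtype=complex)
    for i, a in enumerate(scales):
        half = int(np.ceil(4 * a))            # assumption: truncate at +/- 4a samples
        t = np.arange(-half, half + 1)
        psi = morlet(t / a, m) / np.sqrt(a)
        # Correlation with psi written as a convolution with its flipped conjugate.
        out[i] = np.convolve(s, np.conj(psi[::-1]), mode="same")
    return out

n = 512
omega0 = 0.625                                # tone frequency in rad/sample
s = np.exp(1j * omega0 * np.arange(n))
scales = [2.0, 4.0, 8.0, 16.0]
W = cwt_morlet(s, scales)
best_scale = scales[int(np.argmax(np.abs(W[:, n // 2])))]
```

For a pure tone the magnitude of the coefficients peaks at the scale $a \approx m / \omega_0$, here 8, which is the sense in which the scale acts as an inverse local frequency. The edge effects mentioned in the text show up in the first and last `half` samples of each row.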

3.2.  Bandwidth  estimation  method 

The beginning and the end of the PSD of an OFDM
signal, denoted $R(f)$, are breaking points and can be easily
detected by using a wavelet decomposition [4]. We
choose the Morlet wavelet for analyzing the
PSD and obtain the scalogram of the PSD (Fig. 2).

Figure 2: Scalogram of the PSD of the received signal
r(t). 1024 samples. SNR = 3 dB.

Nevertheless, we have to admit that the estimation
is purely visual. For that reason, we
project the resulting scalogram to obtain its frequency
marginal. Because wavelet analysis is a constant $\Delta f / f$
transformation, we have to sum the energy in a
cone, instead of summing the energy of a column as in the case
of a bilinear time-frequency transformation. Moreover,
we cannot be sure that the wavelet has the same energy
in each time-frequency logon. Consequently, we propose
to calculate the scalogram of the Dirac distribution, which
has a cone shape and specifically characterizes breaking
points. Considering this scalogram, it becomes easy to
keep only the points with enough energy (i.e. more energy
than a given percentage of the total energy of the signal)
and then to form a mask. We then obtain
the bandwidth estimation algorithm:

1.  Apply  the  Dirac  mask  on  the  scalogram  of  the  stud¬ 
ied  PSD  signal  R(f)  for  each  frequency  localization. 

2.  Calculate  the  sum  of  the  energy,  which  gives  the  fre¬ 
quency  marginal  of  the  scalogram  . 

3.  Search  for  the  two  extrema  located  in  the  beginning 
and  the  end  of  the  bandwidth. 
Two options are possible to calculate the energy of the
scalogram of $R(f)$ in the cone of the Dirac mask. First,
we can use a binary mask, which means that the energy is
equal to "1" if the point belongs to the cone and "0"
otherwise. The second solution consists of using a weighted
Dirac mask, which gives the real energy of each logon after
thresholding. We show in Fig. 3 that the second solution
leads to the right frequency marginal.


Figure  3:  Frequency  marginals  of  the  scalogram  in  the 
case  of  binary  and  weighted  Dirac  mask. 


3.3. Results and performance

We apply the proposed algorithm to 10,000 trials of
simulated OFDM signals. These signals are generated with
4096 samples, with 4 samples per symbol. The PSD is
evaluated using 1024 points. We have simulated
exactly the binary random sequence for SNR equal to 10,
5, 3 and 0 dB. Moreover, we have studied the effects of
bad synchronization by considering time and frequency
offsets (the time offset is smaller than $T_u$ and the frequency
offset cannot exceed 5 % of the bandwidth of the signal).

3.3.1. Noise influence

Fig. 4 shows the results of bandwidth estimation for
different SNR. The proposed algorithm determines
the bandwidth with an error lower than 4 % for
97 % of the signals when SNR = 3 dB. However, we
observe a strong degradation of the performance as the SNR
goes to 0 dB.

Figure 4: Noise influence: estimation performance for
different SNR, no time or frequency offsets.

3.3.2. Time and frequency offset influence

Fig. 5 shows the results obtained for different SNR in the
case where the frequency offset $\delta f_0$ is non-zero. The new
scalogram is essentially a version of the original
scalogram translated by $\delta f_0$. Consequently, the bandwidth
remains the same and the performance is still good.
The time offset $\delta t_0$ is equivalent to a new phase for the
signal. Since we evaluate its PSD, the phase no longer has any
influence and the performance is strictly the same
as in the case $\delta t_0 = 0$.

Figure 5: Frequency offset influence: estimation
performance for different SNR, no time offset.

3.3.3. Conclusion concerning the method

The proposed method is efficient down to SNR = 3 dB,
even in the case of time or frequency offset. By using
the PSD of the received signal, all phase perturbations
can be removed. Down to SNR = 3 dB, we can conclude
that the bandwidth is correctly estimated in 100 % of the
cases with an error lower than 8 %.


4. ESTIMATION OF THE NUMBER OF
SUB-CARRIERS

4.1. Theoretical covariance matrix


In this problem, we receive one signal which is made
of $N_p$ components. We then compute the coefficients of
the covariance matrix $R$. For each time delay $\tau_n$
in the interval $[0;\, N_p - 1]$, the covariance term can be
expressed as:

$r(\tau_n) = \frac{1}{N_e - \tau_n} \sum_{q = \tau_n + 1}^{N_e} x(q)\, x^*(q - \tau_n)$  (2)

where $N_e$ denotes the number of samples of the received
signal. Moreover, we can notice that the estimator is
unbiased. In the case where $\tau_n = 0$, we have:


K°) 

P  9  =  * 

5Zk(?)l2  +  ^fc 


9=1 


where $\sigma_b^2$ is the variance of the noise. If $\tau_n \neq 0$, we have:


r{rn) 


~~  ■  *(«)  ■**(?- r») 


Np  -  Tn 
N, 


9=l+r» 


*2(?) 


-2<7rA/r„ 


<7=1  +  T„ 

and then we consider $\omega_n = 2\pi \Delta f\, \tau_n$, which depends on $\tau_n$.
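The covariance estimator of equation (2), and the arrangement of its values into the Hermitian Toeplitz covariance matrix used in the rest of this section, can be sketched as follows. The function names and the single-tone test signal are hypothetical; samples are indexed from zero, so the sum runs over the same pairs as in (2).

```python
import numpy as np

def autocorr(x, lag):
    """r(lag) = 1/(N - lag) * sum_q x(q) * conj(x(q - lag))."""
    n = len(x)
    return np.sum(x[lag:] * np.conj(x[:n - lag])) / (n - lag)

def covariance_matrix(x, size):
    """Hermitian Toeplitz matrix with r(k) above the diagonal and conj(r(k)) below."""
    r = np.array([autocorr(x, k) for k in range(size)])
    idx = np.arange(size)
    diff = idx[None, :] - idx[:, None]        # column index minus row index
    return np.where(diff >= 0, r[np.abs(diff)], np.conj(r[np.abs(diff)]))

x = np.exp(1j * 0.3 * np.arange(200))         # hypothetical single-tone test signal
r3 = autocorr(x, 3)
R = covariance_matrix(x, 4)
```

For a single complex exponential of angular frequency 0.3 rad/sample, `r3` equals $e^{j 0.9}$, which is the phase term $e^{j\omega_n}$ appearing in the expansion of $r(\tau_n)$ for one sub-carrier.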


We can then form the covariance matrix as:

$R = \begin{pmatrix} r(0) & r(1) & \cdots & r(N_p - 1) \\ r^*(1) & r(0) & \cdots & r(N_p - 2) \\ \vdots & \vdots & \ddots & \vdots \\ r^*(N_p - 1) & r^*(N_p - 2) & \cdots & r(0) \end{pmatrix}$

Considering the value of $r(\tau_n)$ in the cases $\tau_n = 0$
and $\tau_n \neq 0$, this matrix can also be written as:

This matrix is symmetric and its form is the
same as in the cases for which the MUSIC algorithm is used.
It can therefore be diagonalized by using an eigenvalue
decomposition [6]. After the diagonalization process, the
autocorrelation matrix becomes:

$\mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{N_p}, \sigma_b^2, \ldots, \sigma_b^2)$

where $\lambda_1, \lambda_2, \ldots, \lambda_{N_p}$ are the eigenvalues
due to the contribution of the useful signal plus noise. Normally,
$\lambda_i > \sigma_b^2$, $\forall i \in \{1, 2, \ldots, N_p\}$. The
matrix thus contains $N_p$ eigenvalues which are larger than the
noise variance, from which the number of sub-carriers
can be deduced.

Many solutions are possible to determine which values
are due to the contribution of the sub-carriers. As the
channel has been surveyed before the signal started, we
can assume that the variance of the noise has been estimated,
with some uncertainty of course. A second solution is to
plot the eigenvalues on the same diagram in increasing
order and to detect a breaking point. But, in the
case of fading, the contributions of some sub-carriers
become lower and the breaking point is impossible to find.
Another solution is to use a decision criterion: Akaike's
or Rissanen's criterion. Akaike's criterion is more suited
since it tends to overestimate the number of sources if
the signal is oversampled, which could be helpful in the
case of fading. Moreover, this method is efficient only if
there are at least two noise contributions; that is,
the number of correlation terms must be at least equal
to $(N_p + 1)$. The problem is that $N_p$ is unknown and has
to be estimated. The proposed solution is to start the
algorithm with an a priori number of sub-carriers and to
iterate this process until one eigenvalue corresponding to
noise or a breaking point appears.


4.2. Proposed algorithm


The first algorithm we propose is the following:


1. Fix a priori the size of the matrix: $N_e$.

2. Using equation (2), compute the $N_e$ autocorrelation
terms and form the correlation matrix.

3. Diagonalize the matrix and apply Akaike's criterion.
If the number of sub-spaces (i.e. of sub-carriers) is
equal to $N_e$, go to step 1 and set $N_e = 2 \cdot N_e$.
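The order-selection part of step 3 can be sketched as follows. The text does not give the expression of Akaike's criterion, so the sketch assumes the widely used eigenvalue-based form (geometric to arithmetic mean ratio of the smallest eigenvalues plus a penalty on the number of free parameters). The function name and the test eigenvalues are hypothetical.

```python
import numpy as np

def aic_order(eigvals, n_samples):
    """Return the signal-subspace dimension k that minimizes the AIC score."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending, must be > 0
    p = len(lam)
    scores = []
    for k in range(p):
        tail = lam[k:]                        # candidate noise eigenvalues
        geo = np.exp(np.mean(np.log(tail)))   # geometric mean
        arith = np.mean(tail)                 # arithmetic mean
        fit = -2.0 * n_samples * (p - k) * np.log(geo / arith)
        penalty = 2.0 * k * (2 * p - k)
        scores.append(fit + penalty)
    return int(np.argmin(scores))

# Hypothetical spectrum: two signal eigenvalues above a flat noise floor.
order = aic_order([10.0, 8.0, 1.0, 1.0, 1.0, 1.0], n_samples=100)
```

In the iterative scheme above, this function would be applied to the eigenvalues of the current correlation matrix, and the matrix size would be doubled as long as no noise eigenvalue is detected.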


4.3.  Results 

We apply the proposed algorithms to simulated OFDM
signals. We simulate 10,000 OFDM signals, using 10,000
trials to generate the corresponding symbols. Each signal
is normally generated with 50,000 samples and contains
64 sub-carriers. The frequency offset is limited to 10 % of
the bandwidth of the signal. The channel is the urban
channel (COST 207), in order to compare decision criteria.
We apply the MUSIC algorithm with Akaike's criterion
(except in Figure 8).


4.3.1. Noise influence


In the first case, we look at the influence of noise. We
generate OFDM signals for different signal-to-noise ratios
(20, 10 and 5 dB). Fig. 6 shows that performance is quite
good down to 10 dB, but becomes poor
at 5 dB and below. Then, we study the influence of the
number of signal samples, since we use estimators of the
autocorrelation terms. The SNR is fixed at 20 dB, and the signals
are tested with 50,000, 40,000 and 30,000 samples,
respectively. As expected, performance decreases with
the number of samples. Nevertheless, 50,000 samples are
enough to obtain good performance (Fig. 7). Lastly, we
compare Rissanen's and Akaike's criteria in the case of a
signal with 50,000 samples and SNR = 20 dB and 10 dB.
Since it tends to overestimate the dimension of the signal
sub-space, Akaike's criterion performs better than
Rissanen's (Fig. 8).

Figure 6: Noise influence in the estimation of the number
of sub-carriers.

Figure 7: Influence of the number of points in the
estimation of the number of sub-carriers. SNR = 20 dB.


4.4.  Conclusion  concerning  the  method. 

This method is quite efficient for estimating the number of
sub-carriers down to SNR = 10 dB and for 50,000 samples
(that is, about 1500 OFDM symbols). Akaike's criterion
is more appropriate than Rissanen's, but we
should test the "Minimum Description Length" criterion.


5.  CONCLUSION 

The proposed methods to estimate the bandwidth and
the number of sub-carriers are quite efficient for a few
samples and low SNR (lower than 10 dB). Concerning
the bandwidth estimation, we obtain a correct estimation
in 100 % of the cases with an error lower than 8 % down to
SNR = 3 dB. Concerning the estimation of the number of
sub-carriers, we obtain a correct estimation in 100 % of
the cases with an error lower than 9 % down to SNR = 10 dB.
The performance can be improved using denoising algorithms
[3] and compared with time-domain methods that
we are currently developing [4]. This work completes
our detection algorithm and can be used for future
applications of synchronization and equalization.

Figure 8: Influence of the decision criterion on the
estimation of the number of sub-carriers. 10,000 trials, 50,000
samples, SNR = 20 and 10 dB, urban channel (COST 207).
ABSTRACT

Many algorithms for blind source separation (BSS) have
been introduced in the past few years, most of which
assume statistically stationary sources as well as
instantaneous mixtures of signals. In many applications, such
as separation of speech or fading communications signals,
the sources are nonstationary. Furthermore, the
source signals may undergo convolutive (or dynamic)
linear mixing, and a more complex BSS algorithm is
required to achieve better source separation. We present
a new BSS algorithm for separating linear convolutive
mixtures of nonstationary signals which relies on the
nonstationary nature of the sources to achieve separation.
The algorithm is an on-line, LMS-like update
based on minimizing the average squared
cross-output-channel-correlations along with unity average energy
output in each channel. We explain why, for nonstationary
signals, such a criterion is sufficient to achieve
source separation regardless of the signal statistics.

1.  INTRODUCTION 

The separation of multiple unknown sources from multisensor
data has many applications, including the isolation
of individual speech signals from a mixture of
simultaneous speakers (as in video conferencing or the
often-cited "cocktail party" environment), the elimination
of cross-talk between horizontally and vertically
polarized microwave communications transmissions, and
the separation of multiple cellular telephone signals at a
base station. In the past decade or so, a number of significant
methods have been introduced for blind source
separation, of which we review a few of the most popular
here. One of the earliest and most effective methods
(yet relatively unknown in some circles) is a
constant-modulus-based method published in 1985 by Treichler
and Larimore [1]. This method achieves simultaneous
separation and equalization by minimizing the deviation
of the separated output magnitudes from a fixed
gain.  This  method  is  very  simple  and  convenient  and 
works  well  even  for  non-constant-modulus  signals  with 
a  sub-Gaussian  kurtosis  (which  includes  most  commu¬ 
nications  signals). 

Jutten and Herault introduced one of the most popular methods
[2]. This method works well in many applications, particularly
cross-talk situations in which a relatively modest amount of
mixing occurs. For more challenging scenarios, the existence of
multiple minima and misconvergence of the widely used
Jutten-Herault algorithm has been examined in the literature
[3]-[4]. Methods for non-Gaussian sources have also been
developed, including [5] and others^1. More recently, methods
based on second-order statistics (and which can thus work even
for Gaussian sources) have been introduced. A method by
Belouchrani, et al. can separate stationary Gaussian sources
with different autocorrelation statistics [6].

In many applications of blind source separation, the received
signals are nonstationary. Nonstationarity may arise either from
the source signals themselves (such as speech), or from channel
impairments (such as fading in wireless communications
channels). Most techniques for blind source separation assume
stationarity of the signals and depend on reliable estimation of
second-order or higher-order statistics. These methods may have
difficulty when applied to nonstationary signals.

Several methods developed explicitly for nonstationary source
separation have been published recently. Belouchrani and Amin
have developed a time-frequency extension of the method in [7]
for nonstationary sources, and Parra, et al. have developed
another method based on frequency decomposition of several
successive blocks of time [8]. While these methods appear
effective, and the latter can also separate convolutive
mixtures, they are block-based methods requiring somewhat
sophisticated and expensive processing. Matsuoka, et al. present
an on-line, adaptive extension of the Jutten-Herault method
which, somewhat like the method we proposed in [9], attempts to
minimize the average cross-correlation between separated
channels while normalizing the output energy [10].

^1 It should be noted here that the CMA-based method by
Treichler and Larimore also depends on the sub-Gaussianity of
the sources.

In various situations, convolutive (or dynamic) mixing occurs
rather than instantaneous mixing. This complicates the BSS
problem and requires a more sophisticated and computationally
complex solution. Although the convolutive mixture problem is
not as widely published as the instantaneous problem, methods
for solving the problem are discussed in [11]-[12].

In this paper, we extend our work in [9] to convolutive mixtures
to obtain a method for blind source separation of nonstationary,
convolutively mixed signals which requires only nonstationarity
and independence of the sources to achieve separation. An
on-line, LMS-like algorithm is derived which achieves separation
while normalizing the average energy of each output channel.
This simple algorithm also offers tracking capability for
time-varying convolutive mixtures. The optimization criterion is
presented in the second section of this paper, the adaptive
algorithm is derived in the third section, and simulations which
illustrate its performance are presented in the fourth section.
Some perspectives on the results are discussed in the final
section.

2.  A  NONSTATIONARY  CONVOLUTIVELY 
MIXED  SOURCE  SEPARATION 
CRITERION 

The general source separation problem with convolutive mixtures
can be described as

$$\mathbf{x}(n) = \sum_{m=-\infty}^{n} \mathbf{A}(n-m)\,\mathbf{s}(m), \qquad (1)$$

where s(n) is a vector of M zero-mean, statistically independent
source processes at time-sample n, x(n) is a vector of N sensor
measurements, N >= M, and A(n) is an N x M mixing filter matrix.
The goal of blind source separation is to determine an M x N
de-mixing matrix of filters B(n) for n = 0, ..., L-1, which, when
applied to the received sensor data as in

$$\mathbf{y}(n) = \sum_{m=0}^{L-1} \mathbf{B}(m)\,\mathbf{x}(n-m), \qquad (2)$$

recovers (separates) the individual sources up to an unknown
permutation and unknown channel gains, which cannot be uniquely
determined without additional information [10].
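As a minimal sketch of the signal model in (1) and (2), the following pure-Python fragment applies a matrix of FIR filters to a set of signals. The function name `fir_matrix_filter` is hypothetical, the sources are taken to be zero before n = 0 (so the infinite sum in (1) is truncated), and the filter values are the first two mixing matrices used later in the simulations. The one-tap inverse at the end is only an illustration of (2) for an instantaneous mixture; it is not the blind algorithm.

```python
import random

def fir_matrix_filter(taps, signals):
    """Return out with out[r][n] = sum_m sum_c taps[m][r][c] * signals[c][n - m].

    taps: list of matrices (one per lag), each n_out x n_in
    signals: list of n_in equal-length sequences, assumed zero before n = 0
    """
    n_out = len(taps[0])
    n_in = len(signals)
    length = len(signals[0])
    out = [[0.0] * length for _ in range(n_out)]
    for m, matrix in enumerate(taps):
        for r in range(n_out):
            for c in range(n_in):
                gain = matrix[r][c]
                # lag-m contribution: input delayed by m samples
                for n in range(m, length):
                    out[r][n] += gain * signals[c][n - m]
    return out

random.seed(0)
s = [[random.gauss(0.0, 1.0) for _ in range(200)] for _ in range(2)]  # M = 2 sources

A = [[[1.0, -0.5], [0.7, 1.3]],      # lag 0
     [[0.35, -0.3], [-0.2, 0.6]]]    # lag 1
x = fir_matrix_filter(A, s)          # N = 2 sensor signals, as in eq. (1)

# One-tap (instantaneous) case: the matrix inverse is an exact demixer B(0).
(a, b), (c, d) = A[0]
det = a * d - b * c
B = [[[d / det, -b / det], [-c / det, a / det]]]
y = fir_matrix_filter(B, fir_matrix_filter(A[:1], s))   # eq. (2) with L = 1
```

With a two-tap mixture such as `A`, no finite one-tap `B` undoes the mixing exactly, which is why the demixing filter length L becomes a design parameter.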

An important problem with convolutive mixtures is that even
complete separation may not recover the exact original source
signals s(n). Due to the blind nature of the problem and the
memory introduced by the convolutive mixing, it may be
impossible to obtain the true source signals, and instead
filtered versions may result without further assumptions on the
source signals. It is for this reason that the performance of a
convolutive-mixture BSS algorithm can be judged, as in [11], by
how well a system separates two sources without any regard to
how the output signals compare to their unfiltered source
versions. A way to quantify this separation performance is to
measure how (statistically) uncorrelated the output signals are.
In this paper, though, our methods perform joint
separation-equalization, and this should work well for a certain
class of source signals. Our simulations compare the output
signals to the original source signals and quantify the
performance in terms of signal-to-interference ratio (SIR).

It has been observed in many papers on blind source separation
that a necessary condition for the separation of zero-mean,
statistically independent sources is that the cross-correlations
of the output channels equal zero. However, this is not a
sufficient condition, as is well known (see [9] for an example
demonstrating this). For sources with fixed variances, an
ambiguity exists as there are an infinite number of demixing
matrices which obtain zero cross-channel correlation. For any
arbitrary pair of variances, the classes of decorrelating
matrices are different for different source variances, and only
a true separating solution yields zero cross-channel correlation
for all variance combinations. This is the key insight on which
nonstationary blind source separation algorithms are based. In
effect, these methods take multiple snapshots of the short-time
cross-correlations at different times, and by minimizing all of
these simultaneously, they exploit the changes in the relative
channel variances to find a truly separating solution.
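The ambiguity described above can be illustrated with a small numerical sketch. It assumes, purely for illustration, two independent sources observed directly (identity mixing), so that the output covariance for a 2 x 2 matrix W is W diag(v) W^T; the names `cross_correlation`, `W_false` and `W_true` are hypothetical. A rotated whitening matrix decorrelates the outputs for one pair of source variances but not for another, whereas a pure scaling (a true separating solution) decorrelates for both.

```python
import math

def cross_correlation(W, variances):
    """Off-diagonal entry of W diag(variances) W^T for a 2x2 matrix W."""
    v1, v2 = variances
    return W[0][0] * v1 * W[1][0] + W[0][1] * v2 * W[1][1]

theta = math.pi / 6
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

v_a = (1.0, 4.0)   # source variances during one time interval
v_b = (4.0, 1.0)   # source variances during a later interval

# Whitens the sources for v_a and then rotates them: decorrelated but still mixed.
W_false = [[R[i][k] / math.sqrt(v_a[k]) for k in range(2)] for i in range(2)]
# Scaling only: a separating solution.
W_true = [[2.0, 0.0], [0.0, 0.5]]

c_false_a = cross_correlation(W_false, v_a)   # vanishes
c_false_b = cross_correlation(W_false, v_b)   # does not vanish
c_true_a = cross_correlation(W_true, v_a)     # vanishes
c_true_b = cross_correlation(W_true, v_b)     # vanishes
```

Requiring zero cross-correlation in both intervals rules out `W_false`, which is the mechanism that nonstationary methods exploit.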

This paper uses the same basic insight, but proposes a new
criterion for exploiting it which leads to a particularly simple
and convenient algorithm. We propose to minimize the following
criterion:

$$\min_{\mathbf{B}(n),\; n=0,\ldots,L-1} E\left[ \sum_{l=0}^{L-1} \sum_{i=1}^{M} \sum_{\substack{j=1 \\ j \neq i}}^{M} \hat{r}_{y_i y_j}^{2}(l) + \lambda \sum_{i=1}^{M} \left( \hat{r}_{y_i y_i}(0) - 1 \right)^{2} \right] \qquad (3)$$

where $\lambda$ weights the second term and, at time n,

$$\hat{r}_{y_i y_j}(l; n) = \sum_{k} h(k)\, y_i(n-k-l)\, y_j(n-k) \qquad (4)$$

and h(k) is a lowpass averaging filter for computing a
short-term estimate of the cross-correlation of output channels
$y_i$ and $y_j$ at time n and lag l. The first term in the
criterion minimizes the average squared magnitude of the
short-term cross-correlations for the first L lags of the output
signals (which, as discussed above and in [10], should only be
achieved for nonstationary signals by a separating solution),
while the second term demands that the output signals in each
channel have unit energy on average. In a sense, the second
criterion adds a signal normalization feature to the algorithm,
but as was shown in [1], this CMA criterion has the ability to
jointly separate and equalize sub-Gaussian signals. In instances
where the source signals are sub-Gaussian (or one or more of
them are), the added CMA criterion greatly aids in separating as
well as equalizing in order to obtain closer estimates of the
original source signals s(n).

3.  ADAPTIVE  ALGORITHM 


There are many ways to construct a numerical algorithm based on
the above criterion for blind nonstationary source separation,
yielding different tradeoffs in terms of computational
efficiency, convergence rate, block-based or adaptive forms,
etc. However, in many applications, a simple, adaptive method
which can track slow variations in the mixing parameters is
desired. We derive here a stochastic gradient (LMS-like)
algorithm which has these characteristics.

Many of the most successful adaptive algorithms are based on a
stochastic gradient update using an instantaneous approximation
to the expectation in the optimization criterion. For the
optimization of the demixing matrices, B(l)'s, a stochastic
gradient update takes the form

$$\mathbf{B}_{n+1}(l) = \mathbf{B}_{n}(l) - \mu \nabla_{n}(l) \quad \text{for } l = 0, \ldots, L-1, \qquad (5)$$

where

$$\nabla_{pq,n}(k) = \frac{\partial}{\partial b_{pq}(k)} \left[ \sum_{l=0}^{L-1} \sum_{i=1}^{M} \sum_{\substack{j=1 \\ j \neq i}}^{M} \hat{r}_{y_i y_j}^{2}(l) + \lambda \sum_{i=1}^{M} \left( \hat{r}_{y_i y_i}(0) - 1 \right)^{2} \right] \qquad (6)$$

where p and q are the row and column indices of the gradient
matrix. Note the use of the instantaneous value at time n of the
error function in (3) in the gradient computation. The (p, q)th
element of the gradient matrix at lag k can easily be shown to
be

$$\nabla_{pq,n}(k) = 2 \sum_{l=0}^{L-1} \sum_{\substack{j=1 \\ j \neq p}}^{M} \hat{r}_{y_p y_j}(l)\, \hat{r}_{x_q y_j}(l-k) + 2\lambda \left( \hat{r}_{y_p y_p}(0) - 1 \right) \hat{r}_{y_p x_q}(k). \qquad (7)$$

We now derive efficient recursive updates for the short-term
correlation estimates for a convenient form of the averaging
filter. For computational efficiency, we select a first-order
IIR averaging filter with impulse response

$$h(k) = \alpha^{k} u(k), \qquad (8)$$

where u(k) is the unit step function and 0 < $\alpha$ < 1. With
this form, the correlation statistics can easily be updated
recursively according to

$$\hat{r}_{y_i y_j}(l; n+1) = \alpha\, \hat{r}_{y_i y_j}(l; n) + y_i(n-|l|)\, y_j(n), \qquad (9)$$

and similarly

$$\hat{r}_{y_i x_j}(l; n+1) = \alpha\, \hat{r}_{y_i x_j}(l; n) + y_i(n-|l|)\, x_j(n) \qquad (10)$$

for all lags l which are required for the algorithm. This
completes the following simple recursive algorithm for
nonstationary blind source separation.

1. Compute output according to (2).

2. Update short-time correlations using (9) and (10).

3. Compute separation filter gradient using (7).

4. Update separation filters as in (5).

5. Go back to step 1.


The complexity of the algorithm in the instantaneous mixture
case was shown in [9] to be O(M^2 N). Extension to the
convolutive mixture case yields increased complexity by a factor
of L^2, where L is of course a chosen parameter which can be used
to trade off complexity and quality of separation.
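Step 2 of the algorithm relies on the fact that an exponential averaging window turns the windowed sum in (4) into a one-multiply recursion. The sketch below checks this numerically. It is an illustration under stated assumptions: the function names are hypothetical, the lag is non-negative, signals are zero before n = 0, and the estimate that includes sample n is labelled with index n (the recursion in (9) labels it n + 1, which is only a shift of the time index).

```python
import random

def direct_estimate(yi, yj, lag, n, alpha):
    """Windowed sum of eq. (4) with h(k) = alpha**k u(k)."""
    total = 0.0
    for k in range(n + 1):
        if n - k - lag < 0:
            break                      # earlier samples are taken as zero
        total += alpha ** k * yi[n - k - lag] * yj[n - k]
    return total

def recursive_estimates(yi, yj, lag, alpha):
    """Running estimate r(n) = alpha * r(n - 1) + yi(n - lag) * yj(n)."""
    estimates = []
    previous = 0.0
    for n in range(len(yi)):
        new_term = yi[n - lag] * yj[n] if n - lag >= 0 else 0.0
        previous = alpha * previous + new_term
        estimates.append(previous)
    return estimates

random.seed(1)
yi = [random.gauss(0.0, 1.0) for _ in range(50)]
yj = [random.gauss(0.0, 1.0) for _ in range(50)]
alpha, lag = 0.9, 2

r = recursive_estimates(yi, yj, lag, alpha)
max_gap = max(abs(r[n] - direct_estimate(yi, yj, lag, n, alpha))
              for n in range(len(yi)))
```

The recursion costs one multiply-add per correlation per sample, whereas the direct sum grows with n; this is the reason a first-order IIR window is convenient for an on-line algorithm.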

4. SIMULATIONS

Several simulations have been performed to confirm the efficacy
of the proposed method. For the following simulation with two
sources and two sensors, the mixing matrices are:

$$\mathbf{A} = \left\{ \begin{bmatrix} 1 & -0.5 \\ 0.7 & 1.3 \end{bmatrix},\; \begin{bmatrix} 0.35 & -0.3 \\ -0.2 & 0.6 \end{bmatrix},\; \begin{bmatrix} -0.2 & \ldots \\ 0.15 & \ldots \end{bmatrix} \right\} \qquad (11)$$

where the first matrix represents zero lag, the second
represents a lag of one, and the third represents a lag of two.
The nonstationary sources, shown in Figure 1, are binary random
signals multiplied by lowpass filtered Gaussian signals, and may
be considered a crude approximation to communications signals
undergoing fading. Three mixing scenarios are simulated by
considering the cases of A as above, only the first two matrices
of A, and only the first matrix of A (i.e., instantaneous
mixture). These mixtures are tested against our source
separation algorithm with L values ranging from 1 to 4,
resulting in 12 different simulations.

Figure 1: First 4000 samples of the nonstationary sources used
in the simulation

Our BSS algorithm was tested in these 12 simulations and SIRs
were computed for each of these cases, as well as for the case
where no source separation is applied^2. When our BSS algorithm
is applied, output scaling is needed as BSS can only recover up
to an unknown scale value. Since the scaling changes over time
as the algorithm adapts, the signal was normalized by an
approximate best-fit scale factor every 100 samples. A
length-10,000 sample period was evaluated after sufficient
convergence (using small values of $\mu$) to obtain the
resulting SIR values.
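The block-wise scaling and SIR evaluation can be sketched as follows. The exact scale-fit and SIR definitions are not spelled out in the text, so this fragment makes two assumptions: the scale factor for each block of 100 samples is the least-squares gain that best maps the output onto the source, and the interference is whatever remains after that rescaling. The function name `sir_db` and the test signal are hypothetical.

```python
import math
import random

def sir_db(estimate, source, block=100):
    """SIR in dB after rescaling each block by its least-squares gain (assumed definition)."""
    signal_power = 0.0
    interference_power = 0.0
    for start in range(0, len(source), block):
        s = source[start:start + block]
        y = estimate[start:start + block]
        # gain minimizing sum((gain * y - s)^2) over this block
        gain = sum(a * b for a, b in zip(y, s)) / sum(a * a for a in y)
        signal_power += sum(b * b for b in s)
        interference_power += sum((gain * a - b) ** 2 for a, b in zip(y, s))
    return 10.0 * math.log10(signal_power / interference_power)

random.seed(2)
source = [random.gauss(0.0, 1.0) for _ in range(1000)]
interferer = [random.gauss(0.0, 1.0) for _ in range(1000)]
# Output with an arbitrary gain of 2 and interference 20 dB below the source.
output = [2.0 * (a + 0.1 * b) for a, b in zip(source, interferer)]

value = sir_db(output, source)   # close to 20 dB for this construction
```

Because the gain is refitted per block, a slowly drifting output scale does not count as interference, which matches the motivation given above for normalizing every 100 samples.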

Table 1 shows the simulation results when only the first matrix
in A is used for mixing, which results in purely instantaneous
mixing. The results show excellent performance for all cases of
L, but one feature is that performance degrades slightly with
increasing L. This is because only L = 1 is needed to solve this
problem, and by adding unneeded, adaptable coefficients,
performance suffers slightly due to misadjustment in the
stochastic gradient algorithm for the non-instantaneous
coefficients.

Table 1: Length-1 (Instantaneous) Mixture Results

BSS Type    SIR in dB
            Source 1    Source 2
None         6.8954      6.5736
L = 1       36.1091     31.7012
L = 2       33.8167     32.2873
L = 3       32.0851     28.7450
L = 4       29.7797     28.9522

Table 2 shows the simulation results for length-2 mixing (i.e.,
only the first two matrices of A are applied). The results
clearly show a performance degradation compared to the
instantaneous mixture results as the memory increases the
difficulty of separation. It can be seen that the L = 1 case
does a fairly poor job of signal separation, and increasing L
results in better SIR values as expected. Another observation is
the imbalance of SIR performance between the two source signals.
This is a function of the mixing filters.

Table 2: Length-2 Mixing Results

BSS Type    SIR in dB
            Source 1    Source 2
None         4.8529      4.6307
L = 1       13.3890      6.2799
L = 2       22.6628     11.3558
L = 3       27.6702     16.3054
L = 4       29.3789     21.2188

^2 In this case, the desired source signal is chosen according
to which source is dominant in the mixture.

Table 3 shows the simulation results for length-3 mixing using A
as in (11). The results show even further degradation than the
length-2 mixture case as the increased mixing is more difficult
to recover from. Again, the performance increases with the
demixing filter length, L. Further gains could be obtained by
using a larger L, but this comes at the expense of greater
complexity of the system (proportional to L^2) as well as much
slower convergence.


Table 3: Length-3 Mixing Results

BSS Type    SIR in dB
            Source 1    Source 2
None         4.4030      4.3170
L = 1        9.2718      5.7749
L = 2       11.0878      9.3144
L = 3       13.6907     12.4754
L = 4       15.8287     15.0418
5. CONCLUSIONS

Effective blind source separation can be achieved by exploiting
nonstationarity of the sources. Furthermore, it is possible to
separate convolutively mixed signals with the algorithm. This
paper clearly shows performance gains can be made over an
instantaneous mixture algorithm in the presence of convolutive
mixtures.

Nonstationary blind source separation algorithms appear
particularly relevant for practical applications because many
sources of interest, such as speech or fading signals, exhibit
nonstationarity but may not otherwise present features (such as
non-Gaussian statistics or different auto-correlation structure)
required by other methods.

In comparison with other nonstationary blind source separation
algorithms, the method proposed here results in a simple on-line
stochastic gradient algorithm requiring only multiplications and
additions, which are efficiently implemented in signal
processing hardware. It appears to exhibit the traditional
characteristics of LMS-like algorithms including robustness and
numerical stability, the ability to track slow variations in the
environment, and relatively slow convergence.

The computational complexity of the algorithm is O(N M^2 L^2).
That is, the cost is linear in the number of receivers, but
quadratic in the number of sources and the demixing filter
lengths. For many applications, these parameters are very small,
and the algorithm is very efficient. For larger values of L, the
computational cost may be the limiting factor in a tradeoff
between performance and complexity.
ABSTRACT

For  the  analysis  and  design  of  adaptive  antenna  arrays  in  mobile 
fading  channels,  we  need  a  model  for  the  spatio-temporal 
correlation  among  the  array  elements.  In  this  paper  we  propose  a 
general spatio-temporal correlation function, where non-isotropic
scattering is modeled by the von Mises distribution, an empirically
verified model for a non-uniformly distributed angle of arrival. The
proposed  correlation  function  has  a  closed  form  and  is  suitable 
for  both  mathematical  analysis  and  numerical  calculations.  The 
utility  of  the  new  correlation  function  has  been  demonstrated  by 
quantifying  the  effect  of  non-isotropic  scattering  on  the 
performance  of  two  applications  of  the  antenna  arrays  for 
multiuser  multichannel  detection  and  single-user  diversity 
reception.  Comparison  of  the  proposed  correlation  model  with 
published  data  in  the  literature  shows  the  flexibility  of  the  model 
in  fitting  real  data. 

1.  INTRODUCTION 

In recent years the application of adaptive antenna arrays (smart
antennas) for cellular systems has received much attention [1],
since  they  can  improve  the  coverage,  quality,  and  capacity  of 
such  systems  by  combating  interference,  fading,  and  other 
undesired  disturbances.  An  adaptive  array  can  be  defined  as  an 
adaptive  spatio-temporal  filter,  which  takes  advantage  of  both 
time-domain  and  space-domain  signal  characteristics.  Efficient 
joint  use  of  time-domain  and  space-domain  data  demands  a 
generalization  of  conventional  communication  theory  and  signal 
processing  techniques  to  spatial  and  temporal  communication 
theory  [2]  and  space-time  signal  processing  techniques  [3]. 
Needless  to  say,  new  spatio-temporal  channel  models  have  to  be 
developed  as  well.  Since  the  second-order  statistics  of  the 
channel  characterize  the  basic  structure  of  stochastic  mobile 
channels,  we  need  a  spatio-temporal  correlation  function  to  study 
the  basic  impact  of  the  random  channel  on  the  performance  of 
space-time  solutions,  including  the  adaptive  antenna  arrays. 

In  this  paper  we  present  a  flexible  and  versatile  parametric 
correlation  function  for  the  mobile  station  (MS)  (similar  results 
can  be  obtained  for  the  base  station  (BS)  as  well,  as  we  see  in 
Section  4).  We  do  this  by  generalizing  the  spatio-temporal 
correlation  function  in  [4],  originally  derived  for  an  isotropic 
scattering scenario where the MS receives signals from all
directions with equal probability, to the non-isotropic scattering
case.  Note  that  isotropic  scattering  at  the  MS  corresponds  to  the 
uniform  distribution  for  the  angle  of  arrival  (AOA)  at  the  MS. 
However,  empirical  results  have  shown  that  due  to  the  structure 
of  the  mobile  channel,  the  MS  is  likely  to  receive  signals  only 
from  particular  directions  (see  [5]  and  references  therein).  In 


other  words,  most  often  the  MS  experiences  non-isotropic 
scattering,  which  results  in  a  non-uniform  distribution  for  the 
AOA at the MS. In [5] it has been shown that the application of
the von Mises distribution for the AOA at the MS yields an
easy-to-use and closed-form expression for the temporal (or equivalently,
spatial)  correlation  function.  This  correlation  function  has 
exhibited  very  good  fit  to  measured  data  [5]. 

In  the  sequel  we  derive  a  new  spatio-temporal  correlation 
function  where  non-isotropic  scattering  is  modeled  by  the  von 
Mises distribution. To show the significant effect of non-isotropic
scattering on the performance of smart antenna systems
employing  space-time  data,  we  study  the  performance  of  an 
antenna  array  multiuser  detector  equipped  with  a  channel 
estimator,  operating  in  a  Rayleigh  fading  channel.  As  a  simpler 
example  where  only  space  data  are  employed,  we  also  investigate 
the  impact  of  non-isotropic  scattering  on  a  multi-element  receiver 
working  as  a  maximal  ratio  combiner  (MRC)  in  a  Rayleigh 
fading  channel.  In  both  examples  we  show  how  the  proposed 
spatio-temporal  correlation  function  helps  us  in  quantifying  the 
effect  of  the  fading  channel  on  the  performance  of  antenna 
arrays,  in  the  realistic  scenario  of  non-isotropic  scattering.  The 
paper  concludes  with  a  comparison  of  the  proposed  correlation 
model  with  the  published  correlation  data,  collected  by  a  BS- 
mounted  array. 

2.  A  NEW  CORRELATION  FUNCTION 


Consider a linear uniformly-spaced antenna array shown in [4,
Fig. 2], mounted on a MS. Let $r_m(t)$ denote the complex
envelope at the mth element from the left. Then the normalized
correlation function between the complex envelopes of the mth
and the nth antenna elements, defined by
$\phi_{mn}(\tau) = E[r_m(t)\, r_n^{*}(t+\tau)] / E[|r_m(t)|^{2}]$,
can be derived from [4]:

$$\phi_{mn}(\tau) = E\left[ \exp\{ j 2\pi f_d \tau \cos(\Theta - \alpha) + j (m-n) 2\pi (d/\lambda) \cos\Theta \} \right], \qquad (1)$$

where E denotes mathematical expectation, $j = \sqrt{-1}$, $f_d$
is the maximum Doppler frequency, $\Theta$ stands for the AOA,
$\alpha$ represents the direction of the motion of the MS with
respect to the horizontal axis counterclockwise, d is the
spacing between any two adjacent antenna elements, and $\lambda$
is the wavelength. Now we consider the von Mises probability
density function (PDF) for the random variable $\Theta$:

$$p_{\Theta}(\theta) = \frac{\exp[\kappa \cos(\theta - \theta_p)]}{2\pi I_0(\kappa)}, \quad \theta \in [-\pi, \pi), \qquad (2)$$

where $I_0(\cdot)$ is the zero-order modified Bessel function,
$\theta_p \in [-\pi, \pi)$ accounts for the mean direction of
AOA, and $\kappa \geq 0$ controls the width of the AOA
distribution [5]. For $\kappa = 0$ (isotropic scattering) we
have $p_{\Theta}(\theta) = 1/(2\pi)$, while for
$\kappa = \infty$ (extremely non-isotropic scattering) we obtain
$p_{\Theta}(\theta) = \delta(\theta - \theta_p)$, where
$\delta(\cdot)$ is the Dirac delta function. By calculating the
expectation in (1) according to (2) we obtain:

$$\phi_{mn}(\tau) = \frac{I_0\left( \sqrt{\kappa^{2} - x^{2} - y^{2} - 2xy\cos\alpha + j2\kappa\,[x\cos(\alpha - \theta_p) + y\cos\theta_p]} \right)}{I_0(\kappa)}, \qquad (3)$$

where $x = 2\pi f_d \tau$ and $y = 2\pi (m-n) d/\lambda$. With
$\kappa = 0$, (3) reduces to Lee's spatio-temporal correlation
function $J_0(\sqrt{x^{2} + y^{2} + 2xy\cos\alpha})$ in [4, Eqs.
(42)-(43)] for isotropic scattering, where $J_0(\cdot)$ is the
zero-order Bessel function. For m = n = 1 (single antenna),
Lee's result further simplifies to Clarke's classic temporal
correlation function $J_0(x)$ [6, p. 40, Eq. (2.20)]. For a
single antenna experiencing non-isotropic scattering and
$\alpha = 0$, (3) reduces to the temporal correlation function
$I_0(\sqrt{\kappa^{2} - x^{2} + j2\kappa x\cos\theta_p}) / I_0(\kappa)$
derived in [5, Eq. (2)] (this correlation function has shown
very good fit to measured data [5]).
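The closed form for the von Mises case can be checked numerically against the defining expectation. The sketch below is an illustration only and uses hypothetical function names. It evaluates $I_0(\sqrt{w})/I_0(\kappa)$ with $w = \kappa^2 - x^2 - y^2 - 2xy\cos\alpha + j2\kappa[x\cos(\alpha-\theta_p) + y\cos\theta_p]$, using the power series of $I_0$ (which depends only on the square of its argument, so no complex square root is needed), and compares it with a direct average of $\exp\{jx\cos(\theta-\alpha) + jy\cos\theta\}$ weighted by $\exp[\kappa\cos(\theta-\theta_p)]$ on a uniform grid.

```python
import cmath
import math

def bessel_i0_of_sqrt(w, terms=80):
    """I0(sqrt(w)) for complex w via the series sum_k (w/4)^k / (k!)^2."""
    total = 0j
    term = 1 + 0j
    for k in range(terms):
        total += term
        term *= (w / 4) / ((k + 1) ** 2)
    return total

def correlation_closed_form(x, y, kappa, theta_p, alpha):
    """Closed-form correlation for von Mises distributed angle of arrival."""
    w = (kappa ** 2 - x ** 2 - y ** 2 - 2 * x * y * math.cos(alpha)
         + 2j * kappa * (x * math.cos(alpha - theta_p) + y * math.cos(theta_p)))
    return bessel_i0_of_sqrt(w) / bessel_i0_of_sqrt(kappa ** 2)

def correlation_numeric(x, y, kappa, theta_p, alpha, points=4096):
    """Weighted average of exp{j x cos(theta - alpha) + j y cos(theta)}."""
    total = 0j
    norm = 0.0
    for i in range(points):
        theta = -math.pi + 2 * math.pi * i / points
        weight = math.exp(kappa * math.cos(theta - theta_p))
        phase = x * math.cos(theta - alpha) + y * math.cos(theta)
        total += weight * cmath.exp(1j * phase)
        norm += weight
    return total / norm

# Illustrative parameter values (not taken from the paper).
closed = correlation_closed_form(1.3, 0.8, 3.0, 0.4, 0.7)
numeric = correlation_numeric(1.3, 0.8, 3.0, 0.4, 0.7)
isotropic = correlation_closed_form(1.3, 0.8, 0.0, 0.4, 0.7)
```

For $\kappa = 0$ the imaginary part of $w$ vanishes and the result is real, consistent with the reduction to a Bessel function $J_0$ in the isotropic case.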

In  comparison  with  the  existing  spatial  correlation  functions  for 
antenna  arrays  [7],  our  proposed  model  in  (3)  has  the  main 
advantage  that  it  includes  both  space  and  time  dimensions  in  a 
single  mathematically-tractable  closed-form  expression,  flexible 
for  fitting  to  array  data,  studying  the  performance  of  various 
array-based  techniques  [8]  for  different  applications  in  fading 
channels  with  the  realistic  assumption  of  non-isotropic 
scattering, optimizing array configurations [9], etc.

3.  TWO  ARRAY  APPLICATIONS 

In  this  section  we  use  the  proposed  model  in  (3)  for  two  array- 
based  applications.  In  the  first  one  we  need  a  spatio-temporal 
correlation  function,  while  for  the  second  one  a  spatial-only 
correlation  function  is  needed.  In  array  applications,  the  need  for 
a  spatio-temporal  correlation  function  also  appears  in 
conjunction  with  such  important  fading  characteristics  as  level 
crossing  rate  and  average  fade  duration  [4]  [10],  which  due  to 
space  limitations  we  do  not  address  here. 


3.1 Efficiency of Two Multiuser Multichannel
Array Detectors

For code division multiple access (CDMA) signals, two
array-based multiuser detection schemes with imperfect estimates
of the fading channel were recently investigated in [11]: the
decision-directed detector (with more complexity) which is
optimum, and the decorrelating detector (with less complexity)
which is suboptimum. In terms of the asymptotic efficiency, it
has been proven that the decision-directed detector is superior.
However, the decorrelating detector is simpler to implement. So,
it is of interest to determine how much these two detectors
differ in terms of asymptotic efficiency. Here, by a simple
example [12, p. 107 and p. 117] we show that the answer strongly
depends on the mode of scattering, which affects the correlation
function of the complex envelope in the fading channel.

Assume that the MS has a two-element antenna (M = 2), and there
are two mobile users (K = 2) according to the configuration
shown in Figs. 1 and 2 ($\theta_{p,1} = 0$, $\theta_{p,2} = \pi$).
In Fig. 1 we have $\kappa_1 = \kappa_2 = 0$, where the MS
receives scattered plane waves from all directions with equal
probability, while in Fig. 2, where $\kappa_1 = \kappa_2 = 10$,
the MS receives directional waves from two specific directions
(the beamwidth in each direction is equal to
$BW = 2/\sqrt{\kappa} \approx 36^{\circ}$ [5]). Suppose the
first user is the desired user, while the second one is the
interfering user. The MS moves from left to right
($\alpha = 0$) and the users travel at speeds such that the
desired user has the maximum Doppler frequency $f_{d,1} = 0.1$
Hz, while the interfering user has the maximum Doppler frequency
$f_{d,2} = 0.05$ Hz. Assume the correlation coefficient between
the users' signature waveforms is $\rho_{12} = 0.5$, and the MS
uses only the past two values (l = 2) of matched filter outputs
and bit decisions for fading estimation and bit detection in the
presence of Rayleigh fading and zero-mean additive white
Gaussian noise with variance $\sigma^{2}$. Suppose both users
have (equal) unit power. Let us define the signal-to-noise ratio
(SNR) as $\gamma = 1/\sigma^{2}$. For $d = 0.3\lambda$ and
$\lambda$, the asymptotic efficiency of the desired user,
$\eta_1$, calculated using the equations given in [12], is
plotted in Figs. 3 and 4 versus SNR, assuming
$\kappa_1 = \kappa_2 = 0$ and $\kappa_1 = \kappa_2 = 10$.
According to both figures, as $\kappa$ increases (more
directional reception), the efficiency of both detectors
increases significantly (which is good news). However, the
difference between the detectors' efficiencies increases as
well, which implies that choosing the decorrelating detector,
due to its lower complexity, introduces a significant loss in
efficiency when we have non-isotropic scattering. Hence, we need
to develop new suboptimum low-complexity detectors with
efficiencies comparable with the optimum detector, in channels
with directional reception.

3.2 Average Bit Error Rate of a Single-User
Multichannel Array Detector

Assume that in Figs. 1 and 2, we have user one only (K = 1), and
$\theta_p = 0$. Moreover, both the MS and the user are
stationary ($f_d = 0$). The user sends data using binary phase
shift keying (BPSK) modulation scheme, and the MS is equipped
with a two-branch (M = 2) maximal ratio combiner (MRC). The
average bit error rate (BER) in this case is given by [13, Eq.
(12)]:

$$P_b(\gamma) = \frac{1}{2}\left[ 1 - \frac{1+\rho}{2\rho} \sqrt{\frac{\gamma(1+\rho)}{1+\gamma(1+\rho)}} + \frac{1-\rho}{2\rho} \sqrt{\frac{\gamma(1-\rho)}{1+\gamma(1-\rho)}} \right], \qquad (4)$$

where $\rho = |\phi_{12}(0)|$. In Figs. 5 and 6 we have plotted
$P_b(\gamma)$ versus $\gamma$ for $d = 0.3\lambda$ and
$\lambda$, respectively. As we expect, the average BER increases
as $\kappa$ increases, because it results in more correlation
between the branches. Of course, a larger d can reduce the
amount of correlation between branches, resulting in smaller
average BER (compare Figs. 5 and 6).
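The dependence of the average BER on the branch correlation can be explored with a short sketch. It assumes the standard dual-branch expression for BPSK with MRC over two equal-power, correlated Rayleigh branches, $P_b = \frac{1}{2}[1 - \frac{1+\rho}{2\rho}\sqrt{\gamma(1+\rho)/(1+\gamma(1+\rho))} + \frac{1-\rho}{2\rho}\sqrt{\gamma(1-\rho)/(1+\gamma(1-\rho))}]$, with $\gamma$ the average SNR per branch; whether this matches [13, Eq. (12)] term for term is an assumption. The function names are hypothetical. The independent-branch formula is included only as a sanity check of the limit $\rho \to 0$.

```python
import math

def ber_mrc2_correlated(gamma, rho):
    """Average BPSK BER, two-branch MRC, correlated Rayleigh fading (assumed form).

    gamma: average SNR per branch; rho: correlation magnitude, 0 < rho <= 1.
    """
    def branch(a):
        return a * math.sqrt(gamma * a / (1.0 + gamma * a))
    return 0.5 * (1.0 - (branch(1.0 + rho) - branch(1.0 - rho)) / (2.0 * rho))

def ber_mrc2_independent(gamma):
    """Two independent Rayleigh branches with MRC (limit of rho -> 0)."""
    mu = math.sqrt(gamma / (1.0 + gamma))
    return ((1.0 - mu) / 2.0) ** 2 * (2.0 + mu)

def ber_single_branch(gamma):
    """Single Rayleigh branch with average SNR gamma."""
    return 0.5 * (1.0 - math.sqrt(gamma / (1.0 + gamma)))

low_corr = ber_mrc2_correlated(10.0, 0.1)
high_corr = ber_mrc2_correlated(10.0, 0.9)
```

For $\rho = 1$ the expression collapses to a single branch with twice the SNR, so diversity is lost and only the array gain remains; this is the mechanism behind the BER increase with stronger branch correlation.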

4.  COMPARISON  WITH  DATA 

Although  the  application  of  antenna  arrays  in  both  MS  and  BS  is 
advantageous,  in  this  section  we  focus  on  BS  since  the 
application  of  arrays  at  the  BS  is  more  common  (practical 
constraints  usually  restrict  the  use  of  an  array  of  antennas  at  a 
MS). For statistical characterization of narrow histograms of the
AOA of waves impinging on the BS [14] [15] (which gives rise to
the non-uniform distribution of power versus the azimuth angle
[16]), three different PDF’s have been used so far in the literature:
cosine [17], Gaussian [18], and truncated uniform [19]. All these
PDF’s are considered primarily for studying the effect of non-
uniformly  distributed  AOA  on  the  spatial  correlation  among  the 
array  elements  at  a  BS.  With  appropriate  choice  of  parameters, 
these  three  PDF’s  can  resemble  visually  the  narrow  histograms  of 
the  AOA  at  the  BS  (although  the  truncated  uniform  PDF  is  less 
likely  to  do  that  because  the  empirical  histograms  are  usually 
bell-shaped  [14]  [15]  and  decay  to  zero  not  as  abruptly  as  a 
truncated  uniform  PDF).  So,  mathematical  convenience  seems  to 
be  the  main  concern  in  choosing  a  PDF  for  the  AOA,  among 
empirically-acceptable candidates. From this point of view, none
of these three PDF’s is able to provide a simple closed-form
solution  (in  terms  of  known  mathematical  functions)  for  the 
correlation  between  the  complex  envelopes  of  the  array  elements 
(which  is  a  basic  quantity  in  array-related  studies).  For  the 
Gaussian  PDF  only  approximate  results  can  be  found  [18]  [20], 
and  for  the  truncated  uniform  PDF,  closed-form  results  can  be 
derived  only  for  inline  and  broadside  cases  [21]  (the  cosine  PDF 
is  less  likely  to  yield  a  closed-form  answer  because  of  the  special 
integral that has to be solved). On the other hand, as we see in
the sequel, the von Mises PDF yields a simple and compact
expression, given in (5), which is basically the same as (3). This
makes the von Mises PDF a very suitable model.

Comparison of the Gaussian PDF with the histograms of AOA
data has shown reasonable agreement [15] [22]. This is a good
empirical support for the von Mises PDF because for large $\kappa$,
the PDF in (2) resembles a small-variance Gaussian PDF with
mean $\theta_p$ and standard deviation $1/\sqrt{\kappa}$ [23, p. 60]. In fact, for
any beamwidth (angle spread) smaller than 40° (which
corresponds to $\kappa > 8.2$ according to the definition of beamwidth
as $BW = 2/\sqrt{\kappa}$ in [5]), the plots of the Gaussian and von Mises
PDF’s are indistinguishable (two typical standard deviations for
the Gaussian PDF are 15° [22] and 6° [15], which correspond
to $\kappa = 14.6$ and $\kappa = 91.2$, respectively). However, recall that
the von Mises PDF is able to provide a general and closed-form
solution for the space-time correlation between the complex
envelopes of the array elements, while the Gaussian PDF cannot.
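
The numbers quoted in this paragraph can be checked with a few lines of code. The following is a minimal sketch, assuming the relations $BW = 2/\sqrt{\kappa}$ and $\sigma = 1/\sqrt{\kappa}$ (angles in radians) stated above; the function names and the series-based Bessel routine are illustrative helpers, not part of the original work.

```python
import math

def bessel_i0(x, terms=400):
    """Modified Bessel function I0(x) from its power series sum_k (x^2/4)^k / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        total += term
        term *= (x * x / 4.0) / (k * k)
    return total

def von_mises_pdf(theta, mean, kappa):
    """von Mises PDF with mean direction `mean` and width control parameter `kappa`."""
    return math.exp(kappa * math.cos(theta - mean)) / (2.0 * math.pi * bessel_i0(kappa))

def gaussian_pdf(theta, mean, sigma):
    return math.exp(-0.5 * ((theta - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def kappa_from_beamwidth(bw_deg):
    """Invert BW = 2 / sqrt(kappa), with BW converted to radians."""
    return 4.0 / math.radians(bw_deg) ** 2

def kappa_from_sigma(sigma_deg):
    """Large-kappa match sigma = 1 / sqrt(kappa), with sigma converted to radians."""
    return 1.0 / math.radians(sigma_deg) ** 2

kappa_bw40 = kappa_from_beamwidth(40.0)
kappa_s15 = kappa_from_sigma(15.0)
kappa_s6 = kappa_from_sigma(6.0)

# Largest gap between the two PDFs for sigma = 15 degrees, relative to the Gaussian peak.
sigma = math.radians(15.0)
grid = [math.radians(a) for a in range(-180, 180)]
peak = gaussian_pdf(0.0, 0.0, sigma)
max_rel_gap = max(abs(von_mises_pdf(t, 0.0, kappa_s15) - gaussian_pdf(t, 0.0, sigma))
                  for t in grid) / peak
print(round(kappa_bw40, 1), round(kappa_s15, 1), round(kappa_s6, 1), max_rel_gap)
```

The three computed values of $\kappa$ agree with the figures quoted in the text, and the relative gap between the two densities stays small for $\sigma = 15°$, which is consistent with the claim that the plots are hard to tell apart.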

Using  exactly  the  same  notation  as  [17],  it  is  straightforward  to 
show  that  for  the  linear  uniformly-spaced  antenna  array  at  the  BS 
in  [17,  Fig.  6]  we  have: 

$$\phi_{pq}(\tau) = \frac{I_0\left(\sqrt{\kappa^2 - x^2 - y^2 + 2xy\cos\gamma + j2\kappa\,[x\cos(\gamma - \alpha) - y\cos\alpha]}\right)}{I_0(\kappa)}, \qquad (5)$$

provided that the AOA has a von Mises PDF with the mean direction
$\alpha \in [-\pi, \pi)$ and the width control parameter $\kappa \geq 0$. All of the
parameters in (5) are the same as (3), except for $\gamma$ in (5), which
represents the direction of the motion of the MS with respect to
the horizontal axis counterclockwise, in place of $\alpha$ in (3) (the $\gamma$
here should not be confused with the SNR symbol $\gamma$ used in
Section 3). The two sign changes in (5), in comparison with (3),
come from different ways of numbering the array elements: in [4,
Fig. 2], the elements are numbered from left to right, while
the element numbering in [17, Fig. 6] is from right to left.

Now  we  compare  our  correlation  model  with  the  data  published 
in  [17],  where  the  data  are  spatial  cross-correlations  between  the 
square  of  the  envelopes  of  a  two  element  array,  mounted  on  a 
BS. We do this by considering two models for the AOA PDF at
the BS: the simple model with
$p_\Theta(\theta) = \exp[\kappa\cos(\theta - \alpha)] / [2\pi I_0(\kappa)]$, and the composite model
with $p_\Theta(\theta) = \zeta \exp[\kappa\cos(\theta - \alpha)] / [2\pi I_0(\kappa)] + (1 - \zeta)/(2\pi)$, where
$0 \leq \zeta \leq 1$ indicates the amount of directional reception. The
composite PDF reduces to the von Mises PDF for $\zeta = 1$, and
simplifies to the uniform PDF for $\zeta = 0$. Consequently, the
associated spatial correlation functions for a two element array at
a BS can be written as:


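$$\phi_{12}(0) = \frac{I_0\left(\sqrt{\kappa^2 - 4\pi^2 (d/\lambda)^2 + j4\pi\kappa (d/\lambda)\cos\alpha}\right)}{I_0(\kappa)}, \qquad (6)$$

$$\phi_{12}(0) = \zeta\, \frac{I_0\left(\sqrt{\kappa^2 - 4\pi^2 (d/\lambda)^2 + j4\pi\kappa (d/\lambda)\cos\alpha}\right)}{I_0(\kappa)} + (1 - \zeta)\, J_0(2\pi d/\lambda). \qquad (7)$$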
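
As an illustration of how (6) and (7) can be evaluated numerically, here is a minimal sketch using only the Python standard library. It relies on the series $I_0(\sqrt{w}) = \sum_k (w/4)^k/(k!)^2$, which avoids the complex square root; the function names are hypothetical, and the plain series is only adequate for moderate $\kappa$ and spacings of a few wavelengths (it would overflow for the very large $\kappa$ implied by sub-degree beamwidths).

```python
import math

def i0_of_sqrt(w, terms=300):
    """I0(sqrt(w)) for real or complex w, from the series sum_k (w/4)^k / (k!)^2."""
    total, term = 0j, 1 + 0j
    for k in range(1, terms + 1):
        total += term
        term *= (w / 4.0) / (k * k)
    return total

def bessel_j0(x):
    """J0(x), obtained from the identity J0(x) = I0(sqrt(-x^2))."""
    return i0_of_sqrt(-(x * x)).real

def corr_simple(d_over_lambda, kappa, alpha):
    """Equation (6): spatial correlation for the von Mises (simple) model."""
    w = complex(kappa ** 2 - (2.0 * math.pi * d_over_lambda) ** 2,
                4.0 * math.pi * kappa * d_over_lambda * math.cos(alpha))
    return i0_of_sqrt(w) / i0_of_sqrt(kappa ** 2)

def corr_composite(d_over_lambda, kappa, alpha, zeta):
    """Equation (7): mixture of the von Mises term and the isotropic J0 term."""
    return (zeta * corr_simple(d_over_lambda, kappa, alpha)
            + (1.0 - zeta) * bessel_j0(2.0 * math.pi * d_over_lambda))

# Squared magnitude versus spacing for illustrative parameter values.
alpha = math.radians(60.0)
for d in (0.0, 0.5, 1.0, 2.0):
    print(d, abs(corr_simple(d, 10.0, alpha)) ** 2,
          abs(corr_composite(d, 10.0, alpha, 0.74)) ** 2)
```

Two sanity checks follow directly from the formulas: at zero spacing both models give a correlation of one, and for $\kappa = 0$ the simple model collapses to the isotropic result $J_0(2\pi d/\lambda)$.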


Figs. 7-8 show Lee’s correlation data, plotted together with
$|\phi_{12}(0)|^2$ calculated according to (6) and (7) for both models.
For a given $\alpha$ (known a priori for each data set), the unknown
$\kappa$ for the simple model and the unknown pair $(\kappa, \zeta)$ for the
composite model are estimated by the nonlinear least squares
method (implemented via a systematic numerical search
technique). Based on these figures (and many others not shown
due to space limitations), the von Mises PDF is able to account
for the variations of the correlation versus antenna spacing with
reasonable accuracy (compare our correlation plots with those
drawn in [17] assuming the cosine PDF and [21] using the
truncated uniform PDF, both for the same data sets). Interestingly,
the correlation plots in [17] can also be considered as curves
obtained based on a Gaussian PDF, because for small BW, the
cosine PDF can be approximated by a Gaussian PDF [21]. Note
that in Fig. 7 both models are similar ($\zeta = 0.98$), while in Fig. 8
the composite model shows a much better fit ($\zeta = 0.74$). In
general the composite model was able to improve the fits
obtained by the simple model, which is not surprising because it
has the additional parameter $\zeta$. This is in agreement with the
noise-like signal introduced in [17].
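
The fitting step described above (nonlinear least squares by systematic numerical search) can be sketched as a plain grid search. This is only an illustration under stated assumptions: the data below are synthetic, generated from model (7) itself rather than taken from [17], and the grids, spacings and function names are hypothetical choices.

```python
import math

def i0_of_sqrt(w, terms=150):
    """I0(sqrt(w)) for real or complex w via the series sum_k (w/4)^k / (k!)^2."""
    total, term = 0j, 1 + 0j
    for k in range(1, terms + 1):
        total += term
        term *= (w / 4.0) / (k * k)
    return total

def corr_simple(d, kappa, alpha):
    """phi_12(0) of (6); d is the antenna spacing in wavelengths."""
    w = complex(kappa ** 2 - (2.0 * math.pi * d) ** 2,
                4.0 * math.pi * kappa * d * math.cos(alpha))
    return i0_of_sqrt(w) / i0_of_sqrt(kappa ** 2)

def power(simple, j0, zeta):
    """|phi_12(0)|^2 of the composite model (7)."""
    return abs(zeta * simple + (1.0 - zeta) * j0) ** 2

def fit_composite(spacings, data, alpha, kappas, zetas):
    """Exhaustive search for the (kappa, zeta) pair with the smallest squared error."""
    j0 = [i0_of_sqrt(-(2.0 * math.pi * d) ** 2) for d in spacings]
    best = (float("inf"), None, None)
    for kappa in kappas:
        simple = [corr_simple(d, kappa, alpha) for d in spacings]
        for zeta in zetas:
            err = sum((power(s, j, zeta) - y) ** 2 for s, j, y in zip(simple, j0, data))
            if err < best[0]:
                best = (err, kappa, zeta)
    return best

alpha = math.radians(60.0)                      # assumed known, as in the text
spacings = [0.25 * k for k in range(1, 11)]     # 0.25 ... 2.5 wavelengths
kappas = list(range(1, 31))
zetas = [k / 50 for k in range(51)]
true_kappa, true_zeta = kappas[11], zetas[37]   # 12 and 0.74, synthetic ground truth

# Synthetic, noise-free "measurements" produced by the model itself.
data = [power(corr_simple(d, true_kappa, alpha),
              i0_of_sqrt(-(2.0 * math.pi * d) ** 2), true_zeta) for d in spacings]

err, kappa_hat, zeta_hat = fit_composite(spacings, data, alpha, kappas, zetas)
print(kappa_hat, zeta_hat, err)
```

Restricting `zetas` to the single value 1 gives the fit for the simple model. With measured data the minimum error would of course be nonzero, and a finer grid or a local refinement around the best grid point would normally follow.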

5.  CONCLUSION 

Space-time processing using antenna arrays over wireless mobile
fading channels offers several advantages in cellular systems, such
as mitigating fading, intersymbol interference, cochannel
interference, etc. Efficient joint use of both space and time
dimensions demands spatio-temporal channel models. As a
basic  channel  model,  we  need  a  two  dimensional  spatio-temporal 
correlation  function  among  the  random  signals  sensed  by  the 
array  elements,  to  characterize  the  second  order  dependence 
structure  of  the  random  channel  in  both  space  and  time.  In  this 
paper  we  have  proposed  a  flexible  spatio-temporal  correlation 
function  for  propagation  scenarios  with  non-isotropic  scattering 
(signal reception from specific directions). The non-uniform
distribution for the angle of arrival, which characterizes the
non-isotropic scattering, is modeled by the von Mises PDF, which has
previously been shown to be successful in describing the measured
data. The proposed spatio-temporal correlation function is
general  enough  to  include  important  special  cases  such  as  Lee’s 
spatio-temporal  correlation  function  and  Clarke’s  temporal 
correlation  function,  both  derived  for  isotropic  scattering. 
Moreover,  its  compact  mathematical  form  facilitates  analytical 
manipulations  of  array-based  techniques  and  results  in  terms  of 
closed-form  expressions  for  such  important  fading  parameters  as 
spectral moments (successive derivatives of the correlation
function).  Based  on  two  case  studies  (multiuser  detection  and 
diversity  reception)  and  using  the  new  spatio-temporal 
correlation  function,  we  have  shown  that  non-isotropic  scattering 
(typical  of  many  mobile  channel  scenarios)  has  a  significant 
impact  on  the  performance  of  array  processors,  and  should  be 
taken  into  account  in  the  analysis  and  design  of  adaptive  antenna 
arrays  for  mobile  fading  channels. 

Theoretically, the new correlation function is applicable to both
MS and BS. However, since practical restrictions limit the use of
multiple antennas at an MS, the proposed correlation function
seems to be of much more use at a BS. Therefore, the empirical
justification of the new correlation function is demonstrated by
comparison with published data collected at a BS.
Figure 1. Isotropic scattering in an open area (circles are
scatterers).

Figure 2. Non-isotropic scattering in a narrow street.

Figure 3. Asymptotic efficiency of two multiuser array
detectors.

Figure 4. Asymptotic efficiency of two multiuser array
detectors.

Figure 5. Bit error rate of BPSK with two-branch MRC.

Figure 6. Bit error rate of BPSK with two-branch MRC.

Figure 7. Correlation coefficient versus antenna spacing.
Simple: BW = 0.5°; Composite: BW = 0.5°, $\zeta$ = 0.98.

Figure 8. Correlation coefficient versus antenna spacing.
Simple: BW = 0.4°; Composite: BW = 0.2°, $\zeta$ = 0.74.

[Plot axis labels: asymptotic efficiency (Figs. 3-4), $\log_{10} P_b(\gamma)$ (Figs. 5-6), correlation coefficient (Figs. 7-8).]
ABSTRACT

For the blind separation of sources (BSS) problem (or
independent component analysis (ICA)), it has been shown
in many situations that adaptive subspace algorithms
are very slow and need a large computational effort.
In a previous publication, we proposed a modified subspace
algorithm for stationary signals. But that algorithm was
limited to stationary signals and its convergence was not
fast enough.

Here, we propose a batch subspace algorithm. The experimental
study shows that this algorithm is very fast, but its
performance is not sufficient to completely separate the
independent components of the signals. On the
other hand, this algorithm can be used as a pre-processing
algorithm to initialize other adaptive subspace algorithms.
Keywords: blind separation of sources, ICA, subspace methods,
Lagrange method, Cholesky decomposition.


1. INTRODUCTION

The blind separation of sources (BSS) problem [1] (or the
independent component analysis (ICA) problem [2]) is a
recent and important problem in signal processing. In
this problem, one should estimate, using the output
signals of an unknown channel (i.e. the observed signals
or the mixture signals), the unknown input signals of that
channel (i.e. the sources). The sources are assumed to be
statistically independent of each other.

BSS was first proposed in a biological context [3].
Today, one can find this problem in many different
situations: speech enhancement [4], separation of seismic signals
[5], source separation applied to nuclear reactor
monitoring [6], airport surveillance [7], noise removal from
biomedical signals [8], etc.

Since 1985, many researchers have been interested in
BSS [9, 10, 11, 12]. Most of the algorithms deal with a linear
channel model: instantaneous mixtures (i.e. a memoryless
channel) or convolutive mixtures (i.e. the channel
effect can be considered as a linear filter). The criteria
of those algorithms were generally based on high-order
statistics [13, 14, 15]. Recently, by using only second-order
statistics, some subspace methods have been explored
to blindly separate the sources in the case of convolutive
mixtures [16, 17].

In previous works, we proposed two subspace approaches
using LMS [18, 17] or a conjugate gradient algorithm [19]
to minimize subspace criteria. Those criteria were derived
from the generalization of the method proposed by
Gesbert et al. [20] for blind identification¹. To improve the
convergence speed of our algorithms, we proposed a modified
subspace algorithm for stationary signals [21]. But that
algorithm was limited to stationary signals and its convergence
was not fast enough. Here, we propose a new subspace
algorithm, which improves the performance of our
previous methods.


2.  MODEL,  ASSUMPTIONS  &  CRITERION 

Let $Y(n)$ denote the $q \times 1$ mixture vector obtained from $p$
unknown and statistically independent sources $S(n)$ and let
the $q \times p$ polynomial matrix $\mathcal{H}(z) = (h_{ij}(z))$ denote the
channel effect (see fig. 1). In this paper, we assume that the
filters $h_{ij}(z)$ are causal finite impulse response (FIR)
filters. Let us denote by $M$ the highest degree² of the filters
$h_{ij}(z)$. In this case, $Y(n)$ can be written as:

$$Y(n) = \sum_{i=0}^{M} \mathbf{H}(i)\, S(n - i), \qquad (1)$$

where $S(n - i)$ is the $p \times 1$ source vector at time $(n - i)$
and $\mathbf{H}(i)$ is the real $q \times p$ matrix corresponding to the filter
matrix $\mathcal{H}(z)$ at time $i$.

Let $Y_N(n)$ (resp. $S_{M+N}(n)$) denote the $q(N + 1) \times 1$
(resp. $(M + N + 1)p \times 1$) vector given by:

$$Y_N(n) = \begin{pmatrix} Y(n) \\ \vdots \\ Y(n - N) \end{pmatrix}, \qquad S_{M+N}(n) = \begin{pmatrix} S(n) \\ \vdots \\ S(n - M - N) \end{pmatrix}.$$

¹ In the identification problem, the authors generally assume
that they have one source and that the source is an iid signal.
² $M$ is called the degree of the filter matrix $\mathcal{H}(z)$.
[Figure residue: block label "Sub-space method".]


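By using $N > q$ observations of the mixture vector, we can
formulate the model (1) in another form:

$$Y_N(n) = T_N(\mathbf{H})\, S_{M+N}(n), \qquad (2)$$

where $T_N(\mathbf{H})$ is the Sylvester matrix corresponding to $\mathcal{H}(z)$.
The $q(N + 1) \times p(M + N + 1)$ matrix $T_N(\mathbf{H})$ is given by
[22] as:

$$T_N(\mathbf{H}) = \begin{pmatrix}
\mathbf{H}(0) & \mathbf{H}(1) & \cdots & \mathbf{H}(M) & 0 & \cdots & 0 \\
0 & \mathbf{H}(0) & \cdots & \mathbf{H}(M-1) & \mathbf{H}(M) & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & \mathbf{H}(0) & \mathbf{H}(1) & \cdots & \mathbf{H}(M)
\end{pmatrix}.$$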
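
To make the block-Toeplitz structure concrete, the following minimal sketch builds the Sylvester matrix from the coefficient matrices $\mathbf{H}(0), \ldots, \mathbf{H}(M)$ and checks the stacked model (2) against the convolutive model (1) on random data. The sizes, the random filters and the helper names are illustrative assumptions, and nested lists are used to keep the example dependency-free.

```python
import random

def sylvester(H, N):
    """Block-Toeplitz matrix T_N(H) of size q(N+1) x p(M+N+1), as nested lists."""
    M = len(H) - 1
    q, p = len(H[0]), len(H[0][0])
    T = [[0.0] * (p * (M + N + 1)) for _ in range(q * (N + 1))]
    for k in range(N + 1):              # block row k
        for i, Hi in enumerate(H):      # H(i) sits in block column k + i
            for r in range(q):
                for c in range(p):
                    T[k * q + r][(k + i) * p + c] = Hi[r][c]
    return T

def mixture(H, S, n):
    """Y(n) = sum_i H(i) S(n - i), equation (1)."""
    q, p = len(H[0]), len(H[0][0])
    return [sum(H[i][r][c] * S[n - i][c] for i in range(len(H)) for c in range(p))
            for r in range(q)]

rng = random.Random(1)
p, q, M, N = 2, 3, 2, 3                 # illustrative sizes only
H = [[[rng.uniform(-1, 1) for _ in range(p)] for _ in range(q)] for _ in range(M + 1)]
S = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(20)]
n = 10

T = sylvester(H, N)
Y_N = [y for k in range(N + 1) for y in mixture(H, S, n - k)]    # Y(n), ..., Y(n-N)
S_MN = [s for k in range(M + N + 1) for s in S[n - k]]           # S(n), ..., S(n-M-N)
TS = [sum(t * s for t, s in zip(row, S_MN)) for row in T]
max_err = max(abs(a - b) for a, b in zip(TS, Y_N))
print(len(T), len(T[0]), max_err)
```

Block row $k$ of the product reproduces $Y(n - k)$, which is why the two sides agree up to rounding error.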

It was proved in [23] that the rank of the Sylvester matrix is
rank $T_N(\mathbf{H}) = p(N + 1) + \sum_{i=1}^{p} M_i$, where $M_i$ is the degree of
the $i$th column³ of $\mathcal{H}(z)$. Now, it is easy to prove that the
Sylvester matrix has full rank and is left invertible if
each column of the polynomial matrix $\mathcal{H}(z)$ has the same
degree and $N > Mp$ (see [24] for more details). From equation
(2), one can conclude that the separation of the sources
can be achieved by estimating a $(M + N + 1)p \times q(N + 1)$
left inverse matrix $G$ of the Sylvester matrix. To estimate
$G$, one can use the criterion proposed in [17], obtained from the
generalization of the criterion in [20]:

$$\min_G C(G) = E\, \|(I \;\; 0)\, G\, Y_N(n) - (0 \;\; I)\, G\, Y_N(n + 1)\|^2, \qquad (3)$$

where $E$ stands for the expectation, $I$ is the identity matrix
and $0$ is a zero matrix of appropriate dimensions. It has
been shown in [17] that the above minimization leads to
a matrix $G^*$ such that:

$$\mathrm{Perf} = G^* T_N(\mathbf{H}) = \mathrm{diag}(\mathbf{M}, \ldots, \mathbf{M}), \qquad (4)$$

where $\mathbf{M}$ is any $p \times p$ matrix. Using the last equation, it
becomes clear that the separation is reduced to the separation
of an instantaneous mixture with a mixing matrix
$\mathbf{M}$. In other words, this algorithm can be decomposed into
two steps: in the first step, by using only second-order statistics,
we reduce the convolutive mixture problem to an instantaneous
mixture (deconvolution step); then, in the second
step, we must only separate sources consisting of a simple
instantaneous mixture (typically, most of the instantaneous
mixture algorithms are based on fourth-order statistics).

Finally, to avoid the spurious solutions (i.e. a singular
matrix $\mathbf{M}$), one must minimize that criterion subject to a
constraint [17]:

$$\text{Subject to } G_0 R_N(n) G_0^T = I, \qquad (5)$$

where $R_N(n) = E\, Y_N(n) Y_N^T(n)$, and the $p \times q(N + 1)$ matrix
$G_0$ stands for the first block row of $G = (G_0^T \cdots G_{(M+N)}^T)^T$.
The minimization of the above criterion under this constraint
using an LMS algorithm was discussed in our previous
work [17]. In addition, the minimization of a modified
version of the above criterion was done using a conjugate
gradient algorithm [19].


3. ALGORITHM

From the previous section, it is clear that the minimization
of the criterion (3) should be done subject to $p^2$
constraints⁴. Let const denote the constraint vector (i.e.
const $= \mathrm{Vec}(G_0 R_N(n) G_0^T - I)$, where Vec is the operator
that maps a $p \times q$ matrix to a $pq$ vector). The minimization
of the criterion (3) subject to the constraints (5)
can be formulated using the Lagrange method as:

$$\mathcal{L}(G, \lambda) = C(G) - \lambda\, \mathrm{const}, \qquad (6)$$

where $\lambda$ is a row vector of Lagrange parameters.
The minimization of the above equation with respect to $\lambda$
leads to the constraint equation (5). Using the derivative
$\partial C(G)/\partial G$ given in [17] and equations (5) and (6), one can
write:


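$$\frac{\partial \mathcal{L}(G, \lambda)}{\partial G} =
\begin{pmatrix} I_p & 0 & 0 \\ 0 & 2 I_{(M+N-1)p} & 0 \\ 0 & 0 & I_p \end{pmatrix} G R_N(n)
- \begin{pmatrix} 0 & I_{(M+N)p} \\ 0 & 0 \end{pmatrix} G R_N^T(n + 1)
- \begin{pmatrix} 0 & 0 \\ I_{(M+N)p} & 0 \end{pmatrix} G R_N(n + 1)
- \begin{pmatrix} 2\Gamma G_0 R_N(n) \\ 0 \end{pmatrix},$$

where $R_N(n + 1) = E\, Y_N(n) Y_N^T(n + 1)$ and $I_l$ is the $l \times l$
identity matrix. By setting the above derivative to zero and after
some algebraic operations, one can find that the block rows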

³ The degree of a column is defined as the highest degree of
the filters in this column.


⁴ Using the symmetrical form of the equation (5), one can
decrease the constraint number to $p(p + 1)/2$.
of the optimal $G^*$ should satisfy:

$$G_0 R_N(n) G_0^T = I, \qquad (7)$$

$$2 G_i R_N(n) = G_{(i+1)} R_N^T(n + 1) + G_{(i-1)} R_N(n + 1), \qquad (8)$$

$$G_{(M+N)} R_N(n) = G_{(M+N-1)} R_N(n + 1), \qquad (9)$$

where $1 \leq i \leq M + N - 1$. Let $A = R_N^T(n + 1) R_N^{-1}(n)$ and
$B = R_N(n + 1) R_N^{-1}(n)$; we should mention that $A$ and $B$
exist if and only if (iff) $R_N(n)$ is full rank⁵. Finally, using
some algebraic operations, we can prove that the previous
system of matrix equations can be solved by a recursion
formula:

$$G_{(M+N-i)} = G_{(M+N-i-1)} D_i, \qquad (10)$$

where $0 \leq i \leq M + N - 1$, and $G_0$ can be obtained from the
first equation (7) using a simple Cholesky decomposition.
In addition, the matrices $D_i$ can be obtained by:

$$D_{(i+1)} = B (2I - D_i A)^{-1}, \qquad (11)$$

where $0 \leq i < M + N - 1$ and $D_0 = B$. Even if relationships
(10) and (11) look complicated, the time needed
to obtain the matrix $G$ remains much shorter⁶ than the time
needed for the convergence of the LMS version [17] or even the
conjugate gradient version [21, 19].
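
The step from the system (8)-(9) to the recursion (10)-(11) is only summarized above. One way to see it is sketched below, assuming that $A$ carries the transposed cross-correlation and $B$ the non-transposed one (the transposes are hard to read in the source, so this assignment is an assumption), and that $R_N(n)$ and $2I - DA$ are invertible.

```latex
% (9) right-multiplied by R_N^{-1}(n): the last block row follows from the one above it.
G_{(M+N)} = G_{(M+N-1)}\, B = G_{(M+N-1)}\, D_0 , \qquad D_0 = B .

% Induction hypothesis: G_{(k+1)} = G_{(k)}\, D for some matrix D.
% (8) with i = k, right-multiplied by R_N^{-1}(n):
2\, G_{(k)} = G_{(k+1)}\, A + G_{(k-1)}\, B = G_{(k)}\, D A + G_{(k-1)}\, B
\;\Longrightarrow\;
G_{(k)}\, (2I - D A) = G_{(k-1)}\, B
\;\Longrightarrow\;
G_{(k)} = G_{(k-1)}\, B\, (2I - D A)^{-1} .
```

Each block row is therefore the previous one multiplied by $B(2I - DA)^{-1}$, which is the update of the $D$ matrices; the matrices are generated starting from the last block row, while the block rows themselves are then filled in starting from $G_0$.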

4.  EXPERIMENTAL  RESULTS 

The experiments discussed here are conducted using two
sources ($p = 2$) with a uniform probability density function
(pdf) and four sensors ($q = 4$), and the degree of $\mathcal{H}(z)$ is
chosen as $M = 4$.

To show the performance of the subspace criterion, the
matrix Perf $= G^* T_N(\mathbf{H})$ is plotted. Recall that
the deconvolution is achieved iff the matrix
Perf is a block diagonal matrix as shown in equation (4).
Figure 2 shows the performance of the batch subspace
algorithm discussed in this paper. It is clear from figure 2
that the first step of the algorithm (the deconvolution) was
not satisfactorily achieved (Perf is not block diagonal as in
equation (4)). This problem arises because the criterion
(3) is a flat function around its minima (see figure 2).

Figure 3 shows the performance results and the criterion
convergence of the LMS algorithm (first column), and
the performance results and the criterion convergence of
the same LMS algorithm when the matrix $G$ is initialized
using the result of the batch algorithm (second column). We
should mention that the time needed to reach the minimum
with the initialized version was almost half the time needed by
the non-initialized version. Figures 3 (c) and (d) show the
criterion convergence (the stop condition was the limit of
the sample number, i.e. 10000). The experimental studies
show that the conjugate gradient version of the subspace
algorithm can converge faster and lead to better
performance if that algorithm is initialized using the
proposed batch algorithm (these results are omitted in
this short paper).

⁵ It is easy to prove that $R_N(n)$ is full rank iff one adds some
additive independent noise to the observed signals, because of
the subspace assumption $q > p$. On the other hand, by using
the criterion (3), one can prove the existence of some spurious
minima if the model has some additive noise (the demonstration
is omitted here because of the page limit).
However, the experimental study shows that one still obtains good
results for a signal-to-noise ratio (SNR) of 20 dB. In our
simulation, we added Gaussian noise with SNR > 20 dB.

⁶ Indeed, using a C program on a Sun Ultra 30 Creator
workstation, it takes a few minutes (less than 5) to obtain the matrix
$G$. In contrast, the convergence of the conjugate gradient takes from
40 to 100 minutes and the LMS algorithm needs a few hours to
converge.

The second step of the algorithm consists of the separation
of a residual instantaneous mixture (corresponding
to $\mathbf{M}$, see equation (4)). This separation can be
performed using any source separation algorithm applicable to
instantaneous mixtures. Here, we chose the minimization
of a cross-cumulant criterion using the Levenberg-Marquardt
method [25]. Figure 4 shows the different signals (see
figure 1). It is clear that the sources X and the estimated
signals S are independent signals, that the vector Z, output
of the subspace criterion, corresponds to an instantaneous
mixture, and that the observed vector Y corresponds to a
convolutive mixture (see [26, 27]).

Finally,  the  estimation  of  the  second  and  the  high  order 
statistics  was  done  according  to  the  method  described  in 
[28]. 

5.  CONCLUSION 

In this paper, we propose a batch algorithm for source
separation in convolutive mixtures based on a subspace
approach. Like the other subspace methods, this new algorithm
requires that the number of sensors be larger than
the number of sources. In addition, it allows the separation
of convolutive mixtures of independent sources using mainly
second-order statistics: only a simple residual instantaneous
mixture, the separation of which generally needs high-order
statistics, remains to be separated.

The experimental study shows that the present algorithm
can be used to initialize an adaptive subspace algorithm.
The initialized algorithms need less time to converge.
These results were discussed in the case of two subspace
algorithms, which are based on LMS or on a conjugate
gradient method. Finally, the LMS and conjugate gradient
versions of the subspace algorithm become more stable and
faster if they are initialized using the present algorithm.
ABSTRACT

We apply the 2-D broadband Maximum Likelihood (ML)
and interpolated root-MUSIC methods to estimate the
azimuth and velocity parameters of teleseismic events
recorded by the GERESS array. A sequential test based
on Likelihood Ratios (LR’s) is developed for signal
detection. Our experimental results show that both
methods can provide reliable estimates of signal parameters.
However, ML is shown to have better estimation
accuracy and robustness than interpolated root-MUSIC at
the expense of a higher computational cost.

1.  INTRODUCTION 

The ML and MUSIC techniques are two popular methods
in array processing. Numerous theoretical and
numerical studies have shown that ML outperforms
MUSIC in scenarios with low Signal to Noise Ratios
(SNR’s), small number of samples, coherent signals, as
well as closely spaced sources [1]. However, the
enormously high computational cost of ML makes
this statistically optimal approach in many cases less
attractive than MUSIC. Therefore, a crucial issue is
how to choose a proper algorithm for a particular
application to achieve sufficiently high performance and
acceptable computational complexity.

In the present work, we apply broadband ML [2]
and 2-D interpolated root-MUSIC [3] to localization of
several teleseismic events using real data from the GERESS
array. A sequential test procedure based on LR’s is used
to detect signals within the observation interval. Due
to complicated propagation effects, there may be more
than one signal phase arriving at the same time from
the same direction. However, different signal phases
should differ in their velocities. It is worth noting that
the ML method can be directly applied to the broadband
Direction Of Arrival (DOA) estimation problem.
On the other hand, root-MUSIC should be adapted to
the broadband setting, for example, by means of the
so-called array interpolation technique [4], which allows
the information from different frequencies to be combined in a
coherent way. In [3] and [6], a high-SNR regional man-made
seismic event was analyzed by means of the ML
and interpolated root-MUSIC techniques. Both methods
provided excellent results in this case. Below, we
address a more difficult teleseismic event case, which is
characterized by much lower SNR’s and more complicated
propagation phenomena relative to the regional
event case. In the teleseismic case, signal detection
becomes a very important issue, since it is almost
impossible to identify weak signals in seismograms (for
example, see Fig. 1 displaying a typical seismogram of
a teleseismic event).

The experimental results reported in the present
paper demonstrate that in the teleseismic case, both
ML and interpolated root-MUSIC may be successfully
applied to source localization. ML is shown to have
better performance and robustness than interpolated
root-MUSIC. However, the latter approach enjoys
simpler implementation.

2.  DATA  MODEL 

Let an array of $N$ sensors receive $M$ broadband signals
from far-field sources. A 2-D array geometry can be assumed,
since the length of the vertical aperture of GERESS
is much smaller than that of the horizontal one and is
negligible compared to the seismic signal wavelength.
The array output $x(t)$ sampled at discrete times $t = 0, \ldots, T - 1$
is short-time Fourier-transformed using
Thomson’s multitaper technique [7]:

$$X_l(\omega) = \frac{1}{\sqrt{T}} \sum_{t=0}^{T-1} w_l(t)\, x(t)\, e^{-j\omega t}, \quad (l = 0, \ldots, L - 1), \qquad (1)$$

where $\{w_l(t)\}_{t=0,\ldots,T-1}$ is the $l$th orthonormal window
function.

For sufficiently large $T$, the Fourier-transformed data
can be approximately expressed as

$$X_l(\omega) = H(\omega) S_l(\omega) + U_l(\omega), \qquad (2)$$

$$H(\omega) = [d_1(\omega), \ldots, d_M(\omega)], \qquad (3)$$

where $X_l(\omega) \in C^{N \times 1}$, $H(\omega) \in C^{N \times M}$, $S_l(\omega) \in C^{M \times 1}$,
and $U_l(\omega) \in C^{N \times 1}$ are the observation vector, the
steering matrix, the vector of signal waveforms, and
the vector of sensor noise, respectively. The steering
vector $d_m(\omega)$ associated with the $m$th signal is given
by

$$d_m(\omega) = [e^{-j\omega \xi_m^T r_1}, \ldots, e^{-j\omega \xi_m^T r_N}]^T, \qquad (4)$$

where $r_n = (x_n, y_n)$ is the coordinate of the $n$th sensor.
The slowness vector $\xi_m$ is related to the source azimuth
$\alpha_m$ and the respective velocity $V_m$ as follows:

$$\xi_m = \frac{1}{V_m} [\cos\alpha_m, \sin\alpha_m]^T. \qquad (5)$$
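
A small numerical illustration of (4) and (5) may help: each entry of the steering vector is a pure phase term whose delay is the projection of the sensor position onto the slowness vector. The sketch below assumes sensor coordinates in km and velocity in km/s; the three-sensor layout and the function names are hypothetical.

```python
import cmath
import math

def slowness(azimuth_deg, velocity):
    """Equation (5): xi = [cos(alpha), sin(alpha)]^T / V."""
    a = math.radians(azimuth_deg)
    return (math.cos(a) / velocity, math.sin(a) / velocity)

def steering_vector(omega, xi, sensors):
    """Equation (4): n-th entry is exp(-j * omega * xi^T r_n)."""
    return [cmath.exp(-1j * omega * (xi[0] * x + xi[1] * y)) for x, y in sensors]

sensors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]    # hypothetical positions in km
xi = slowness(0.0, 8.0)                            # azimuth 0 degrees, 8 km/s
d = steering_vector(2.0 * math.pi, xi, sensors)    # angular frequency of a 1 Hz component
print(d)
```

With this geometry the second sensor sees a delay of 1/8 s, i.e. a phase of $-\pi/4$ at 1 Hz, while the third sensor lies on the wavefront through the origin and sees no delay.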

The signal waveforms $S_l(\omega_j)$ ($l = 0, \ldots, L - 1$; $j =
1, \ldots, J$) are assumed to be deterministic and unknown.
From the asymptotic theory of the Fourier transform,
it is well known that $X_l(\omega_j)$ ($l = 0, \ldots, L - 1$; $j =
1, \ldots, J$) are independent complex Gaussian distributed
with the mean $H(\omega_j) S_l(\omega_j)$ and the covariance matrix
$\nu(\omega_j) I$, where $\nu(\omega_j)$ is the sensor noise power at the
frequency $\omega_j$ and $I$ is the identity matrix [2]. The problem
is to detect the signals and estimate their parameters
$\{\xi_m\}$, $m = 1, \ldots, M$.

3.  WIDEBAND  MAXIMUM  LIKELIHOOD 

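Based on the independence and asymptotic Gaussianity
in the frequency domain, the approximate wideband
log-likelihood function can be expressed as [2]

$$\ell(\vartheta) = \sum_{j=1}^{J} \log \mathrm{tr}\left[\{I - P(\omega_j, \vartheta)\}\, \hat{R}_X(\omega_j)\right], \qquad (6)$$

where

$$\vartheta = [\xi_1^T, \ldots, \xi_M^T]^T \qquad (7)$$

denotes the unknown slowness vector, $P(\omega_j, \vartheta)$ is the
projection matrix onto the column space of the matrix
$H(\omega_j)$,

$$\hat{R}_X(\omega_j) = \frac{1}{L} \sum_{l=0}^{L-1} X_l(\omega_j) X_l^H(\omega_j) \qquad (8)$$

is the sample spectral density matrix, and $(\cdot)^H$ denotes
the Hermitian transpose. The ML estimate $\hat{\vartheta}_{ML}$ is
obtained by minimizing (6) over $\vartheta$.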
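
For a single source ($M = 1$) the criterion (6) simplifies considerably, because the projection onto the one-column steering matrix is $d d^H / N$ and $\mathrm{tr}[(I - P)\hat{R}_X] = \mathrm{tr}\,\hat{R}_X - d^H \hat{R}_X d / N$. The sketch below uses this special case with a plain grid search over azimuth and velocity. It is an illustration only: the sensor layout is hypothetical (not the real GERESS geometry), the data are simulated, and the grid, noise level and function names are assumptions.

```python
import cmath
import math
import random

# Hypothetical sensor coordinates in km (NOT the real GERESS geometry).
SENSORS = [(0.0, 0.0), (0.2, 0.0), (-0.1, 0.17), (-0.1, -0.17), (0.0, 0.5),
           (0.43, -0.25), (-0.43, -0.25), (1.0, 0.3), (-0.6, 0.9), (0.2, -1.1)]

def steering(omega, az_deg, vel, sensors):
    """Steering vector (4) built from the slowness vector (5)."""
    a = math.radians(az_deg)
    sx, sy = math.cos(a) / vel, math.sin(a) / vel
    return [cmath.exp(-1j * omega * (sx * x + sy * y)) for x, y in sensors]

def criterion(az_deg, vel, omegas, data, sensors):
    """Criterion (6) for M = 1: sum over frequencies of log tr[(I - P) R_X]."""
    n = len(sensors)
    total = 0.0
    for omega, snapshots in zip(omegas, data):
        d = steering(omega, az_deg, vel, sensors)
        resid = 0.0
        for x in snapshots:
            power = sum(abs(v) ** 2 for v in x)                        # x^H x
            proj = abs(sum(dk.conjugate() * v for dk, v in zip(d, x))) ** 2 / n
            resid += power - proj                                      # x^H (I - P) x
        total += math.log(resid / len(snapshots))
    return total

def simulate(az_deg, vel, omegas, n_tapers, noise_std, sensors, rng):
    """Synthetic snapshots X_l(omega_j) = d * s + noise with unit-modulus waveforms."""
    data = []
    for omega in omegas:
        d = steering(omega, az_deg, vel, sensors)
        snapshots = []
        for _ in range(n_tapers):
            s = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
            snapshots.append([dk * s + complex(rng.gauss(0.0, noise_std),
                                               rng.gauss(0.0, noise_std)) for dk in d])
        data.append(snapshots)
    return data

rng = random.Random(0)
omegas = [2.0 * math.pi * f for f in (1.0, 1.5, 2.0, 2.5, 3.0)]   # 1 to 3 Hz
data = simulate(60, 10, omegas, 4, 0.01, SENSORS, rng)            # true: 60 deg, 10 km/s

grid = [(az, vel) for az in range(0, 360, 5) for vel in range(6, 21)]
est_az, est_vel = min(grid, key=lambda g: criterion(g[0], g[1], omegas, data, SENSORS))
print(est_az, est_vel)
```

For $M > 1$ the projection matrix has to be formed from the full steering matrix and the search becomes multidimensional, which is the source of the high computational cost of ML mentioned in the introduction.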

4.  WIDEBAND  INTERPOLATED 
ROOT-MUSIC 

In  this  section,  we  describe  the  2-D  extension  [3]  of  the 
wideband  interpolated  root-MUSIC  algorithm  [5]  that 
will  be  applied  for  joint  estimation  of  the  azimuth  and 
velocity  parameters  of  seismic  sources. 

Let the 2-D array be divided into two subarrays of $N_s$ sensors each,
denoted as subarrays (a) and (b), respectively. Since the outline of the
algorithm is similar for each subarray, in the sequel we consider only
subarray (a). Its observation vector can be modeled as

$X_{l,a}(\omega) = H_a(\omega) S_l(\omega) + U_{l,a}(\omega).$  (9)

This subarray will be used for interpolation of the set of $J$ virtual ULA's
with the interelement spacings $d_c \omega_c / \omega_j$ ($j = 1, \ldots, J$),
where $\omega_c$ is the central frequency and $d_c$ is the interelement spacing
of the virtual ULA at $\omega_c$. To obtain the same array manifold for each
frequency, the interpolation matrices $B_j$ can be designed in a regular
way [4]. The coherently averaged covariance matrix can be obtained as

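$R_a = \frac{1}{J} \sum_{j=1}^{J} B_j^H \hat{R}_a(\omega_j) B_j,$  (10)

where

$\hat{R}_a(\omega_j) = \frac{1}{L} \sum_{l=0}^{L-1} X_{l,a}(\omega_j) X_{l,a}^H(\omega_j).$  (11)

The noise covariance matrix after the coherent processing can be computed as

$Q = \frac{1}{J} \sum_{j=1}^{J} \hat{\nu}(\omega_j) B_j^H B_j,$  (12)

where $\hat{\nu}(\omega_j)$ is some estimate of sensor noise at the frequency
$\omega_j$. The matrix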

$R_Q = Q^{-1/2} R_a Q^{-1/2}$  (13)

is the spectral density matrix after prewhitening. The eigendecomposition of
this matrix yields

$R_Q = U_S \Lambda_S U_S^H + U_N \Lambda_N U_N^H,$  (14)

where the matrices $U_S$ and $U_N$ contain the signal- and noise-subspace
eigenvectors, respectively. In turn, the diagonal matrices $\Lambda_S$ and
$\Lambda_N$ contain the signal- and noise-subspace eigenvalues, respectively.
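
The prewhitening and subspace split of (13)-(14) can be sketched as follows. This is an illustration only, assuming that the coherently averaged matrix and the noise covariance matrix $Q$ are already available and that $Q$ is Hermitian positive definite; the function names are hypothetical.

```python
import numpy as np

def inverse_sqrt(Q):
    """Q^{-1/2} for a Hermitian positive definite matrix, via its eigendecomposition."""
    w, V = np.linalg.eigh(Q)
    return (V / np.sqrt(w)) @ V.conj().T

def prewhiten_and_split(R_a, Q, n_signals):
    """Prewhiten R_a with Q and split the eigenvectors into signal and noise subspaces.

    n_signals must be at least 1 and smaller than the matrix dimension.
    """
    Qi = inverse_sqrt(Q)
    R_q = Qi @ R_a @ Qi
    w, V = np.linalg.eigh(R_q)      # eigenvalues in ascending order
    U_n = V[:, :-n_signals]         # eigenvectors of the smallest eigenvalues
    U_s = V[:, -n_signals:]         # eigenvectors of the largest eigenvalues
    return R_q, U_s, U_n
```

Using `eigh` rather than a general eigensolver keeps the eigenvalues real and sorted, which makes the signal/noise split a simple slice.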

The root-MUSIC polynomial has the form

$D_a(z) = d^T(1/z)\, Q^{-1/2} U_N U_N^H Q^{-1/2}\, d(z),$  (15)

where $d(z) = [1, z, \ldots, z^{N-1}]^T$. Let $\{z_{a,1}, \ldots, z_{a,M}\}$
denote the $M$ signal roots of (15), which are sorted based on their proximity
to the unit circle. Similarly, we can find $M$ signal roots
$\{z_{b,1}, \ldots, z_{b,M}\}$ for subarray (b). Combining the results from
these two virtual subarrays, we can find $M^2$ candidate estimates of
$\boldsymbol{\xi}$ by solving the system

$\Delta_{xa}\,\xi_x + \Delta_{ya}\,\xi_y = \frac{\arg z_{a,i}}{\omega_c},$
$\Delta_{xb}\,\xi_x + \Delta_{yb}\,\xi_y = \frac{\arg z_{b,k}}{\omega_c}$  (16)

for $i, k = 1, \ldots, M$, where $\Delta_{xa}$, $\Delta_{ya}$, $\Delta_{xb}$, and
$\Delta_{yb}$ define the interelement spacings of the virtual arrays (a) and
(b), respectively. The final estimate of $\boldsymbol{\xi}$ is then obtained by
selecting the $M$ pairs $(\xi_x, \xi_y)$ which correspond to the maximal values
of the 2-D MUSIC spectral function. The estimates of azimuth and velocity,
$\hat{\boldsymbol{\vartheta}}_{\mathrm{MUSIC}}$, can be obtained from these $M$
pairs using (5).
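
The last two steps, solving the 2 x 2 system for one pair of roots and converting the slowness vector to azimuth and velocity, can be sketched as below. The sketch assumes that the system has the form "spacing vector times slowness equals root phase divided by $\omega_c$" and that (5) defines the slowness vector as $(1/V)[\cos\alpha, \sin\alpha]^T$; the sign convention of the phase and all function names are assumptions made for illustration.

```python
import numpy as np

def solve_slowness(delta_a, delta_b, z_a, z_b, omega_c):
    """Solve the 2 x 2 linear system for (xi_x, xi_y) from one root per subarray.

    delta_a, delta_b: (x, y) interelement spacings of virtual arrays (a) and (b).
    z_a, z_b: one signal root from each subarray.
    """
    A = np.array([delta_a, delta_b], dtype=float)
    rhs = np.array([np.angle(z_a), np.angle(z_b)]) / omega_c
    return np.linalg.solve(A, rhs)

def slowness_to_azimuth_velocity(xi):
    """Invert xi = (1/V) [cos(alpha), sin(alpha)]: returns (alpha in rad, V)."""
    velocity = 1.0 / np.hypot(xi[0], xi[1])
    azimuth = np.arctan2(xi[1], xi[0])
    return azimuth, velocity
```

Because `np.angle` returns a principal value, the sketch is only valid while the true phases stay within $(-\pi, \pi]$, which constrains the virtual spacings.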

5.  LIKELIHOOD  RATIO  TEST 

In this section, we develop a sequential LR-based test for detecting the
number of signals. Let $m$ denote the hypothetical number of signals. In each
step, the detection problem can be formulated as testing the hypothesis $K_m$
against the alternative $A_m$:

$K_m$: $m$ signals are present,

$A_m$: more than $m$ signals are present.

Starting from $m = 0$, this test is performed stepwise and stopped once the
hypothesis $K_m$ is accepted. Applying the LR principle, we obtain the
following test statistic in the $m$th step [2]:

$t_m = \frac{1}{J} \sum_{j=1}^{J} \log\left(1 + \frac{n_1}{n_2} F_m(\omega_j)\right) \gtrless t_\alpha,$  (17)

where

$F_m(\omega) = \frac{n_2\, \mathrm{tr}\left[\{P_{m+1}(\omega, \hat{\boldsymbol{\vartheta}}_{\mathrm{ML}}) - P_m(\omega, \hat{\boldsymbol{\vartheta}}_{\mathrm{ML}})\}\,\hat{R}_x(\omega)\right]}{n_1\, \mathrm{tr}\left[\{I - P_{m+1}(\omega, \hat{\boldsymbol{\vartheta}}_{\mathrm{ML}})\}\,\hat{R}_x(\omega)\right]},$  (18)

$n_1 = L(2m + 4), \quad n_2 = L(2N - 2),$  (19)

and $\hat{\boldsymbol{\vartheta}}_{\mathrm{ML}} \in \mathbb{R}^{2m}$ is the ML
estimate of the signal parameter vector. If $t_m$ exceeds the test threshold
$t_\alpha$, the hypothesis is rejected. The quantity $F_m(\omega)$ can be
interpreted as an estimate of the increase in SNR when adding the $(m+1)$th
signal. To be detected, the power of the $(m+1)$th signal must be sufficiently
high compared to the noise power. Under the hypothesis $K_m$, the value
$F_m(\omega_j)$ is approximately centrally F-distributed with $n_1$ and $n_2$
degrees of freedom. The threshold $t_\alpha$ is determined by the
Cornish-Fisher expansion with good accuracy [8]-[9]. Note that the LR test
can be easily implemented if the corresponding ML estimates are available.
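
One step of the sequential test can be sketched numerically as below. The sketch assumes that the statistic averages $\log(1 + (n_1/n_2) F_m(\omega_j))$ over the frequency bins and that $F_m$ is the ratio of projected powers scaled by $n_2/n_1$; the projection matrices are taken as given, the threshold computation by the Cornish-Fisher expansion is not shown, and the function names are hypothetical.

```python
import numpy as np

def f_statistic(P_m, P_m1, R, n1, n2):
    """Ratio of the power gained by the (m+1)th signal to the residual power."""
    n = R.shape[0]
    gained = np.real(np.trace((P_m1 - P_m) @ R))
    residual = np.real(np.trace((np.eye(n) - P_m1) @ R))
    return (n2 * gained) / (n1 * residual)

def lr_statistic(F_values, n1, n2):
    """Average over frequency bins of log(1 + (n1/n2) F)."""
    F = np.asarray(F_values, dtype=float)
    return float(np.mean(np.log1p((n1 / n2) * F)))

def degrees_of_freedom(m, N, L):
    """n1 and n2 as given in (19)."""
    return L * (2 * m + 4), L * (2 * N - 2)
```

In a full detector, this statistic would be compared with the threshold for each $m = 0, 1, \ldots$ until the hypothesis is accepted.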

6.  REAL  DATA  PROCESSING 

In this section, we apply the developed techniques to real data processing.
These data were recorded by the GERESS array located in the Bavarian Forest,
Germany. Details about this array can be found in [10]. Two teleseismic
events (earthquakes), which occurred on February 13, 1993 in the Eastern
Mediterranean and on February 26, 1996 in the Middle East, respectively, were
selected for our analysis. The latter event is contaminated by a smaller
pre-shock, located about 37 km from the main event. More information about
the selected events is collected in Table 1.

The array output was sampled with $f_s = 40$ Hz. For each data set, we used a
sliding window with a length of 3.2 s and a shift of 0.5 s. A total of seven
frequency bins between 0.9 and 3.1 Hz have been used. Two independent virtual
ULA sets have been employed for the interpolated root-MUSIC algorithm with
the central frequency $f_c = 2.2$ Hz. The spectral density matrix
$\hat{R}_x(\omega_j)$ has been estimated using $L = 3$ Thomson's windows, which
roughly correspond to 3 independent snapshots. The sequential detection
procedure kept the test level $\alpha = 0.033$ constant in each step.
Theoretical slowness values have been derived from the AK135 earth model [11].
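
The window and frequency-bin settings quoted above can be checked with a short calculation. The sketch assumes that the frequency bins are the DFT bins of one 3.2 s window; the text does not say so explicitly, but under that assumption the count of seven bins between 0.9 and 3.1 Hz is reproduced.

```python
import numpy as np

fs = 40.0        # sampling rate in Hz
window_s = 3.2   # window length in seconds
shift_s = 0.5    # window shift in seconds

n_window = int(round(fs * window_s))   # samples per window
n_shift = int(round(fs * shift_s))     # samples per shift

# DFT bin frequencies of one window (bin spacing fs / n_window = 0.3125 Hz)
freqs = np.fft.rfftfreq(n_window, d=1.0 / fs)
selected = freqs[(freqs >= 0.9) & (freqs <= 3.1)]

# Bin closest to the quoted central frequency of 2.2 Hz
nearest_to_fc = selected[np.argmin(np.abs(selected - 2.2))]
```

Under this assumption the selected bins run from 0.9375 Hz to 2.8125 Hz, and the bin nearest to 2.2 Hz lies at 2.1875 Hz.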

The results obtained from the weak event analysis are shown in Figs. 1 and 2.
Typical seismometer outputs are plotted in the first subplot of these
figures. The second subplot shows the output of the LR-based detector, which
was used in conjunction with both techniques to allow a fair comparison.
Apparently, the P-phases are detected with good time resolution, while the
S-phases (traveling with lower velocity) are not detected at all. Some false
alarms can be observed. The ML estimates for the back-azimuth and velocity
are well concentrated around their theoretical values. The estimates obtained
from 2-D interpolated root-MUSIC show higher variances. Interestingly, both
methods provide better results for the azimuth than for the velocity. This
relatively poor performance of the velocity estimates may be explained by the
quite limited aperture of GERESS.

Table 1: Event List from NEIC.

time       lat      long     dist    az      mag    locat
(h.m.s)    (deg N)  (deg E)  (deg)   (deg)   (mb)
03:42:53   34.43    24.81    16.60   146.1   3.7    Crete
07:17:08   28.87    34.48    25.54   133.8   4.0    Gulf of Aqaba
07:17:28   28.73    34.82    25.81   133.4   5.0    Gulf of Aqaba

In Figs. 3 and 4, another event is analyzed. It contains two seismic sources
of moderate scale originating from the same location but at slightly
different times (see Table 1). In this data set, a stronger event follows
shortly after a weak event. Such a situation is of great importance when
monitoring nuclear explosions. Due to the high SNR's, the signals can be
correctly detected during the whole analysis interval. One signal is detected
at about the 30th second, when waves from the first earthquake arrive at the
array. At the 57th second, the LR test shows two signals, corresponding to
the case when the superimposed waves from the first and second seismic
sources both arrive at the array. During the period from the 300th to the
360th second (the so-called S-phases), similar detection results can be
observed as well. The signals detected from the beginning of the analysis up
to the 16th second could be interpreted as false alarms or another weak
event. The estimates of the azimuth and velocity shown in subplots 3 and 4
illustrate that the ML technique has better robustness and lower variance
than the 2-D interpolated root-MUSIC technique. Note that the performance of
the latter method is not much better in the strong event case than in the
weak event one, since the interpolation errors become more critical at high
SNR's. As in the previous example, both methods estimate the azimuth better
than the velocity.

7.  CONCLUSIONS 

We compared the performance of the wideband ML and interpolated root-MUSIC
algorithms by processing weak and strong teleseismic events recorded by the
GERESS array. Our results show that ML has better estimation accuracy and
robustness than root-MUSIC. Another advantage of ML is that the application
of the LR test for detecting the number of signals is straightforward.
However, the enormous computational cost associated with the ML technique may
be critical in practical applications.

Figure 1: Wideband ML, first event (GERESS data: 13.02.1993 03:43, 34.4N
24.8E, mb = 3.7, Crete; time axis 03:46:00 - 03:52:00 [sec]). "-":
theoretical values for back-azimuth, "x": theoretical values for velocity.

Figure 2: Wideband interpolated root-MUSIC, first event (GERESS data:
13.02.1993 03:43, 34.4N 24.8E, mb = 3.7, Crete; time axis 03:46:00 -
03:52:00 [sec]). "-": theoretical values for back-azimuth, "x": theoretical
values for velocity.

Figure 3: Wideband ML, second event (GERESS data: 26.02.1996 07:17, 28.7N
34.8E, mb = 5.0, Gulf of Aqaba; time axis 07:22:00 - 07:28:00 [sec], ticks
every 30 s from 30 to 360). "-": theoretical values for back-azimuth, "x":
theoretical values for velocity of the main event, "*": theoretical values
for velocity of the pre-shock.

Figure 4: Wideband interpolated root-MUSIC, second event (GERESS data:
26.02.1996 07:17, 28.7N 34.8E, mb = 5.0, Gulf of Aqaba; time axis 07:22:00 -
07:28:00 [sec], ticks every 30 s from 30 to 360). "-": theoretical values
for back-azimuth, "x": theoretical values for velocity of the main event,
"*": theoretical values for velocity of the pre-shock.


REFERENCES 

[1] J.F. Böhme, “Advances in spectrum analysis and array processing,” in
Array Processing, S. Haykin, Editor, Prentice Hall, pp. 1-63, 1991.

[2] J.F. Böhme, “Statistical array signal processing of measured sonar and
seismic data,” in Proc. SPIE 2563: Advanced Signal Processing Algorithms,
San Diego, CA, July 1995, pp. 2-20.

[3] D.V. Sidorovich and A.B. Gershman, “Two-dimensional wideband
interpolated root-MUSIC applied to measured seismic data,” IEEE Trans.
Signal Processing, vol. 46, pp. 2263-2267, Aug. 1998.

[4] B. Friedlander, “The root-MUSIC algorithm for direction finding with
interpolated arrays,” Signal Processing, vol. 30, pp. 15-29, Jan. 1993.

[5] B. Friedlander and A.J. Weiss, “Direction finding for wideband signals
using an interpolated array,” IEEE Trans. Signal Processing, vol. 41,
pp. 1618-1634, Apr. 1993.

[6] D.V. Sidorovich, C.F. Mecklenbräuker, and J.F. Böhme, “Sequential test
and parameter estimation for array processing of seismic data,” in Proc. 8th
IEEE Workshop Stat. Signal Array Processing, Corfu, Greece, June 1996,
pp. 256-259.

[7] D.J. Thomson, “Spectrum estimation and harmonic analysis,” Proc. IEEE,
vol. 70, pp. 1055-1096, Sep. 1982.

[8] P. Hall, The Bootstrap and Edgeworth Expansion, Springer-Verlag, NY,
1992.

[9] C.F. Mecklenbräuker, P. Gerstoft, J.F. Böhme, and P-J. Chung,
“Hypothesis testing for geoacoustic environmental models using likelihood
ratio,” JASA, vol. 105, pp. 1738-1748, March 1999.

[10] H.P. Harjes, “Design and siting of a new regional array in Central
Europe,” Bull. Seism. Soc. Am., vol. 80B, pp. 1801-1817, June 1990.

[11] B. Kennett, E.R. Engdahl, and R. Buland, “Constraints on seismic
velocities in the Earth from traveltimes,” Geophys. J. Int., vol. 122,
pp. 108-124, 1995.




BOUNDS  ON  UNCALIBRATED  ARRAY  SIGNAL  PROCESSING 


Brian M. Sadler
Army Research Laboratory
Adelphi, MD 20783
bsadler@arl.mil

Richard J. Kozick
Bucknell University
Lewisburg, PA 17837
kozick@bucknell.edu


ABSTRACT 

Deterministic constrained Cramér-Rao bounds (CRBs) are developed for general
linear forms in additive white Gaussian noise. The linear form describes a
variety of array processing cases, including narrow band sources with a
calibrated array, the uncalibrated array cases of instantaneous linear mixing
and convolutive mixing, and space-time coding scenarios with multiple
transmit and receive antennas. We employ the constrained CRB formulation of
Stoica and Ng, allowing the incorporation of side information into the
bounds. This provides a framework for a large variety of scenarios, including
semi-blind, constant modulus, known moments or cumulants, and others. The
CRBs establish bounds on blind estimation of sources using an uncalibrated
array, and facilitate comparison of calibrated and uncalibrated arrays when
side information is exploited.

1. INTRODUCTION: MODEL

Consider the additive noise linear model

$x_t = H s_t + v_t, \quad t = 1, \ldots, N,$  (1)

where $x_t$ is $l \times 1$ and $H$ is $l \times k$. The elements of the
$k \times 1$ signal vector will be denoted by $s_t = [s_1(t), \ldots, s_k(t)]^T$.
We use the superscripts $T$, $*$, $H$ for transpose, conjugate, and conjugate
transpose, respectively, with complex numbers denoted $c = \bar{c} + j\tilde{c}$.
The noise $v_t$ is assumed complex white Gaussian, with variance $\sigma^2$. The
model (1) underlies many array processing and single-sensor scenarios.

In the narrow band calibrated array case ($l$ sensors and $k$ sources),
$H = A(\theta) \cdot \alpha$ is of known parametric form with respect to the
source bearings. Here $A(\theta)$ is the array manifold matrix, and
$\alpha = \mathrm{diag}(\alpha_1, \ldots, \alpha_k)$ contains complex constants
$\alpha_i$ that model the channel attenuation for the $i$th source. Constrained
bounds are developed for this case in [1, 2].

In this paper we are interested in the general case when $H$ is unknown. This
arises in the uncalibrated array cases of instantaneous linear mixing and
convolutive mixing, and the space-time transmit diversity case with arrays
for both transmission and reception. An uncalibrated array may have unknown
sensor placement, phase mismatching, and so on. In such cases blind methods
may be used to separate and estimate source waveforms without estimating the
source bearings. Performance bounds are not straightforward due to the lack
of regularity in the Fisher information matrix (FIM) associated with (1) in
the uncalibrated case. We develop CRBs for these cases using the constrained
CRB methodology of Gorman/Hero and Stoica/Ng [3]. The constraints arise due
to side information such as constant modulus sources, constraints on the
structure and elements of $H$, and semi-blind sources (some known signal
values). Examples are given comparing calibrated and uncalibrated array CRBs.
A space-time coding example is also presented.

2. FIM & CONSTRAINED CRBS

Forming the $lN \times 1$ supervector $x = [x_1^T, \ldots, x_N^T]^T$, we have
$x \sim \mathcal{CN}(\mu_x, \sigma^2 I_{lN \times lN})$, where

$\mu_x = E[x] = [\mu_1^T, \ldots, \mu_N^T]^T, \quad \mu_t = H s_t.$  (2)

Thus we have a multivariate complex normal process with deterministic
time-varying mean $H s_t$. We define the data matrix and the columns of $H$ as

$S = [s_1, \ldots, s_N]_{k \times N}, \quad H = [h_1, \ldots, h_k].$  (3)

We write the unknown deterministic parameters in a real vector of length
$2lk + 2kN$, given by

$\Theta = [\Theta_H^T, \Theta_s^T]^T,$
$\Theta_H = [\bar{h}_1^T, \tilde{h}_1^T, \ldots, \bar{h}_k^T, \tilde{h}_k^T]^T,$
$\Theta_s = [\bar{s}_1^T, \tilde{s}_1^T, \ldots, \bar{s}_N^T, \tilde{s}_N^T]^T.$  (4)

Note that $\sigma^2$ decouples from the other parameters, and so it is omitted.

The FIM $J$ for $\Theta$ is obtained from

$[J(\Theta)]_{ij} = \frac{2}{\sigma^2}\, \mathrm{Re}\left\{ \frac{\partial \mu_x^H}{\partial \Theta_i} \frac{\partial \mu_x}{\partial \Theta_j} \right\}.$  (5)

Partitioning $J$ we write

$J = \begin{bmatrix} J_H & J_{Hs} \\ J_{sH} & J_s \end{bmatrix},$  (6)

with elements described next. Define the $2k \times 2k$ matrix

$J_0 = \begin{bmatrix} H^H H & jH^H H \\ -jH^H H & H^H H \end{bmatrix};$  (7)

then $J_s$ is given by the block-diagonal $2kN \times 2kN$ matrix

$J_s = \frac{2}{\sigma^2}\, \mathrm{Re}\{\mathrm{diag}(J_0, \ldots, J_0)\},$  (8)

where $J_0$ repeats $N$ times.
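$J_H$ may be written as follows,

$P = \left[(S S^H)^*\right]_{k \times k},$  (9)

$B_{mn} = \begin{bmatrix} [P]_{mn} & j[P]_{mn} \\ -j[P]_{mn} & [P]_{mn} \end{bmatrix} \otimes I_{l \times l},$  (10)

$J_H = \frac{2}{\sigma^2}\, \mathrm{Re}\left\{ \begin{bmatrix} B_{11} & \cdots & B_{1k} \\ \vdots & & \vdots \\ B_{k1} & \cdots & B_{kk} \end{bmatrix} \right\}_{2lk \times 2lk},$  (11)

where $\otimes$ denotes Kronecker product.

Next we consider the cross-terms in the FIM, $J_{Hs}$ and $J_{sH}$. It can be
shown that

$L_{mn} = \begin{bmatrix} [S]^*_{mn} & j[S]^*_{mn} \\ -j[S]^*_{mn} & [S]^*_{mn} \end{bmatrix} \otimes H,$  (12)

$J_{Hs} = \frac{2}{\sigma^2}\, \mathrm{Re}\left\{ \begin{bmatrix} L_{11} & \cdots & L_{1N} \\ \vdots & & \vdots \\ L_{k1} & \cdots & L_{kN} \end{bmatrix} \right\}_{2lk \times 2kN} = J_{sH}^T.$  (13)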
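
As a numerical cross-check of this block structure, the FIM can also be assembled directly from the Jacobian of the mean with respect to the real parameters. The sketch below assumes the parameter ordering of (4) (real part then imaginary part of each column of $H$, followed by real part then imaginary part of each $s_t$) and the standard FIM expression for a complex Gaussian with deterministic mean; the function name `fim_linear_model` is hypothetical.

```python
import numpy as np

def fim_linear_model(H, S, sigma2=1.0):
    """FIM of x_t = H s_t + v_t for unknown H (l x k) and S (k x N).

    Columns of the Jacobian D follow the ordering
    [Re h_1, Im h_1, ..., Re h_k, Im h_k, Re s_1, Im s_1, ..., Re s_N, Im s_N].
    """
    l, k = H.shape
    N = S.shape[1]
    n_par = 2 * l * k + 2 * k * N
    D = np.zeros((l * N, n_par), dtype=complex)
    for t in range(N):
        rows = slice(t * l, (t + 1) * l)
        for m in range(k):
            col = 2 * l * m
            D[rows, col:col + l] = S[m, t] * np.eye(l)               # d mu_t / d Re h_m
            D[rows, col + l:col + 2 * l] = 1j * S[m, t] * np.eye(l)  # d mu_t / d Im h_m
        col = 2 * l * k + 2 * k * t
        D[rows, col:col + k] = H                                     # d mu_t / d Re s_t
        D[rows, col + k:col + 2 * k] = 1j * H                        # d mu_t / d Im s_t
    return (2.0 / sigma2) * np.real(D.conj().T @ D)
```

For generic $H$ and $S$ the resulting matrix has rank $2k(l + N - k)$, that is, a deficiency of $2k^2$. This matches the ambiguity $H \to HT^{-1}$, $S \to TS$ for any invertible complex $k \times k$ matrix $T$, and illustrates why constraints are needed before a bound can be computed.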


As noted, the FIM $J$ is generally not invertible because the model
parameters are not identifiable, and so no unbiased estimator for $\Theta$
exists. However, it is possible to achieve identifiability, and then
regularity of the FIM, by establishing constraints on $\Theta$. We establish
$K$ equality constraints on elements of $\Theta$, where $K < \dim(\Theta)$. The
constraints have the form $f_i(\Theta) = 0$ for $i = 1, \ldots, K$. Define a
$K \times 1$ constraint vector $f(\Theta)$, and a corresponding $K \times M$
gradient matrix

$F(\Theta) = \frac{\partial f(\Theta)}{\partial \Theta^T}$  (14)

with elements $[F(\Theta)]_{i,m} = \partial f_i(\Theta) / \partial [\Theta]_m$,
where $M = \dim(\Theta)$. The gradient matrix $F(\Theta)$ is assumed to have
full row rank $K$ for any $\Theta$ satisfying the constraints
$f_1(\Theta) = 0, \ldots, f_K(\Theta) = 0$. Then, the constrained CRB is obtained
via (Thm. 1 of [3])

$E\left[(\hat{\Theta} - \Theta)(\hat{\Theta} - \Theta)^T\right] \geq U (U^T J U)^{-1} U^T.$  (15)

$J$ is the unconstrained FIM from (5), and $U$ is an orthonormal basis for the
null space of $F(\Theta)$, i.e., $FU = 0$ and $U^T U = I$. Note that $U$ is a
function of the constraints only.
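
The bound in (15) is straightforward to evaluate once $J$ and $F$ are available. The following is a minimal sketch, assuming real-valued $J$ and $F$ and using an SVD to obtain an orthonormal null-space basis; the function names and the tolerance are illustrative choices, not taken from the paper.

```python
import numpy as np

def null_space_basis(F, tol=1e-10):
    """Orthonormal basis U for the null space of F, so that F @ U = 0 and U.T @ U = I."""
    _, s, Vh = np.linalg.svd(F)
    rank = int(np.sum(s > tol))
    return Vh[rank:].conj().T

def constrained_crb(J, F):
    """Constrained CRB  U (U^T J U)^{-1} U^T  for FIM J and constraint gradient F."""
    U = null_space_basis(F)
    return U @ np.linalg.inv(U.T @ J @ U) @ U.T
```

As a small example, take $J = \mathrm{diag}(2, 0)$, which is singular, and the constraint that fixes the second parameter, $F = [0\ 1]$. The constrained bound is then $\mathrm{diag}(0.5, 0)$: the first parameter has variance bound 0.5 and the constrained parameter has bound zero.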

Examples of source constraints of interest include constant modulus (CM)
sources, known source cumulant or kurtosis, and semi-blind sources (some
known source samples). Constraints may also be placed on $H$, such as limiting
the norm of $H$. Together, sufficient constraints may be found to ensure
information regularity. These provide CRBs on symbol estimation in blind
source separation scenarios that exploit source features such as CM. We may
also compare bounds on source estimation for both calibrated and uncalibrated
arrays using the results of [1, 2], where we have established CRBs on
bearing, symbol, and channel estimation for calibrated arrays with side
information.


3.  EXAMPLES  IN  ARRAY  PROCESSING 

We  use  the  constrained  CRB  formulation  to  gain  insight 
into  the  following  questions. 


1. Which provides more accurate signal copy: an uncalibrated array (unknown
$H$ matrix in (1)) with CM signals, or a calibrated array
($H = A(\theta) \cdot \alpha$) with unconstrained signals?

2. Algorithms for blind beamforming with uncalibrated arrays often exploit
independence between the signals and non-Gaussianity as characterized by the
kurtosis [4, 5, 6]. What is the relative value of these constraints when
compared with the CM constraint for CM signals? Do the CRBs based on kurtosis
constraints imply any difference in separability of CM and QAM signals?

We generate observations $x_1, \ldots, x_N$ in (1) using a complex narrowband
array model in which $H = A(\theta) \cdot \alpha$, where
$A(\theta) = [a(\theta_1), \ldots, a(\theta_k)]$ is the array response matrix,
$\theta = [\theta_1, \ldots, \theta_k]^T$ are the source angles of arrival
(AOAs), $a(\theta_i)$ is the array manifold, and
$\alpha = \mathrm{diag}\{\alpha_1, \ldots, \alpha_k\}$ is a diagonal complex
channel gain matrix. We consider a uniform linear array (ULA) with
omnidirectional sensors and half-wavelength spacing, so the array manifold
elements are $[a(\theta)]_m = \exp[j\pi(m-1)\sin\theta]$, $m = 1, \ldots, l$.

Consider a particular ULA with $l = 5$ sensors and $k = 2$ sources with AOAs
$\theta_1 = 0°$ and $\theta_2$ varying from 1° to 30°, where the AOAs are
measured with respect to the array broadside. The noise variance is
$\sigma^2 = 1$, and the number of time samples is $N = 100$. The complex
amplitudes $\alpha_1$ and $\alpha_2$ are generated with phase shifts
$\angle\alpha_1 =$ f and $\angle\alpha_2 = -$f rad. The amplitudes $|\alpha_1|$
and $|\alpha_2|$ are chosen to achieve a desired sample SNR, defined as
$\mathrm{SNR}_i = |\alpha_i|^2 C_{21}(i) / \sigma^2$, where the sample variance of
signal $i$ is $C_{21}(i) = (1/N) \sum_{t=1}^{N} |s_i(t)|^2$. $\mathrm{SNR}_1$ is
fixed at 10 dB, while $\mathrm{SNR}_2$ is evaluated at 5, 10, and 15 dB. One
beamwidth for the array is 23.6° at broadside.
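
The array manifold and the quoted beamwidth can be reproduced with a few lines. The sketch assumes that "one beamwidth" refers to the angle of the first null of the broadside beam pattern, $\arcsin(2/l)$; the text does not define the term, but this choice yields about 23.6° for $l = 5$. The function name `ula_manifold` is hypothetical.

```python
import numpy as np

def ula_manifold(theta_rad, n_sensors):
    """Half-wavelength ULA response: [a(theta)]_m = exp(j*pi*(m-1)*sin(theta)), m = 1..l."""
    m = np.arange(n_sensors)   # m - 1 = 0, ..., l - 1
    return np.exp(1j * np.pi * m * np.sin(theta_rad))

l = 5
# First null of the broadside beam pattern (assumed definition of one beamwidth)
first_null_deg = np.degrees(np.arcsin(2.0 / l))

# Response of the broadside-steered array at that angle; it vanishes at a null
broadside = ula_manifold(0.0, l)
response_at_null = abs(np.vdot(broadside, ula_manifold(np.radians(first_null_deg), l)))
```

With these settings, the source spacings of 1° to 30° in the example range from a small fraction of a beamwidth to slightly more than one beamwidth.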

3.1.  Calibrated  vs.  uncalibrated  arrays 

The constrained CRB for a calibrated array in which $H$ has the structure
$A(\theta) \cdot \alpha$ is presented elsewhere [2]. Here we compare the
calibrated array CRBs with the uncalibrated array CRBs outlined in the
previous section, (5) and (15). The signal vectors $s_1, \ldots, s_N$ are 8-PSK
waveforms with unit modulus $|s_i(t)| = 1$ and phase rotation such that
$s_1 = [1, \ldots, 1]^T$. For the case of unconstrained mixing matrix $H$, it is
known [7] that the CM signal constraint and the specified phase rotation are
sufficient to uniquely identify $H$ and the signal phases $\angle s_i(t)$. For
the case of a calibrated ULA, it is well known that the AOAs $\theta$ and
signals $s_i(t)$ are identifiable with no signal constraints (“blind”
signals).

Figure 1(a) contains the mean CRB on the signal phase parameters
$\angle s_i(2), \ldots, \angle s_i(N)$ for sources $i = 1, 2$ and various
constraints on the structure of $H$ and the signals $s_t$. Note that as the
source spacing decreases to less than one beamwidth, the constraints of CM
signals with an uncalibrated array (unknown $H$) potentially provide more
accuracy in signal phase than a calibrated array with blind signals. Further,
the o and x symbols are coincident on the plots. So for CM signals, a
calibrated array provides negligible improvement in signal phase accuracy
compared with an uncalibrated array that places no constraints on $H$. This
example adds further testament to the well-known power of the CM signal
constraint for signal separation.
3.2. Uncalibrated array and moment constraints

The following constraints on the signal moments are common in blind
beamforming algorithms, e.g., [4]-[6]:

$\frac{1}{N} S S^H$ = known matrix, typically $I$  (16)

$C_{20}(i) = \frac{1}{N} \sum_{t=1}^{N} s_i(t)^2$ is known, $i = 1, \ldots, k$  (17)

$m_{42}(i) = \frac{1}{N} \sum_{t=1}^{N} |s_i(t)|^4$ is known, $i = 1, \ldots, k$.  (18)

These are sample moments and not expectations. Note (16) expresses that the
signals are uncorrelated, and the diagonal elements of (16) constrain the
signal sample variances $C_{21}(i) = 1$. Then (16)-(18) imply that the signal
sample kurtoses $C_{42}(i) = m_{42}(i) - |C_{20}(i)|^2 - 2 C_{21}(i)^2$ are
known. We will refer to (16)-(18) as “moment constraints,” and we further
assume that the first sample of each source signal $s_1$ is known in order to
obtain an invertible constrained FIM. We consider two types of signals: both
source signals are 8-PSK (CM), and both source signals are 64-QAM.
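
The sample moments in these constraints can be computed directly. The sketch below evaluates them for one pass through the 8-PSK constellation and for one pass through a unit-variance 64-QAM constellation; using each constellation point exactly once is an illustrative choice, and the function name `sample_moments` is hypothetical.

```python
import numpy as np

def sample_moments(s):
    """Return (C21, C20, m42, C42) for a complex sample sequence s."""
    s = np.asarray(s)
    c21 = np.mean(np.abs(s) ** 2)                 # sample variance
    c20 = np.mean(s ** 2)                         # non-conjugate second moment
    m42 = np.mean(np.abs(s) ** 4)                 # fourth-order moment
    c42 = m42 - abs(c20) ** 2 - 2.0 * c21 ** 2    # sample kurtosis
    return c21, c20, m42, c42

# 8-PSK: all eight unit-modulus points
psk8 = np.exp(1j * 2.0 * np.pi * np.arange(8) / 8)

# 64-QAM: 8 x 8 grid with levels -7, -5, ..., 7, scaled to unit sample variance
levels = np.arange(-7, 8, 2)
qam64 = (levels[:, None] + 1j * levels[None, :]).ravel()
qam64 = qam64 / np.sqrt(np.mean(np.abs(qam64) ** 2))

psk_moments = sample_moments(psk8)
qam_moments = sample_moments(qam64)
```

For these sequences the kurtosis is -1 for 8-PSK and about -0.619 for 64-QAM, so both signal types are non-Gaussian but to different degrees.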

Figures 1(b)-(d) contain constrained CRBs for this scenario. For the CM
signals, we have also included on the plots the CRBs based on the CM signal
constraints $|s_i(t)| = 1$, $t = 2, \ldots, N$, $i = 1, \ldots, k$. The CM
signal constraints are exploited by some blind beamforming algorithms, e.g.,
ACMA [7].

Figure 1(b) contains mean CRBs for the elements of the $H$ matrix. In the
bottom panel, in which source 2 is strong ($\mathrm{SNR}_2 = 15$ dB), the moment
constraints and the CM constraints yield about the same CRBs for most values
of $\theta_2$. In difficult scenarios where the sources become very closely
spaced (less than 10°), the CM signal constraint becomes more informative
than the moment constraints. Similar behavior is exhibited in the top panel
of Figure 1(b): source 2 is weaker ($\mathrm{SNR}_2 = 5$ dB), so the CM
constraints are more informative than the moment constraints over a larger
range of AOA spacings. Note also that if only moment constraints are used,
QAM signals provide lower CRBs on $H$ than CM signals for this case.

Mean CRBs for estimation of the signals $s_2, \ldots, s_N$ are shown in
Figures 1(c) and (d). Source 2 is weaker in Figure 1(c) than in Figure 1(d),
and we have also included the CRBs for signal estimation when the $H$ matrix
is known perfectly (marked with boxes) but no signal constraints are applied
(the blind, calibrated case). In difficult situations of low SNR and
closely spaced sources, exploiting the CM property provides the potential for
better performance compared with the moment constraints. Note that the CRBs
for signal moment constraints and unconstrained $H$ are approximately equal to
the CRBs for known mixing matrix $H$ and unconstrained signals, which is
similar to our observations about calibrated vs. uncalibrated arrays in
Section 3.1.

4.  SPACE-TIME  CODING 

Space-time coding employs multiple antennas on transmit and receive [8]. In
the flat fading case the model of (1) arises with $k$ transmit and $l$ receive
antennas, where $s_t$ is the $k \times 1$ code vector transmitted by the $k$
antennas at time $t$, and $[H]_{ij}$ is the complex fading channel gain from the
$j$th transmit antenna to the $i$th receive antenna. The independent Rayleigh
fading model corresponds to the $[H]_{ij}$ being independent, complex, Gaussian
random variables with zero mean and unit variance. Suppose that the signal
constellation has average energy equal to one, and let $E_s$ denote the total
energy transmitted from all $k$ antennas per symbol. Then we use
$\sqrt{E_s/k} \cdot H$ in the model (1), yielding an average SNR per receive
antenna equal to $E_s/\sigma^2$ for independent, flat, Rayleigh fading channels.

The model (1) assumes that the fading coefficients $[H]_{ij}$ are constant over
the block of $N$ symbol times. The constrained CRBs developed in this paper
assume that $H \cdot s_t$ in (1) is deterministic, so constrained CRBs may be
computed for a particular realization of the fading matrix $H$. In the example
presented next, we average the CRBs from multiple independent realizations of
$H$ to investigate the diversity gain that results from various constraints.


4.1.  Constraints 

As an example, consider the two-transmit antenna space-time coding scheme in
[9]. The code in [9] for $k = 2$ transmitters can be expressed via the signal
constraints

$s_{t+1} = P s_t^*, \quad t = 1, 3, \ldots, N-1$ ($N$ even)  (19)

where $P = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix},$  (20)

so a total of two complex symbols are encoded in $s_t$ and $s_{t+1}$. Sampling
at the symbol rate is assumed, and this encoding leads to a simple linear
receiver structure for maximum likelihood (ML) symbol detection. The ML
detector requires knowledge of the channel matrix $H$, and training samples
are suggested in [9] for estimation of $H$. We investigate bounds on
estimation of the signals $s_t$ in the space-time coding context with $T < N$
training symbols (semi-blind), the code (19), and other constraints including
CM signals and known $H$ matrix.

Suppose that the first $T$ symbols $s_1, \ldots, s_T$ transmitted from both
antennas are known, and assume that $T < N$ with $T$ and $N$ even. Then the
gradient matrix (14) corresponding to the $T$ training symbols (semi-blind)
and the space-time code (19) for samples $T+1, \ldots, N$ has the form

$F_s = \left[\, 0_{k(T+N) \times 2lk} \quad \mathrm{diag}\left(I_{2kT \times 2kT},\ F_0,\ \ldots,\ F_0\right) \right],$  (21)

where $F_0$ repeats $(N - T)/2$ times and equals

$F_0 = \left[\, I_{4 \times 4} \quad \begin{matrix} P & 0_{2 \times 2} \\ 0_{2 \times 2} & -P \end{matrix} \,\right].$  (22)
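
The code constraint and its real-valued gradient block can be checked numerically. The sketch below assumes that each pair of symbol vectors is stacked as [Re $s_t$, Im $s_t$, Re $s_{t+1}$, Im $s_{t+1}$], consistent with the ordering in (4), and that $F_0$ is the $4 \times 8$ matrix formed by a $4 \times 4$ identity next to $\mathrm{diag}(P, -P)$; the helper names are hypothetical.

```python
import numpy as np

P = np.array([[0.0, -1.0],
              [1.0, 0.0]])

def encode_pair(s_t):
    """Second code vector of a pair: s_{t+1} = P s_t^*."""
    return P @ np.conj(s_t)

def stack_real(s_t, s_next):
    """Real parameter vector [Re s_t, Im s_t, Re s_{t+1}, Im s_{t+1}]."""
    return np.concatenate([s_t.real, s_t.imag, s_next.real, s_next.imag])

# Gradient block F0 = [ I_4 | diag(P, -P) ], written out block by block
Z = np.zeros((2, 2))
F0 = np.block([[np.eye(2), Z, P, Z],
               [Z, np.eye(2), Z, -P]])

s_t = np.array([1.0 + 2.0j, 3.0 - 1.0j])
s_next = encode_pair(s_t)
constraint_residual = F0 @ stack_real(s_t, s_next)
```

For a pair $(a, b)$ the encoder returns $(-b^*, a^*)$, the two transmitted vectors are orthogonal, and the residual `F0 @ stack_real(...)` is zero. The constraint is linear in the real parameters, so a constant $F_0$ describes it exactly.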


The constraints characterized by (21) will be denoted ‘SEMI-BLIND & S-T
CODE’ in the example below. We also consider other combinations of
constraints. ‘SEMI-BLIND’ includes training symbols $s_1, \ldots, s_T$ that
could be used to jointly estimate $H$ and the unknown signals
$s_{T+1}, \ldots, s_N$, but the space-time code is not exploited. We can apply
the constraint that the $N - T$ unknown signals are CM, i.e., $|s_i(t)| = 1$,
$i = 1, \ldots, k$, $t = T+1, \ldots, N$. We can also apply the constraint of
known $H$ matrix, which provides a basis for evaluating the effectiveness of
the $T$ training symbols for estimation of $H$.

Figure 1: Source 1 bearing is fixed at $\theta_1 = 0°$, source 2 bearing is
varied on [1°, 30°]. (a) Uncalibrated vs. calibrated arrays: CRB on signal
phase estimation for 8-PSK signals. (b) Mean CRB on elements of $H$ matrix for
8-PSK (CM) signals and 64-QAM signals for various constraints. (c)-(d) Mean
CRB for signals with (c) $\mathrm{SNR}_1 = 10$ dB and $\mathrm{SNR}_2 = 5$ dB and
(d) $\mathrm{SNR}_2 = 15$ dB.

4.2.  Example 

Consider an example with $k = 2$ transmit antennas, $l = 2$ receive antennas,
independent Rayleigh fading, and $N = 50$ time samples with $T = 2$ training
symbols. The fading is assumed to be constant over the block of $N$ symbol
times. The SNR per receive antenna is varied over the range 0 to 20 dB, and
the constrained CRBs are averaged over 500 independent fading matrices $H$ for
each SNR value. The signals are 8-PSK, and the transmitted signals satisfy
the space-time code constraint (19). For each realization of $H$, we compute
CRBs on the signal phases $\angle s_{T+1}, \ldots, \angle s_N$ subject to
various constraints, and these CRBs are averaged to obtain mean CRBs for the
realization.

Figure 2 contains constrained CRBs on signal phase estimation for various
constraints. The space-time code structure (19) is present in the transmitted
signals, but it is only enforced in the constraints labeled ‘S-T CODE’. When
the space-time code is not applied, the CRB corresponds to independent
estimation of the transmitted sequences $s_1(T+1), \ldots, s_1(N)$ and
$s_2(T+1), \ldots, s_2(N)$, so diversity gain is impossible. We make the
following observations from Figure 2.

• Comparing ‘KNOWN H’ with ‘KNOWN H & S-T CODE’ shows a potential diversity gain of approximately 10 dB in SNR provided by the space-time code when H is known exactly.

• Comparing ‘SEMI-BLIND & S-T CODE’ with ‘KNOWN H & S-T CODE’ shows that using T = 2 training symbols for estimation of H costs approximately 3 dB in SNR compared with exact knowledge of H.

• The ‘SEMI-BLIND & CM & S-T CODE’ curve shows that exploiting CM in addition to the training and space-time code potentially yields about 1.5 dB gain in SNR.

• For the cases in which the space-time code constraint is not exploited, the ‘SEMI-BLIND & CM’ constraint provides approximately 2 dB gain compared with ‘KNOWN H’, which does not exploit the CM property.

Note that the constrained CRBs on $\angle s_t$ pertain to estimation of the signals, while the primary quantity of interest in digital communication is probability of detection error. Smaller CRBs suggest the potential for reduced probability of detection error in practical receivers.
ABSTRACT

We address the problem of estimating Directions Of Arrival (DOA’s) of multiple sources observed on the background of nonuniform white noise with an arbitrary unknown diagonal covariance matrix. A new deterministic Maximum Likelihood (ML) DOA estimator is derived. Its implementation is based on an iterative procedure which includes stepwise concentration of the Log-Likelihood (LL) function with respect to the signal and noise nuisance parameters and requires only a few iterations to converge.

New closed-form expressions for the deterministic and stochastic direction estimation Cramer-Rao bounds (CRB’s) are derived for the considered nonuniform model. Our expressions can be viewed as an extension of the well-known results by Stoica and Nehorai, and Weiss and Friedlander to a more general noise model than the commonly used uniform one. Simulation and experimental (seismic data processing) results illustrate the performance of the estimator and validate our theoretical analysis.

1.  INTRODUCTION 

ML DOA estimation techniques are known to have excellent asymptotic and threshold performances [1], [2]. The key assumption used for the derivation of both the deterministic and stochastic ML estimators is the so-called uniform white noise assumption [1]. According to it, sensor noises are presumed to form a zero-mean Gaussian process with the covariance matrix $\sigma^2 I$, where $\sigma^2$ is the unknown noise variance, and I is the identity matrix. This simple assumption makes it possible to concentrate the resulting LL function with respect to both signal waveform and noise nuisance parameters and, therefore, to reduce the dimension of the parameter space and the associated computational burden [1].

However, the uniform noise assumption may be unrealistic in certain applications [3]-[6], where the noise environment remains unknown or changes slowly with time. In the general case, the sensor noise should be considered as an unknown colored (i.e., spatially dependent) process. Recently, several advanced ML techniques have been proposed which exploit the ideas of colored noise modeling [6]-[8].

In some practical applications (for example, when the so-called sparse arrays are used), the general colored noise assumption can be simplified by assuming the sensor noise to be spatially white [4], [5]. In this case, the noise spatial covariance structure still can be represented by a diagonal matrix but the sensor noise variances are no longer identical to one another. Such a noise model becomes relevant in situations with hardware nonidealities in receiving channels [9] as well as for sparse arrays with prevailing external noise (for example, reverberation noise in sonar or external seismic noise) [4], [5].

It is important to stress that if sensor noise is a spatially nonuniform white process, neither the conventional “uniform” ML methods [1]-[2], nor the colored noise modeling ML techniques [6]-[8] may be expected to give satisfactory results, because the former methods will mismodel the noise, whereas the latter techniques will ignore important prior knowledge that the noise process is spatially white. This appears to be a strong motivation to develop direction finding techniques for the nonuniform white noise case. Moreover, the majority of the ML colored noise modeling based approaches developed so far are unable to concentrate the LL function with respect to the noise parameters [7]. As a result, such techniques may be computationally demanding. The use of the nonuniform white noise model can be expected to overcome this drawback by means of obtaining “concentrated” solutions to the ML estimation problem.

The motivation given shows that the nonuniform white noise case can be viewed as a practically important generalization of the simpler uniform model. In the present paper, we derive a new iterative deterministic ML estimator, which concentrates the LL function with respect to both signal and noise nuisance parameters. Unlike the analytic concentration used in the conventional “uniform” ML estimators, the concentration of the LL function in the nonuniform noise case will be performed in a numerical (iterative) manner, with only a few iterations necessary for convergence.

Furthermore, we derive closed-form expressions for the deterministic and stochastic direction estimation CRB’s for the considered nonuniform white noise case. These expressions can be viewed as a natural extension of the well-known results reported in [1]-[2] and [10] for the uniform noise model. The estimation performance of the proposed ML technique is compared to the derived CRB’s and the performance of the deterministic uniform ML estimator [1] via computer simulations. Moreover, we test both the uniform and nonuniform ML techniques using experimental seismic data recorded by the GERESS array (Germany). Our simulations and the results of real data processing demonstrate essential performance improvements achieved by means of the proposed nonuniform ML estimator relative to the conventional uniform ML algorithm. Additionally, the experimental results provide a solid verification of the practical relevance of the considered nonuniform noise model.

2.  SIGNAL  MODEL 

Let an array of n sensors receive q (q < n) narrowband signals impinging from the sources with unknown DOA’s $\theta_1, \ldots, \theta_q$. The ith snapshot vector of sensor array outputs can be modeled as [1]-[3]

$$x(i) = A(\theta)s(i) + n(i), \quad i = 1, \ldots, N \qquad (1)$$

where $A(\theta) = [a(\theta_1), \ldots, a(\theta_q)]$ is the n × q matrix composed of the signal direction vectors $a(\theta_i)$ (i = 1, ..., q), $\theta = [\theta_1, \ldots, \theta_q]^T$ is the q × 1 vector of the unknown signal DOA’s, s(i) is the q × 1 vector of the source waveforms, n(i) is the n × 1 vector of white sensor noise, N is the number of snapshots, and $(\cdot)^T$ stands for the transpose. In a more compact notation, (1) can be rewritten as

$$X = A(\theta)S + \mathbf{N} \qquad (2)$$

where $X = [x(1), \ldots, x(N)]$ is the n × N array data matrix, $S = [s(1), \ldots, s(N)]$ is the q × N source waveform matrix, and $\mathbf{N} = [n(1), \ldots, n(N)]$ is the n × N sensor noise matrix. The sensor noise is assumed to be a zero-mean spatially and temporally white Gaussian process with the unknown diagonal covariance matrix

$$Q = E\{n(t)n^H(t)\} = \mathrm{diag}\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\} \qquad (3)$$

In what follows, the signal waveforms will be assumed to be either deterministic unknown processes [1], or random zero-mean Gaussian processes [2]. In particular, the signal snapshots are assumed to satisfy the following models

$$x(i) \sim \mathcal{N}(As(i), Q) \qquad (4)$$

$$x(i) \sim \mathcal{N}(0, R) \qquad (5)$$

in the deterministic and stochastic case, respectively. Here,

$$R = E\{x(i)x^H(i)\} = APA^H + Q \qquad (6)$$

is the array covariance matrix, $P = E\{s(i)s^H(i)\}$ is the source waveform covariance matrix, $\mathcal{N}$ denotes the complex Gaussian distribution, and $(\cdot)^H$ stands for the Hermitian transpose.

3. MAXIMUM LIKELIHOOD ESTIMATION

Under the assumption that the signal waveforms are deterministic unknown sequences, the LL function for the model considered is given by [11]

$$L(\psi) = -N\sum_{k=1}^{n}\log\sigma_k^2 - \sum_{i=1}^{N}\|\tilde{x}(i) - \tilde{A}(\theta)s(i)\|^2 \qquad (7)$$

where $\psi = [\theta^T, \sigma^T, s^T(1), \ldots, s^T(N)]^T$ is the vector of unknown signal and noise parameters, $\sigma = [\sigma_1^2, \ldots, \sigma_n^2]^T$, $\tilde{x}(i) = Q^{-1/2}x(i)$, and $\tilde{A}(\theta) = Q^{-1/2}A(\theta)$.

Introduce the n × N matrix

$$G = X - A(\theta)S = [c_1, \ldots, c_N] = [r_1, \ldots, r_n]^T \qquad (8)$$

where $c_i$ and $r_i$ are the n × 1 and N × 1 vectors corresponding to the ith column and the ith row of the matrix G, respectively. With these notations, from (7) it follows that

$$\frac{\partial L(\psi)}{\partial\sigma_k^2} = e_k^T\left[Q^{-1}C - NI\right]Q^{-1}e_k \qquad (9)$$

where $e_k$ is the vector containing one in the kth position and zeros elsewhere.

From (3) and (9), we obtain that if the other parameters are fixed, the ML estimate of the diagonal noise covariance matrix is given by

$$\hat{Q} = \frac{1}{N}\mathrm{diag}\{r_1^H r_1, r_2^H r_2, \ldots, r_n^H r_n\} \qquad (10)$$

Here, we exploit the following obvious property $[C]_{k,k} = r_k^H r_k$ of the matrix

$$C = \sum_{i=1}^{N} c_i c_i^H \qquad (11)$$

Inserting (10) into (7), we have

$$L(\theta, S) = -N\sum_{k=1}^{n}\log\left\{\frac{1}{N}r_k^H r_k\right\} - \sum_{i=1}^{N} c_i^H\hat{Q}^{-1}c_i \qquad (12)$$

Using (10)-(11) and the properties of the trace operator, we obtain that

$$\sum_{i=1}^{N} c_i^H\hat{Q}^{-1}c_i = \sum_{i=1}^{N}\mathrm{trace}\{\hat{Q}^{-1}c_i c_i^H\} = \mathrm{trace}\{\hat{Q}^{-1}C\} = nN \qquad (13)$$

Hence, after omitting the constant term (13), the LL function (12) can be further simplified to

$$L(\theta, S) = -N\sum_{k=1}^{n}\log\left\{\frac{1}{N}r_k^H r_k\right\} \qquad (14)$$

At the same time, from (7) we obtain in a standard way that if the remaining parameters are fixed, the ML estimate of the matrix S is given by

$$\hat{S} = \left(\tilde{A}^H(\theta)\tilde{A}(\theta)\right)^{-1}\tilde{A}^H(\theta)\tilde{X} \qquad (15)$$

where $\tilde{X} = Q^{-1/2}X$ is the n × N transformed data matrix. Note that the estimate (15) depends on Q, and, in turn, the estimate of Q in (10) depends on S. Therefore, it appears to be impossible to obtain any closed form expression of the LL function concentrated with respect to the full set of the signal and noise nuisance parameters. To avoid this difficulty, we introduce the idea of stepwise concentration, which was also exploited in [3] in an implicit form. The essence of this idea is to concentrate the LL function in an iterative manner.
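The mutual dependence of the two conditional estimates can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions: it assumes a uniform linear array with half-wavelength spacing (the geometry used later in the simulations), unit-power complex Gaussian waveforms, and hypothetical helper names (`ula_steering`, `simulate`, `noise_estimate`, `waveform_estimate`) that do not come from the paper. It generates data according to (2)-(3) and evaluates the noise estimate (10) for given S and the waveform estimate (15) for given Q.

```python
import numpy as np

def ula_steering(theta_deg, n):
    """Steering matrix A(theta) of an n-sensor ULA, half-wavelength spacing (assumed)."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    return np.exp(1j * np.pi * np.arange(n)[:, None] * np.sin(theta)[None, :])

def simulate(theta_deg, q_diag, N, rng):
    """Draw X = A S + noise as in (2), with noise covariance diag(q_diag) as in (3)."""
    q_diag = np.asarray(q_diag, dtype=float)
    n = q_diag.size
    A = ula_steering(theta_deg, n)
    q = A.shape[1]
    # Unit-power circular complex Gaussian waveforms and unit-variance noise.
    S = (rng.standard_normal((q, N)) + 1j * rng.standard_normal((q, N))) / np.sqrt(2)
    W = (rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))) / np.sqrt(2)
    X = A @ S + np.sqrt(q_diag)[:, None] * W
    return X, A, S

def noise_estimate(X, A, S):
    """Eq. (10): diagonal of (1/N) G G^H with G = X - A S from (8)."""
    G = X - A @ S
    return np.sum(np.abs(G) ** 2, axis=1) / X.shape[1]

def waveform_estimate(X, A, q_diag):
    """Eq. (15): least squares on the data and manifold scaled by Q^{-1/2}."""
    w = 1.0 / np.sqrt(np.asarray(q_diag, dtype=float))[:, None]
    return np.linalg.lstsq(w * A, w * X, rcond=None)[0]
```

With the true waveforms supplied, `noise_estimate` should approach the diagonal of Q as N grows, while `waveform_estimate` needs Q as an input; neither can be evaluated without the other, which is the circularity that the stepwise concentration is meant to resolve.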

Omitting the constant factor −N in (14) and inserting (15) into this equation, we obtain the following alternative expressions for the negative LL function

$$\mathcal{L}(\theta) = \sum_{k=1}^{n}\log\left\{\frac{1}{N}r_k^H r_k\right\} = \mathrm{trace}\,\log\left\{\frac{1}{N}GG^H\right\} = \mathrm{trace}\,\log\left\{\frac{1}{N}P_{\tilde{A}}^{\perp}(\theta)\tilde{X}\tilde{X}^H P_{\tilde{A}}^{\perp}(\theta)\right\} = \mathrm{trace}\,\log\left\{P_{\tilde{A}}^{\perp}(\theta)\tilde{R}\right\} \qquad (16)$$

where $P_{\tilde{A}}(\theta) = \tilde{A}(\theta)\left(\tilde{A}^H(\theta)\tilde{A}(\theta)\right)^{-1}\tilde{A}^H(\theta)$ and $P_{\tilde{A}}^{\perp}(\theta) = I - P_{\tilde{A}}(\theta)$ are the projection matrices. Here,

$$\tilde{R} = \frac{1}{N}\tilde{X}\tilde{X}^H \qquad (17)$$

is the n × n sample covariance matrix of the transformed data.

It is important to stress that in the particular uniform noise case ($Q = \sigma^2 I$), the function (16) can be simplified to

$$\mathcal{L}(\theta) = \mathrm{trace}\,\log\left\{P_A^{\perp}(\theta)\hat{R}\right\} \qquad (18)$$

where

$$\hat{R} = \frac{1}{N}XX^H \qquad (19)$$

is the sample covariance matrix of the original data (1). Interestingly, this function is not equivalent to the conventional negative LL function [1]

$$\mathcal{L}(\theta) = \mathrm{trace}\left\{P_A^{\perp}(\theta)\hat{R}\right\} \qquad (20)$$

derived under the uniform noise assumption. The explanation of this fact is that the ML estimators (16) and (20) use very different types of a priori information on the structure of the noise covariance matrix.

Another important observation is that unlike (20), the function (16) does not enable simultaneous concentration with respect to both signal and noise nuisance parameters. This fact can be explained by inspecting the structure of (16). According to this equation, the estimate of the signal DOA vector θ depends on the estimate (10) of the matrix Q, which, in turn, depends on the estimate of θ. To overcome this problem, instead of the analytic concentration approach used for the derivation of the uniform ML estimator, we propose the so-called stepwise numerical concentration, which is given by the following iterative procedure:

• Step 1. Set $\hat{Q} = I$.

• Step 2. Find the estimate of θ as $\hat{\theta} = \arg\min_{\theta}\{\mathcal{L}(\theta)\}$ where the negative LL function $\mathcal{L}(\theta)$ is defined by (16).

• Step 3. Using the so-obtained $\hat{\theta}$, compute $\hat{S}$ from (15). Find the refined estimate of Q from (10) using (8) and the previously obtained (fixed) $\hat{S}$ and $\hat{\theta}$. Repeat steps 2 and 3 a few times to obtain the final estimate of θ.

In step 1, the algorithm is initialized using the uniform noise assumption. Under this assumption, the estimate of Q should be written as $\hat{Q} = \hat{\sigma}^2 I$, where $\hat{\sigma}^2$ is some estimate of the noise variance $\sigma^2$. However, from the structure of the negative LL function (16) it follows that the minimizer of this function does not depend on the value of $\hat{\sigma}^2$. Therefore, without loss of generality in step 1 we can set $\hat{\sigma}^2 = 1$.

4. CRAMER-RAO BOUNDS

The following two theorems present closed-form expressions for the deterministic and stochastic CRB’s under the nonuniform noise assumption.

Theorem 1: The q × q deterministic CRB matrix for the signal DOA’s is given by:

$$\mathrm{CRB}_{\mathrm{DET}\,\theta\theta} = \frac{1}{2N}\left\{\mathrm{Re}\left[\left(\tilde{D}^H P_{\tilde{A}}^{\perp}\tilde{D}\right)\odot\hat{P}^T\right]\right\}^{-1} \qquad (21)$$

where $\tilde{A} = Q^{-1/2}A$, $\tilde{D} = Q^{-1/2}D$, $\hat{P} = \frac{1}{N}\sum_{i=1}^{N}s(i)s^H(i)$, ⊙ stands for the Schur-Hadamard matrix product, and

$$D = [d(\theta_1), \ldots, d(\theta_q)], \quad d(\theta) = \partial a(\theta)/\partial\theta \qquad (22)$$

Proof: See [11].

Theorem 2: The q × q stochastic CRB matrix for the signal DOA’s is given by:


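$$\mathrm{CRB}_{\mathrm{STO}\,\theta\theta} = \frac{1}{N}\left\{2\,\mathrm{Re}\left[\left(P\tilde{A}^H\tilde{R}^{-1}\tilde{A}P\right)\odot\left(\tilde{D}^H P_{\tilde{A}}^{\perp}\tilde{D}\right)^T\right] - MTM^T\right\}^{-1} \qquad (23)$$

where $\tilde{R} = Q^{-1/2}RQ^{-1/2}$ and the real matrices

$$M = 2\,\mathrm{Re}\left\{\left(\tilde{R}^{-1}\tilde{A}P\right)^T\odot\left(\tilde{D}^H P_{\tilde{A}}^{\perp}\right)\right\}, \qquad (24)$$

$$T = \left\{\left(\tilde{R}^{-1}\right)^T\odot\tilde{R}^{-1} - \left(P_{\tilde{A}}\tilde{R}^{-1}\right)^*\odot\left(P_{\tilde{A}}\tilde{R}^{-1}\right)\right\}^{-1} \qquad (25)$$

Proof: See [11].

It is interesting to compare the derived expressions with the deterministic and stochastic CRB’s in the uniform noise case. The latter two bounds are given by [1], [2], [10]

$$\mathrm{CRB}_{\mathrm{DET}\,\theta\theta} = \frac{\sigma^2}{2N}\left\{\mathrm{Re}\left[\left(D^H P_A^{\perp}D\right)\odot\hat{P}^T\right]\right\}^{-1} \qquad (26)$$

$$\mathrm{CRB}_{\mathrm{STO}\,\theta\theta} = \frac{\sigma^2}{2N}\left\{\mathrm{Re}\left[\left(PA^H R^{-1}AP\right)\odot\left(D^H P_A^{\perp}D\right)^T\right]\right\}^{-1} \qquad (27)$$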

respectively. 

The comparison of (21) and (26) shows that the nonuniform deterministic bound (21) corresponds to the uniform CRB (26), with the only difference that the nonuniform CRB uses the transformed array manifold $\tilde{A}$ instead of the original manifold A. This transformation can be viewed as a sort of preequalization of sensor noise¹. To explain the effect of noise preequalization, let us consider the case when some of the array sensors suffer from intensive noise, whereas the other sensors remain relatively “noiseless”. According to the above-mentioned manifold transformation, the contribution of the noisy sensors to the CRB (21) will be negligible because of the relatively low weights assigned to these sensors. This corresponds to our natural expectation that the optimal (ML) algorithm derived for the nonuniform model should be insensitive to the presence of such noisy sensors. Such a robustness property is achieved by means of blocking the outputs of the corresponding (noisy) array channels and exploiting only noiseless sensors. From this point of view, the manifold transformation matrix $Q^{-1/2}$ can be identified as a sort of blocking matrix.

As can be seen from the comparison of (23) and (27), in the stochastic case the relationship between the uniform and nonuniform bounds becomes more complicated than in the deterministic case. In particular, this relationship cannot be described solely in terms of the manifold transformation $Q^{-1/2}$. We observe that the bound (23) contains an additional term $-MTM^T$ which does not appear in (27). In the general case, we obtain that

$$\left.\text{Nonuniform } \mathrm{CRB}_{\mathrm{DET}\,\theta\theta}\right|_{Q=\sigma^2 I} = \text{Uniform } \mathrm{CRB}_{\mathrm{DET}\,\theta\theta}$$

$$\left.\text{Nonuniform } \mathrm{CRB}_{\mathrm{STO}\,\theta\theta}\right|_{Q=\sigma^2 I} \geq \text{Uniform } \mathrm{CRB}_{\mathrm{STO}\,\theta\theta}$$

The proof of the last equation is given in [11].

Assume that there is only one signal source (q = 1). In this case, we have that $\tilde{A} = \tilde{a}$ and $\tilde{D} = \tilde{d}$, where $\tilde{a} = Q^{-1/2}a$. Therefore, the array covariance matrix (6) can be rewritten as $R = p\,aa^H + Q$, where $p = E\{|s(i)|^2\}$ is the signal variance. It is easy to show that in this case the bounds (21) and (23) can be simplified to [5]

$$\mathrm{CRB}_{\mathrm{DET}\,\theta\theta} = \frac{a^H Q^{-1}a}{2N\hat{p}\left[a^H Q^{-1}a\,a^H B^2 Q^{-1}a - \left(a^H BQ^{-1}a\right)^2\right]}$$

$$\mathrm{CRB}_{\mathrm{STO}\,\theta\theta} = \frac{1 + p\,a^H Q^{-1}a}{2Np^2\left[a^H Q^{-1}a\,a^H B^2 Q^{-1}a - \left(a^H BQ^{-1}a\right)^2\right]}$$

where $B = \mathrm{diag}\{(\omega/c)d_1\cos\theta_1, (\omega/c)d_2\cos\theta_1, \ldots, (\omega/c)d_n\cos\theta_1\}$, $\hat{p} = \frac{1}{N}\sum_{i=1}^{N}|s(i)|^2$, $d_k$ is the coordinate of the kth sensor, ω is the central frequency, and c is the propagation speed.

Assuming that the array has omnidirectional sensors, the number of snapshots is high ($\hat{p} \simeq p$), and defining the SNR as [5] $\mathrm{SNR} = (p/n)\,a^H Q^{-1}a = (p/n)\sum_{k=1}^{n}1/\sigma_k^2$, we obtain the following explicit relationship between the stochastic and deterministic single-source bounds:

$$\mathrm{CRB}_{\mathrm{STO}\,\theta\theta} = \left(1 + \frac{1}{n\,\mathrm{SNR}}\right)\mathrm{CRB}_{\mathrm{DET}\,\theta\theta} \qquad (28)$$

Hence, in the large sample case the difference between the two bounds becomes small when the source is powerful enough, so that $n\,\mathrm{SNR} \gg 1$.

¹ Usually, the term prewhitening is used, but using it here would be somewhat confusing because the sensor noise has been originally assumed to be spatially white.

Figure 1: Comparison of the DOA estimation RMSE’s and CRB’s versus the number of snapshots. First example (uncorrelated sources).

5. SIMULATIONS
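As a numerical companion to the single-source bounds of Section 4, the following sketch evaluates both closed-form expressions and checks relation (28). It is a minimal illustration, not the authors' code: it assumes omnidirectional unit-gain sensors on a half-wavelength ULA, so that $(\omega/c)d_k = \pi k$ and $a^H Q^{-1}a = \sum_k 1/\sigma_k^2$, and it takes $\hat{p} = p$. The function name `single_source_crbs` is hypothetical.

```python
import numpy as np

def single_source_crbs(theta_deg, q_diag, p, N):
    """Single-source deterministic and stochastic CRBs (in rad^2), nonuniform noise.

    Assumes a half-wavelength ULA with unit-gain sensors and p_hat = p.
    """
    q_diag = np.asarray(q_diag, dtype=float)
    n = q_diag.size
    theta = np.deg2rad(theta_deg)
    b = np.pi * np.arange(n) * np.cos(theta)   # diagonal of B: (omega/c) d_k cos(theta)
    w = 1.0 / q_diag                           # |a_k|^2 / sigma_k^2 with |a_k| = 1
    aQa = w.sum()                              # a^H Q^{-1} a
    aBQa = (b * w).sum()                       # a^H B Q^{-1} a
    aB2Qa = (b ** 2 * w).sum()                 # a^H B^2 Q^{-1} a
    bracket = aQa * aB2Qa - aBQa ** 2
    crb_det = aQa / (2.0 * N * p * bracket)
    crb_sto = (1.0 + p * aQa) / (2.0 * N * p ** 2 * bracket)
    return crb_det, crb_sto
```

Dividing the two returned values gives $1 + 1/(p\,a^H Q^{-1}a)$, which equals $1 + 1/(n\,\mathrm{SNR})$ under the SNR definition above, so the gap between the bounds shrinks as $n\,\mathrm{SNR}$ grows.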


We assumed a ULA of ten sensors spaced half a wavelength apart, and two equally powered sources with the DOA’s $\theta_1 = 7°$ and $\theta_2 = 13°$. The nonuniform noise was assumed to have the following covariance matrix: Q = diag{10.0, 2.0, 1.5, 0.5, 8.0, 0.7, 1.1, 3.0, 6.0, 3.0}. In all our examples, the experimental DOA estimation Root-Mean-Square Errors (RMSE’s) of the conventional uniform and the proposed nonuniform ML methods have been compared to the nonuniform CRB’s (21) and (23).

In the first example, we assume two uncorrelated sources with the SNR = 10 dB. Fig. 1 displays the results versus the number of snapshots. In the second example, two correlated sources are assumed, with the correlation coefficient equal to 0.9. The SNR = 15 dB is taken and the results are plotted in Fig. 2 versus the number of snapshots.

Figure 2: Comparison of the DOA estimation RMSE’s and CRB’s. Second example (correlated sources).

From Figs. 1-2, we observe that uniform ML performs poorly in the nonuniform noise case. As expected, the proposed nonuniform technique provides essential performance improvements. In particular, it attains the stochastic CRB (23) even at small sample sizes. Since two iterations are enough to guarantee the convergence, the computational cost of our technique is comparable to that of conventional ML.
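The stepwise concentration procedure of Section 3 (Steps 1-3) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the implementation used for the figures: the minimization in Step 2 is replaced by an exhaustive search over a user-supplied angle grid, the array is assumed to be a half-wavelength ULA, the cost is the first form of (16), $\sum_k \log\{r_k^H r_k/N\}$, evaluated from (15) and (8), and the names `steering`, `concentrate`, and `stepwise_ml` are hypothetical.

```python
import itertools
import numpy as np

def steering(theta_deg, n):
    """Steering matrix of an n-sensor ULA, half-wavelength spacing (assumed)."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    return np.exp(1j * np.pi * np.arange(n)[:, None] * np.sin(theta)[None, :])

def concentrate(theta_deg, X, q_diag):
    """For fixed Q: S from (15), G from (8); returns r_k^H r_k / N per sensor."""
    A = steering(theta_deg, X.shape[0])
    w = 1.0 / np.sqrt(q_diag)[:, None]                 # Q^{-1/2} as row scaling
    S = np.linalg.lstsq(w * A, w * X, rcond=None)[0]   # weighted least squares
    G = X - A @ S
    return np.sum(np.abs(G) ** 2, axis=1) / X.shape[1]

def stepwise_ml(X, grid_deg, q=1, n_iter=3):
    """Grid-search version of Steps 1-3; returns (theta_hat, diagonal of Q_hat)."""
    q_diag = np.ones(X.shape[0])                       # Step 1: Q = I
    candidates = list(itertools.combinations(grid_deg, q))
    for _ in range(n_iter):
        # Step 2: minimize the negative LL, sum_k log(r_k^H r_k / N), over the grid.
        costs = [np.sum(np.log(concentrate(c, X, q_diag))) for c in candidates]
        theta_hat = np.array(candidates[int(np.argmin(costs))])
        # Step 3: recompute S with the current Q, then refine Q via (10).
        q_diag = concentrate(theta_hat, X, q_diag)
    return theta_hat, q_diag
```

For two sources, calling `stepwise_ml(X, grid, q=2)` searches over all pairs of grid angles, which is affordable only for coarse grids; a local optimizer started from the grid minimum would be a natural refinement, though that is a design choice not taken from the paper.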

6.  EXPERIMENTAL  RESULTS 

To validate the practical relevance of the nonuniform noise model, real seismic data were used. These data were collected by the GERESS array (Germany). The data record of the regional seismic event at an azimuth of θ = 121.8° was analyzed (see [12] for details). Note that the azimuth value of this event was known in advance with a high precision. Estimating this parameter using the methods tested, we were able to compare their experimental performances.

The conventional and proposed ML methods have been applied to azimuth-velocity (2D) estimation at the following four frequencies: $f_1 = 0.9375$ Hz, $f_2 = 1.25$ Hz, $f_3 = 1.5625$ Hz, and $f_4 = 1.875$ Hz.

The experimental azimuth estimates have been used to compute the experimental RMSE’s shown in Fig. 3. From this figure, it is clearly seen that nonuniform ML has noticeably better experimental performance than the uniform ML technique. These results provide a solid verification of the relevance of the developed nonuniform noise model in practical applications.
ABSTRACT

Optimal detection and estimation algorithms require large computing expenditures in radar, sonar, and similar systems. The paper presents a new Uniformly Most Powerful test for matched detection of a symmetrical signal subspace. The group of general (logical) shift operators is used to describe the symmetry. This algorithm may be used to reduce the complexity of the matched detector for an unknown signal subspace and to enable signal processing in real time. The reduction brings appreciable hardware gains at the cost of a small performance penalty in some radar systems. The signal subspace model for moving-target indication in radar is considered. We use the new approach to create a sub-optimal detector with minimal computing expenditures.

1. INTRODUCTION

In signal detection problems, we assume that each measurement is a sum of a signal component and a noise component: $x_n = \mu s_n + \sigma w_n$; n = 0, 1, ..., N−1.

The measurements are organized into an N-dimensional measurement vector

$$x = \mu s + \sigma w, \qquad (1)$$

where the vector μs contains samples of the signal to be detected and the vector σw contains samples of the added noise. We assume that the noise vector w is drawn from a multivariate normal distribution w: N[0, I]. This means that the measurement x is drawn from a multivariate normal distribution x: N[μs, σ²I]. In some systems the signal s in the measurement model x: N[μs, σ²I] is a linear combination of modes or basis vectors, in which case it may be represented as $s = \sum_{n=0}^{N-1}\theta_n h_n = H\theta$. Here H is a known N × N matrix with columns $h_n$ and θ is an unknown N × 1 vector with elements $\theta_n$:

$$s = [h_0\ h_1\ \ldots\ h_{N-1}]\begin{bmatrix}\theta_0\\ \vdots\\ \theta_{N-1}\end{bmatrix}. \qquad (2)$$

Let the mode matrix H be known but the mode weights be unknown. In this case, the signal is known to lie in the linear subspace ⟨H⟩ spanned by the columns of H, but its exact location is unknown because θ is unknown. We would like to test $H_0$: μ = 0 versus $H_1$: μ > 0 when x is distributed as N[μHθ, σ²I] and θ is unknown. It is known [1] that the statistic

$$\chi^2 = x^T P_H x \qquad (3)$$

is a maximal invariant to the group of transformations that adds a bias from the orthogonal subspace ⟨A⟩ and rotates in the subspace ⟨H⟩. Here $P_H$ is the projection of x onto the subspace ⟨H⟩:

$$P_H = H(H^T H)^{-1}H^T. \qquad (4)$$

The statistic χ² is a quadratic form in the normal random vector $P_H x$: N[μHθ, σ²$P_H$]. It is known that χ²/σ² is chi-squared distributed with noncentrality parameter $(\mu^2/\sigma^2)E_s$, $E_s = \theta^T H^T H\theta$: χ²/σ²: $\chi_p^2(\mu^2 E_s/\sigma^2)$.

The chi-squared distribution has a monotone likelihood ratio. Therefore, by the Karlin-Rubin theorem, the test

$$\phi(\chi^2/\sigma^2) = \begin{cases}1, & \chi^2/\sigma^2 > \chi_0^2\\ 0, & \chi^2/\sigma^2 \leq \chi_0^2\end{cases} \qquad (5)$$

is the Uniformly Most Powerful (UMP) invariant detector for testing $H_0$: μ = 0 versus $H_1$: μ > 0 in the measurement x: N[μHθ, σ²I]. In what follows, we consider a subspace ⟨H⟩ that is symmetrical with respect to the group of generalized (logical) shift transformations. We then establish that statistic (3) is also a maximal invariant to the group of generalized shift transformations for a symmetrical signal subspace ⟨H⟩.


2. DESCRIPTION OF THE SYMMETRICAL SUBSPACE

The operation t ⊕ τ is called a generalized (logical) shift of the argument t by the value τ, where τ, t ∈ [0, N−1],

$$t = \sum_{p=1}^{n}t_p m^{p-1}, \quad \tau = \sum_{p=1}^{n}\tau_p m^{p-1}, \quad t\oplus\tau = \sum_{p=1}^{n}c_p m^{p-1},$$

$c_p = ((t_p + \tau_p))_m$ is the residue (mod m), $c_p, t_p, \tau_p \in [0, m-1]$, and $N = m^n$. Let g(h) denote the operator of a generalized shift [2]. We represent a discrete mode of a signal as a column vector $h = (h_0\ h_1 \ldots h_{N-1})^T$. The generalized shift operation can be represented as a permutation of the coordinates of this vector. It is possible to represent the operators of generalized shift by block cyclic permutation matrices. The matrix $g_i \in G$ is a permutation matrix; therefore each of its rows and each of its columns contains exactly one 1, and all of the remaining entries are zero. Let ⟨H⟩ be a symmetrical subspace. Then $h_i = g_l h_k$, i = l ⊕ k; i, l, k ∈ [0, N−1]. Therefore the symmetrical matrix H may be written as

$$H = [h\ \ g_1 h\ \ \ldots\ \ g_{N-1}h]. \qquad (6)$$

The subspace ⟨H⟩ is called symmetrical if a mode $h_i \in \langle H\rangle$ transformed by the group G also belongs to the subspace ⟨H⟩, but with another value of the parameter i: $g h_i = h_r \in \langle H\rangle$, i, r ∈ [0, N−1].

Note that g is an orthogonal matrix: $gg^T = I$. We have the following representation for the operator g: $g_i = V^H W_i V$, where $V = N^{-1/2}[\mathrm{Had}(t,\tau)]$, $\mathrm{Had}(t,\tau) = \exp[(j2\pi/m)\sum_{i=1}^{n}t_i\tau_i]$, $j = \sqrt{-1}$, $W_i = \mathrm{diag}[\mathrm{Had}(i,\tau)]$, $V^H V = W^H W = I$. We simplify our notation by writing $(V^T)^*$ as $V^H$, where T denotes transposition and * denotes complex conjugation. The eigenvectors of the generalized shift operators are the complete orthonormal systems of Hadamard-Chrestenson functions:

$$\mathrm{Had}(p,t) = \exp\left[(j2\pi/m)\sum_{i=1}^{n}p_i t_i\right], \quad p = \sum_{i=1}^{n}p_i m^{i-1}, \quad t = \sum_{i=1}^{n}t_i m^{i-1}. \qquad (7)$$

For m = 2 they are called Walsh functions; for m = N they are called discrete exponential functions.
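For the dyadic case m = 2 the generalized shift t ⊕ τ reduces to a bitwise XOR of the indices, and the representation $g_\tau = V^H W_\tau V$ can be checked numerically. The sketch below is an illustration under that assumption only; the helper names `walsh_matrix` and `dyadic_shift` are hypothetical and the general m-ary (Chrestenson) case is not covered.

```python
import numpy as np

def walsh_matrix(n_bits):
    """Had(p, t) = (-1)^(sum_i p_i t_i) for m = 2, as an N x N array, N = 2**n_bits."""
    N = 1 << n_bits
    idx = np.arange(N)
    bits = (idx[:, None] >> np.arange(n_bits)) & 1   # binary digits of each index
    return (-1.0) ** (bits @ bits.T)

def dyadic_shift(tau, n_bits):
    """Permutation matrix g_tau acting as (g_tau h)[t] = h[t XOR tau]."""
    N = 1 << n_bits
    t = np.arange(N)
    g = np.zeros((N, N))
    g[t, t ^ tau] = 1.0
    return g
```

With `V = walsh_matrix(n_bits) / np.sqrt(N)`, the product `V @ dyadic_shift(tau, n_bits) @ V.T` is the diagonal matrix whose entries are row `tau` of the Walsh matrix, i.e. the values Had(τ, p), so every dyadic shift is diagonalized by the same Walsh transform.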

The matrix H is a block circulant matrix and may be written as

$$H = V^H\Lambda V = gHg \quad \text{for any } g \in G, \qquad (8)$$

then

$$P_H = H(H^T H)^{-1}H^T = V^H\Omega V, \qquad (9)$$

where $\Lambda = \mathrm{diag}(\lambda_0 \ldots \lambda_{N-1})$, $\lambda_i$ are the eigenvalues of the matrix H, $\Omega = \mathrm{diag}(\varepsilon_0\ \varepsilon_1 \ldots \varepsilon_{N-1})$, and $\varepsilon_i$ are the eigenvalues of the matrix $P_H$. The entries of the diagonal matrix Λ are the Hadamard-Chrestenson transform of the first column h of the matrix H. Similarly, the entries of the diagonal matrix Ω are the Hadamard-Chrestenson transform of the first column of the matrix $P_H$.

3. NEW DETECTION ALGORITHM FOR SYMMETRICAL SIGNAL SUBSPACE


The sufficient statistic for the parameter μ is (3)

$$\chi^2 = x^T P_H x.$$

The operator $P_H$ is a block circulant matrix and it may be written as

$$P_H = H(H^T H)^{-1}H^T = gP_H g. \qquad (10)$$

Now we will establish that statistic (3) is a maximal invariant to the group of general shift transformations $G = \{g: g(x) = gx = V^H WVx\}$ under conditions (6), (8), (9). It is clear that:

1. $(gx)^T P_H gx = x^T P_H x. \qquad (11)$

2. $(x_1)^T P_H x_1 = (x_2)^T P_H x_2 \Rightarrow (x_1)^T V^H\Omega Vx_1 = (x_2)^T V^H\Omega Vx_2 \Rightarrow (X_1)^T\Omega X_1 = (X_2)^T\Omega X_2 \Rightarrow \|\Omega^{1/2}X_1\|^2 = \|\Omega^{1/2}X_2\|^2 \Rightarrow x_1 = gx_2, \qquad (12)$

where $X_1 = Vx_1$ and $X_2 = Vx_2$ are the Hadamard-Chrestenson transforms of x, and $\|\cdot\|$ is the Euclidean norm. The maximal invariant may be written as

$$w_p = (1/\sqrt{N})\sum_{i=0}^{N-1}h_i\,\mathrm{Had}^*(p,i) \qquad (13)$$

The statistic (3) requires N² multiplication operations and N² addition operations. The new statistic (11) requires N multiplication operations and N² addition operations for m = 2. In this case the Walsh transformation is used instead of the Hadamard-Chrestenson transformation.

The statistic (3) incurs no performance penalty if the signal subspace is symmetrical.
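For an exactly symmetrical subspace with m = 2, the equality of the direct statistic (3) and its transform-domain form, as well as the shift invariance (11), can be checked with a short sketch. Several points here are assumptions made only for illustration: the mode matrix is built as H[t, i] = h[t XOR i], the projector is formed with a pseudo-inverse so that a rank-deficient H is allowed, the Walsh transform is applied as a dense matrix product rather than a fast transform, and the function names are hypothetical.

```python
import numpy as np

def walsh_matrix(n_bits):
    """Walsh matrix Had(p, t) = (-1)^(sum_i p_i t_i), size N x N with N = 2**n_bits."""
    N = 1 << n_bits
    idx = np.arange(N)
    bits = (idx[:, None] >> np.arange(n_bits)) & 1
    return (-1.0) ** (bits @ bits.T)

def symmetric_mode_matrix(h):
    """Mode matrix with columns g_i h, i.e. H[t, i] = h[t XOR i]."""
    t = np.arange(h.size)
    return h[t[:, None] ^ t[None, :]]

def chi2_direct(x, H):
    """Statistic (3), x^T P_H x, with P_H formed via a pseudo-inverse (assumption)."""
    P = H @ np.linalg.pinv(H, rcond=1e-10)
    return float(x @ P @ x)

def chi2_walsh(x, h, tol=1e-10):
    """Same statistic in the Walsh domain: energy of the bins where H has a nonzero eigenvalue."""
    N = h.size
    W = walsh_matrix((N - 1).bit_length())
    lam = W @ h                    # eigenvalues of H: Walsh transform of its first column
    X = W @ x / np.sqrt(N)         # orthonormal Walsh transform of the data
    return float(np.sum(X[np.abs(lam) > tol] ** 2))
```

Because a dyadic shift of x only changes the signs of its Walsh coefficients, `chi2_walsh` returns the same value for x and for any shifted copy `x[np.arange(N) ^ tau]`, which is the invariance stated in (11).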

But  exact  symmetry  in  signal  subspace  exists  seldom  for 
real  signal  model.  Let  consider  N  the  continuous  time 
cosinusoids  of  the  form  Aicos((0jt  +  tpj)  are  summed  to 
produce  the  signal  s(t).  If  this  signal  is  sampled  at  the 
sampling  instants  t=nT,  then  the  discrete  time  signal  is: 
w-i 

sn=  7,  Aj  cosjCQjTn  +  <Pj) . 

i=0 

Typically,  such  samples  are  taken  over  an  interval  [0<t<NT]  to 
produce  the  samples  vector  s  =  [s0  s* ...  sn.j]t.  The 
vector  of  samples  s  may  be  written  as  s  =  Re  H0, 
where H  =  [ho  h,  . . .  h,,.,], 0=  [0o0i  ...  0N.i]T, 
hj  =  [1  exp(jt0jT  ...  exp(j(0jT(N-l))  ]T,  0;  =  A,exp(j  <)>;), 
cc»i=  CDoi.  We  assume  that  s  is  an  N-vector  that  is  constructed  from 
a  linear  combination  of  linearly  independent  cosines  and  sines, 
provided  T=l,  coj  =  (27t/N)i,  ie[0,N-l].  The  mode  hj  is  a 
complex  exponential  mode  and  HHh=NI.  The  algorithm  (11) 
consists  of  two  parts:  coherent  detector 
N-l  N-l 

yk=  (1/V^7 )7[wp  (l/V^V  )YjX.  Had*(p,i)]Had(k,p) }  (14) 

p= 0  i=0 


N-l 

and  energy  detector  x2  =  7  (yk  )2 
*= o 


written  as 


.  The  test  (3)  may  be 


X2  =  xtPhx  =  xtH(HtH)'Htx 


(15) 


where  t  =  HTx,  e  =  1 1  hi  1 2  (16) 

The  known  algorithm  (15)  and  obtained  algorithm  (11)  have 
difference  in  their  coherent  detector  (14)  and  (16).  We  compare 
signal-to-noise  ratio  (SNR,)  for  test  (14)  and  (SNR2)  for  test  (16) 
for  each  mode  of  H.  Let  Zk  =  [(SNR)1]k/[(SNR)2]|t  denote  factor 
of  noise  immunity  loss.  [(SNR),]k  may  be  written  as  [(SNRhL  = 


Wk 

- ,  and  [(SNR)2]k  may  be  written  as  [(SNR)2]k  =  pN/o.  Then 

<7 


factor  of  noise  immunity  loss  Zk  =  pyk/pN.  (17) 

It is plotted for $N = 64$, $M = 2$, $\omega_0 = 1$ in Figure 1. This curve may
be used to compute the effective loss in SNR that results from
the subspace $\langle H \rangle$ not being exactly symmetric with respect to the dyadic
shift group.

This implementation of the coherent detector requires $N$ multiplications
and $N^2$ additions. The structure that implements the known test
$t = H^T x$ consists of $N$ branches. Each branch is a correlator of
the transformed data with a stored mode. The known test structure
therefore requires $N^2$ multiplications and $N$ additions. The advantage
of the new algorithm is clear: its implementation is hardware-efficient,
although it is sub-optimum.

Figure 2: Implementation of the coherent detector for symmetrical signal subspace

The accuracy of the symmetry in the subspace $\langle H \rangle$ determines the noise
immunity of this algorithm. In our case the noise immunity loss is
smaller than 3 dB (0.5 to 1) for half of the modes. Our studies
have shown that this relation is preserved as $N$ increases. Note that
when the symmetry in the subspace $\langle H \rangle$ is not exact, the SNR for some
modes may be maximized by choosing $h_0(\omega_0)$. This is illustrated in
Figure 3 for $m = 2$, $N = 64$ and $\omega_0 = 1.3$. In this case some other
modes have a much higher SNR than for $\omega_0 = 1$ (Fig. 1). Note that it
is possible to change the type of symmetry in this problem: we
can choose $m = 3, 4, 5, \dots$ However, the complexity of the test
increases with $m$.


4.  SUMMARY 

A new algorithm for the matched symmetrical subspace detector
has been presented. It may be used to reduce the complexity of
the known algorithm for signal subspace detection. High-quality
performance is obtained for moving-target indication under
unknown Doppler frequency. The new approach was used to
create a sub-optimal detector with minimal computational
cost.

ABSTRACT

Multiple source direction finding algorithms (e.g., MUSIC)
are applied to simultaneous measurements collected by M
sensors. However, practical considerations may dictate using
fewer receivers than sensors, such that the measurements
cannot be collected simultaneously. In such cases, data are
collected sequentially from the different array elements in
a process which is referred to as "time-varying preprocessing",
or "switching".

In  this  paper  we  study  multiple  source  direction  finding 
(DF)  with  an  array  of  M  >2  elements,  where  only  two 
receivers  are  available. 

1.  INTRODUCTION  AND  PROBLEM 
FORMULATION 

Direction finding with fewer receivers than sensors via time-varying
processing is a very important issue (e.g., [3]). In
many practical scenarios the number of receivers is considerably
less than the number of sensors. Moreover, the
tendency is to use the minimum number of receivers possible
which maintains spatial capacity, i.e., only two receivers.
Reducing the number of receivers results in a cheaper and
simpler design, at the cost of reduced performance. In this
paper we investigate the multiple source localization performance
from the identification point of view. We first find
how many sources can be localized with only two receivers
and then we suggest a computationally efficient algorithm
to perform this task.

Assume $q$ far-field narrowband sources impinging on
an array with $p > q$ sensors from directions $\{\theta_1, \dots, \theta_q\}$.
Using complex signal representation, the vector of received
signals can be written as:

$$x(t) = A(\theta)s(t) + n(t) \quad (1)$$

where $s(t)$ is the complex envelope of the slowly varying
signals, $n(t)$ is the additive noise, $\theta$ is the vector of directions
of arrival, and $A(\theta) = [a(\theta_1), \dots, a(\theta_q)]$ where $a(\theta)$
is the array steering vector at direction $\theta$. We denote by
$[x(t)]_i$ the $i$-th element of vector $x(t)$.

Under the standard assumptions about the noise being
Gaussian and white and of the signals being Gaussian, the
correlation matrix of $x(t)$, denoted by $R_x(\theta)$, is given by:

$$R_x(\theta) = A(\theta)R_sA^H(\theta) + \sigma^2 I \quad (2)$$

where $(\cdot)^H$ denotes the complex conjugate transpose operation,
$\sigma^2$ is the noise level and $R_s$ is the signal covariance
matrix.

The problem of estimating $\theta$ from a set of $N$ snapshots
of the array, $x(t_1), \dots, x(t_N)$, is usually referred to as the
localization problem. The case of spatial samples which are a
time-dependent linear transformation of the array output is
discussed in [3]. The resulting model for the measurements
is $y(t_i) = G(t_i)x(t_i)$, where $G(t_i)$ is the time-dependent
linear transformation. Note that $G(t_i)$ is a matrix in which
the number of rows is the number of receivers used at time $t_i$.

We are interested in the special case where $G(t_i)$ is a
$2 \times p$ matrix such that each row is a vector with all elements
but one equal to zero, where the nonzero element equals 1.
Without loss of generality, we assume that we take $N$ snapshots
of each sub-array of two elements. The total number of
snapshots is $L = \binom{p}{2} N$. At time instant $t_i$, $i = 1, \dots, L$, the
output of the reduced array is $y(t_i) = [[x(t_i)]_k\ [x(t_i)]_l]^T$
for some $k \ne l \in \{1, \dots, p\}$.
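
The switching model can be illustrated with a short sketch. It assumes, as the later proof does, that the sensor pairs are visited in lexicographic order with $N$ snapshots each; sensor indices are 0-based here, and the function names are hypothetical rather than taken from the paper.

```python
from itertools import combinations


def selection_matrix(k, l, p):
    """2 x p matrix G whose rows pick sensors k and l (0-based indices)."""
    g = [[0] * p for _ in range(2)]
    g[0][k] = 1
    g[1][l] = 1
    return g


def switching_schedule(p, n_snapshots):
    """Sensor pair sampled at each time instant: every pair is held for N snapshots."""
    return [pair for pair in combinations(range(p), 2) for _ in range(n_snapshots)]


def reduced_output(x, k, l):
    """y = G x, the two-receiver observation of the full array snapshot x."""
    g = selection_matrix(k, l, len(x))
    return [sum(gi * xi for gi, xi in zip(row, x)) for row in g]
```

For $p = 4$ sensors and $N = 3$ snapshots per pair, the schedule has $L = \binom{4}{2} \cdot 3 = 18$ entries.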

2.  SOME  RELATED  RESULTS 

In [3] the ML estimator for a general transformation matrix
$G(t_i)$ is presented. This procedure involves maximization
over all unknown parameters: $\theta$, $\sigma^2$, $R_s$. This maximization
problem becomes extremely difficult even for as few
as two sources. The authors presented an ad hoc approach,
the GLS, which reduces the complexity of the estimator to
a search over only $q$ parameters.

Alternatively, by noting that our problem can be modeled
as a problem of direction finding with a time-varying array,
one can apply the results of [4] which include, among
others, expressions for the Cramer-Rao lower bound (CRLB)
on the estimation error of the unknown parameters. Also,
some conjectures about the complexity of the ML estimator
were presented, which suggested that in the general case
the ML estimator is not separable.

In [2] it is shown that, unlike the case where the array
is sampled simultaneously, in cases where the number
of sensors in the sub-array is smaller than the number of
sources, the CRLB for $\theta_1, \dots, \theta_q$ does not approach zero
as the SNR approaches infinity, so the time-varying spatial
sampling process causes a residual estimation error.

Eigenvector-based methods for the case of time-varying
arrays have been proposed in [1]. In that paper two possible
eigenvector-based methods were proposed. One is
based on an interpolating matrix and the other is based on
a focus matrix. However, neither method can be applied
to our problem due to the large differences in the steering
vectors between successive time instants.


3. THE IDENTIFIABILITY PROBLEM


It is well known that when the array is simultaneously sampled
so (1) holds, and under some very weak conditions on
the array, one can localize up to $p - 1$ sources. Is this also true
when only two receivers are used? The following theorem
addresses this question:


Theorem 1 Using an array of $p$ sensors and only two receivers,
up to $q = p - 1$ narrowband sources can be uniquely
localized.


Proof 1 Let $\tilde{y}(t_i)$ be a column vector with $2\binom{p}{2}$ elements,
given by:

$$\tilde{y}(t_i) = [y(t_i)^T, y(t_{i+N})^T, y(t_{i+2N})^T, \dots, y(t_{i+L-N})^T]^T \quad (3)$$

Without loss of generality, assume that first we take $N$ samples
of the first and second sensors simultaneously. Next
we take another $N$ samples from the first and third sensors
simultaneously, and so on. $\tilde{y}(t_1)$ is a column vector, with
the first two elements equal to the first sample of the first
two sensors sampled. The third and fourth elements of $\tilde{y}(t_1)$
are the two elements of the first sample from the second and
third sensors, and so on. It is clear that $\{\tilde{y}(t_i)\}_{i=1}^{N}$ contains
all the available samples and thus it contains all the
statistical information on the unknown parameters.

It can be easily verified that $\{\tilde{y}(t_i)\}_{i=1}^{N}$ are i.i.d. complex
Gaussian vectors with block diagonal correlation matrix
$R_y(\theta)$, given by

$$[R_y(\theta)]_{ij} = \begin{cases} 0 & |i - j| > 1 \\ [R_x(\theta)]_{kl} & i > j \\ [R_x(\theta)]_{lk} & i < j \\ [R_x(\theta)]_{kk} & i = j,\ i \text{ odd} \\ [R_x(\theta)]_{ll} & \text{o.w.} \end{cases} \quad (4)$$

where $k$ and $l$ are the first and second sensors sampled at
the $\lceil i/2 \rceil$-th switching. It is clear from the structure of $R_y$ that
a simple one-to-one mapping, denoted by $\psi(R_x)$, between
$R_x$ and $R_y$ exists.

Let $\theta = [\theta_1, \dots, \theta_k]$ and $\theta' = [\theta'_1, \dots, \theta'_{k'}]$ be two sets
of bearings, such that $k, k' < q - 1$ and $\theta' \ne \theta$. For the
case of simultaneous sampling up to $q - 1$ sources could be
uniquely localized, i.e., $R_x(\theta) \ne R_x(\theta')$ for every $\theta \ne \theta'$.
Now, using the fact that $\psi$ is a one-to-one mapping between
$R_x$ and $R_y$, it is clear that $R_y(\theta) \ne R_y(\theta')$ for every $\theta \ne \theta'$.

In addition, since $\tilde{y}(t_i)$ is a complex Gaussian vector,
the p.d.f. of $\tilde{y}(t_i)$ given $\theta$ is different from the p.d.f. of
$\tilde{y}(t_i)$ given $\theta'$, which is a sufficient condition for identifiability.

This theorem provides a very important result: at each
time instant we are sampling a sub-array of size two, which
in turn enables us to localize only one source. However,
coherently combining all the results from the sub-arrays
enables one to localize $p - 1$ sources, the same number as if
we were sampling the whole array with $p$ receivers.

4. EIGENVECTOR-BASED METHODS

The ML estimator for $\theta$ requires at least a $q$-dimensional
search. Eigenvector-based methods, like MUSIC, offer
a way to reduce the complexity to a one-dimensional
search. This reduction in complexity is crucial since, even
today, with the most advanced DSPs, a search in more than
two dimensions cannot be performed in real time.

We next describe a new eigenvector-based procedure
which can be used in our problem. We start with the following
equivalent description of the data:

Let $z(t_i)$ be a column vector with $p$ elements. Let all
the elements be equal to zero except, say, the $k$-th and $l$-th elements,
which are equal to $[x(t_i)]_k$ and $[x(t_i)]_l$, respectively. That
is, $k, l$ are the two array elements which are sampled at time
$t_i$. Now, denote by $\hat{R}_z = \frac{1}{N}\sum_{i=1}^{L} z(t_i)z^H(t_i)$ the empirical
correlation matrix. It can be shown that its expected
value is given by:

$$R_z = A(\theta)R_sA^H(\theta) + \sigma^2 I + \Lambda \quad (5)$$

where $\Lambda$ is a diagonal matrix whose diagonal entries are
$(p - 1) \cdot \mathrm{diag}(A(\theta)R_sA^H(\theta) + \sigma^2 I)$. The matrix $\Lambda$ is the
only difference between the mean of the sample covariance matrix
in the case where the array is sampled simultaneously,
where eigenvalue-based methods are easily applied, and the
mean of the sample covariance matrix where only two receivers
are used simultaneously.

However, if all the elements of the diagonal matrix $\Lambda$
are equal, then eigenvector-based methods for estimating $\theta$
can still be used, since $\Lambda$ is just added to the noise covariance
matrix so it effectively changes the (unknown) noise level.
There are two sufficient conditions for all the elements of $\Lambda$
to be equal:

1. All sources are uncorrelated, so $R_s$ is a diagonal matrix.

2. All the array elements are omnidirectional, such that
$|a_i(\theta)| = |a_j(\theta)|$ $\forall i \ne j$ and for any $\theta$.

However, since these conditions are rarely fully fulfilled
in practice, MUSIC-like procedures cannot be applied to
$\hat{R}_z$ directly.

A careful examination of $R_z(\theta)$ and of $R_x(\theta)$ shows
that their off-diagonal elements are the same, while the diagonal
elements of $R_z(\theta)$ are $p - 1$ times larger, $\forall\theta$. We therefore
suggest a non-linear pre-processing procedure: divide the
diagonal elements of $\hat{R}_z$ by $p - 1$. Denoting by $\tilde{R}_z$ the resulting
matrix, it can be easily verified that $E\{\tilde{R}_z\} = R_x(\theta)$,
and thus $\tilde{R}_z$ can be used with all the eigenvector-based
methods, e.g., MUSIC. We refer to MUSIC with the
suggested preprocessing as MMUSIC. Naturally, the performance
of MUSIC and of MMUSIC applied to
the same array will be different, since only the first moment
(the expected value) of $\hat{R}_x$ and of $\tilde{R}_z$ is the same.
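
The effect of the preprocessing can be seen in a small deterministic sketch. It assumes a $1/N$ normalization of the empirical correlation matrix and, purely for illustration, observes the same snapshot $x$ through every sensor pair, so that the identity holds exactly rather than only in expectation. All function names are hypothetical.

```python
from itertools import combinations


def zero_padded(x, k, l):
    """Vector z: zeros everywhere except entries k and l, copied from x."""
    z = [0j] * len(x)
    z[k], z[l] = x[k], x[l]
    return z


def reuse_snapshot_for_all_pairs(x):
    """Illustrative input: the same snapshot x observed through every pair."""
    return [(k, l, x) for k, l in combinations(range(len(x)), 2)]


def switched_correlation(samples, p, n_per_pair):
    """R_z = (1/N) * sum of z z^H over all switched snapshots.

    samples is a list of (k, l, x) tuples; only x[k] and x[l] are used."""
    rz = [[0j] * p for _ in range(p)]
    for k, l, x in samples:
        z = zero_padded(x, k, l)
        for a in range(p):
            for b in range(p):
                rz[a][b] += z[a] * z[b].conjugate() / n_per_pair
    return rz


def preprocess(rz):
    """Divide the diagonal of R_z by p - 1; off-diagonal entries are kept."""
    p = len(rz)
    return [[rz[a][b] / (p - 1) if a == b else rz[a][b] for b in range(p)]
            for a in range(p)]
```

Each sensor appears in $p - 1$ pairs, so its diagonal entry is accumulated $p - 1$ times, while each off-diagonal entry is accumulated once. After `preprocess`, the matrix equals $x x^H$ in this idealized setting.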

This method can be extended to cases where the number
of samples taken from each sensor is not equal. Let $n_i$ be the
number of samples taken at the $i$-th switching. Let $\hat{R}_z = \sum_{i} z(t_i)z^H(t_i)$.
It can be verified that the mean of $\hat{R}_z$ is
given by:

$$E\{\hat{R}_z\} = (A(\theta)R_sA^H(\theta) + \sigma^2 I) \odot \Psi \quad (6)$$

where $(\Psi)_{ij}$ is the total number of snapshots taken from
the $i, j$ sensors simultaneously, $(\Psi)_{ii}$ is the total number of
snapshots taken from the $i$-th sensor, and $\odot$ denotes element-by-element
matrix multiplication. The suggested preprocessing
in this case is to divide each element of $\hat{R}_z$ by the corresponding
element in $\Psi$. The resulting matrix, denoted again
by $\tilde{R}_z$, can be used with any eigenvalue-based method.

5.  SIMULATION  STUDY 

Consider a uniform linear array with 4 omnidirectional elements.
Assume two equi-power, partially correlated ($\rho = 0.25$)
sources at bearings 0°, 15° and $N = 100$. In Figure
1 a typical spectrum of MMUSIC is shown. For comparison,
we show a typical spectrum of MUSIC
applied to $\hat{R}_z$ without preprocessing. It shows
that without preprocessing the two sources are not resolved,
so, as predicted, MUSIC cannot be used directly for
multiple source localization.


Figure 1: Typical MUSIC and MMUSIC cost functions

We now present results of a simulation performance study
for the same experiment. Figures 2 and 3 depict the probability
of detecting two sources and the MSE of the bearing
of the first source, respectively, for various correlation
coefficients, as a function of the SNR. These results are based
on averaging 1000 Monte Carlo runs.


Figure 2: The probability of detecting two sources as a function
of the SNR.

Figures 4 and 5 depict the probability of detecting two
sources and the MSE of the bearing of the first source,
respectively, as a function of the number of snapshots, where
the SNR is fixed at 10 dB.

Generally speaking, this study suggests that the performance
of MMUSIC improves as the SNR increases,
as the number of snapshots increases and as the correlation
between the sources decreases. However, our future work
will focus on analytic performance analysis of the algorithm
so that its inherent limitations can be exploited.

Figure 3: The MSE of the bearing of the first source as a
function of the SNR.

Figure 4: The probability of detecting two sources as a function
of the number of snapshots.

Figure 5: The MSE of the bearing of the first source as a
function of the number of snapshots.

ABSTRACT

In  this  paper,  we  propose  an  orthogonalized  version 
of  OJA  algorithm  (OOJA)  that  can  be  used  for  the 
estimation  of  minor  and  principal  subspaces  of  a  vector 
sequence.  The  new  algorithm  offers,  as  compared  to 
OJA,  such  advantages  as  orthogonality  of  the  weight 
matrix,  which  is  ensured  at  each  iteration,  numerical 
stability  and  a  quite  similar  computational  complexity. 

1.  INTRODUCTION 

Principal and minor component analysis (PCA and MCA),
which are part of the more general principal and minor
subspace (PSA and MSA) analysis, are two important
problems that are frequently encountered in many information
processing fields.

Let $\{r(k)\}$ be a sequence of $N \times 1$ random vectors
with covariance matrix $C = E[r(k)r^T(k)]$. Consider
the problem of extracting the principal or the minor
subspace spanned by the sequence, of dimension
$P < N$, assumed to be the span of the $P$ principal
or minor eigenvectors of the covariance matrix, respectively.
To solve this problem, several subspace extraction
algorithms have so far been proposed [1]-[5]. The
minor subspace extraction algorithm of Oja et al. [4]
can be formulated as

$$W(i+1) = W(i) - \beta\,[r(i)y^T(i) - W(i)y(i)y^T(i)] = W(i) - \beta\,p(i)y^T(i) \quad (1)$$

where $W(i) \in \mathbb{R}^{N \times P}$ is the minor subspace estimate,
$y(i) = W^T(i)r(i)$, $p(i) = r(i) - W(i)y(i)$, and $\beta > 0$
is a learning parameter. Reversing the sign of the
adaptive gain, i.e., replacing $-\beta$ in (1) by $+\beta$, yields
a principal subspace extraction algorithm. Chen et al.
have proposed a novel MSA algorithm [5] which can be
written as follows

$$W(i+1) = W(i) - \beta\,[r(i)y(i)^TW^T(i)W(i) - W(i)y(i)y^T(i)]. \quad (2)$$

The discrete-time update of (2) suffers from a marginal
instability similar to the PCA ($P = 1$) algorithm in [2].
Recently, a novel self-stabilizing MSA algorithm given
by

$$W(i+1) = W(i) - \beta\,[r(i)y(i)^TW^T(i)W(i)W^T(i)W(i) - W(i)y(i)y^T(i)], \quad (3)$$

has been proposed by Douglas et al. in [3].

2. ORTHOGONAL OJA

Our algorithm consists of (1) plus an orthogonalization
step of the weight matrix to be performed at each iteration.
Orthogonality is an important property that
is desired in many subspace-based estimation methods
[6]. To this end, we set (using informal notation):

$$W(i+1) := W(i+1)\,(W^T(i+1)W(i+1))^{-1/2} \quad (4)$$

where $(W^T(i+1)W(i+1))^{-1/2}$ denotes an inverse
square root of $W^T(i+1)W(i+1)$. To compute the
latter, we use the updating equation of $W(i+1)$. Keeping
in mind that $W(i)$ is now an orthogonal matrix, we
have

$$W^T(i+1)W(i+1) = I + \beta^2\|p(i)\|^2 y(i)y^T(i) = I + xx^T,$$

where we have used the fact that $W^T(i)p(i) = 0$, $I$ is
the identity matrix, and $x = \beta\|p(i)\|y(i)$. Using

$$(I + xx^T)^{-1/2} = I + \left(\frac{1}{\sqrt{1 + \|x\|^2}} - 1\right)\frac{xx^T}{\|x\|^2},$$

we obtain

$$(W^T(i+1)W(i+1))^{-1/2} = I + \tau(i)\,y(i)y^T(i), \quad (5)$$


0-7803-5988-7/00/$  10.00  ©  2000  IEEE 


where $\tau(i) = \frac{1}{\|y(i)\|^2}\left(\frac{1}{\sqrt{1 + \beta^2\|p(i)\|^2\|y(i)\|^2}} - 1\right)$. Substituting
(5) into (4) and using the updating equation
of $W(i+1)$ leads to

$$W(i+1) = (W(i) - \beta\,p(i)y^T(i))(I + \tau(i)\,y(i)y^T(i)) = W(i) - \beta\,\bar{p}(i)y^T(i), \quad (6)$$

where $\bar{p}(i) = -\tau(i)W(i)y(i)/\beta + (1 + \tau(i)\|y(i)\|^2)\,p(i)$.
Thus, the algorithm can be written as

• Initialization of the algorithm:

$W(0)$ = any arbitrary orthogonal matrix.

• Algorithm at iteration $i$:

$y(i) = W^T(i)r(i)$
$z(i) = W(i)y(i)$
$p(i) = r(i) - z(i)$
$\phi(i) = \dfrac{1}{\sqrt{1 + \beta^2\|p(i)\|^2\|y(i)\|^2}}$
$\tau(i) = \dfrac{\phi(i) - 1}{\|y(i)\|^2}$
$\bar{p}(i) = -\tau(i)z(i)/\beta + \phi(i)p(i)$
$W(i+1) = W(i) - \beta\,\bar{p}(i)y^T(i)$
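
The iteration listed above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: matrices are lists of rows, the helper names are hypothetical, and the driver `run_ooja` feeds white Gaussian vectors (an assumption made only to exercise the update; the paper's simulations use a specific covariance matrix).

```python
import math
import random


def ooja_step(W, r, beta):
    """One OOJA update of the N x P weight matrix W (assumed orthonormal)."""
    n_rows, n_cols = len(W), len(W[0])
    y = [sum(W[n][j] * r[n] for n in range(n_rows)) for j in range(n_cols)]   # y = W^T r
    z = [sum(W[n][j] * y[j] for j in range(n_cols)) for n in range(n_rows)]   # z = W y
    p = [rn - zn for rn, zn in zip(r, z)]                                     # p = r - z
    y_sq = sum(v * v for v in y)
    p_sq = sum(v * v for v in p)
    if y_sq == 0.0:
        return [row[:] for row in W]   # y = 0 gives a zero update
    phi = 1.0 / math.sqrt(1.0 + beta * beta * p_sq * y_sq)
    tau = (phi - 1.0) / y_sq
    p_bar = [-tau * zn / beta + phi * pn for zn, pn in zip(z, p)]
    return [[W[n][j] - beta * p_bar[n] * y[j] for j in range(n_cols)]
            for n in range(n_rows)]


def orthogonality_error(W):
    """Squared Frobenius norm of W^T W - I."""
    n_cols = len(W[0])
    err = 0.0
    for a in range(n_cols):
        for b in range(n_cols):
            g = sum(row[a] * row[b] for row in W) - (1.0 if a == b else 0.0)
            err += g * g
    return err


def run_ooja(n_iter, beta=0.01, seed=0):
    """Run OOJA on white Gaussian input with N = 4, P = 2 and W(0)_ij = delta(j - i)."""
    rng = random.Random(seed)
    W = [[1.0 if n == j else 0.0 for j in range(2)] for n in range(4)]
    for _ in range(n_iter):
        r = [rng.gauss(0.0, 1.0) for _ in range(4)]
        W = ooja_step(W, r, beta)
    return W
```

Calling `orthogonality_error(run_ooja(200))` gives a quick check that the weight matrix stays orthonormal over a short run, up to floating-point rounding.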

In order to gain more insight into the OOJA algorithm
we must examine the following points:

1. Minor subspace:

In terms of orthogonality errors, the OOJA algorithm
guarantees the orthogonality of the weight matrix
at each iteration. With the orthogonalization the
three algorithms (1), (2), and (3) become identical.
However, simulation results show that the
discrete-time update of the OOJA algorithm is sensitive
to the propagation of round-off errors.
Fortunately, we can overcome this problem by reformulating
the algorithm equations as shown in
Section 3.

2. Principal subspace:

With respect to subspace errors, our algorithm
converges at the same rate as (1). In terms of orthogonality
errors, it guarantees the orthogonality
of the weight matrix at each iteration, whereas
(1) converges to an orthogonal weight matrix only
asymptotically. Finally, it is worth noting that
(3) quickly diverges for PSA.

3. Computational complexity:

The computational complexities of algorithms (3)
and (2) are $7NP + O(N)$ and $5NP + O(N)$ flops
per iteration, respectively. OOJA and (1), however,
cost only $3NP + O(N)$ flops per iteration. It
is interesting to note that the orthogonalization
step does not increase the computational cost of
the OJA algorithm. On the other hand, the updating
equation of the weight matrix of the OOJA algorithm
has a more compact form than (2) and (3), i.e.,
it uses only one outer product instead of two for
(2) and (3). This turns out to be useful when
a subspace extraction algorithm is cascaded with
other adaptive algorithms, e.g., [8].

4. Convergence:

The convergence of the OOJA algorithm follows directly
from that of the OJA algorithm [7]. In fact, (6)
can be rewritten as $W(i+1) = W(i) - \beta\,p(i)y^T(i) + O(\beta^2)$.
Therefore, for $\beta \ll 1$, it can be shown
that the two algorithms have the same convergence
performance.

On the other hand, the convergence proof of (3) is
not complete. Effectively, to prove that span[$W$]
converges to span[$E_2$], where $E_2$ is the minor
$P$-dimensional subspace spanned by the eigenvectors
corresponding to the $P$ smallest eigenvalues,
Douglas et al. [3] have used the following assumption:

If all the eigenvalues of $M(t)$ have negative real
parts, then for the following system

$$\frac{dQ(t)}{dt} = M(t)Q(t),$$

we have

$$\lim_{t \to \infty} Q(t) = 0.$$

This assumption is true if $M(t)$ is time invariant
but not always true when $M(t)$ is time variant, as
shown by the counterexamples given in [10, 11].

3.  IMPLEMENTATION  USING 
HOUSEHOLDER  TRANSFORMATION 

Because of the numerical instability of OOJA when
used for minor subspace estimation, we propose here
another implementation of the algorithm based on the
Householder transformation. In fact, the new implementation
can be derived from a reformulation of (6) in terms
of the Householder transformation. We have the following
result:

Proposition 1 Let $u(i) = \bar{p}(i)/\|\bar{p}(i)\|$. Then equation
(6) can be rewritten as

$$W(i+1) = H(i)W(i) \quad (7)$$

where $H(i)$ is the Householder transformation given by
$H(i) = I - 2u(i)u^T(i)$.

Based on this result (see appendix for proof), the
new implementation consists in computing successively
$y(i)$, $p(i)$, $\tau(i)$, and $\bar{p}(i)$. Then, we compute

$u(i) = \bar{p}(i)/\|\bar{p}(i)\|$

$v(i) = W^T(i)u(i)$

$W(i+1) = W(i) - 2u(i)v^T(i)$
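
The Householder form of the update can be sketched as below. This is an illustrative plain-Python version under the same conventions as the iteration in Section 2 (matrices as lists of rows, $W$ assumed orthonormal on input); the function name and the early-return guard for a zero update are assumptions made here, not part of the paper.

```python
import math


def householder_ooja_step(W, r, beta):
    """One OOJA update written as W(i+1) = W(i) - 2 u v^T with v = W^T u."""
    n_rows, n_cols = len(W), len(W[0])
    y = [sum(W[n][j] * r[n] for n in range(n_rows)) for j in range(n_cols)]   # y = W^T r
    z = [sum(W[n][j] * y[j] for j in range(n_cols)) for n in range(n_rows)]   # z = W y
    p = [rn - zn for rn, zn in zip(r, z)]                                     # p = r - z
    y_sq = sum(a * a for a in y)
    p_sq = sum(a * a for a in p)
    if y_sq == 0.0 or p_sq == 0.0:
        return [row[:] for row in W]   # p_bar y^T vanishes, nothing to update
    phi = 1.0 / math.sqrt(1.0 + beta * beta * p_sq * y_sq)
    tau = (phi - 1.0) / y_sq
    p_bar = [-tau * zn / beta + phi * pn for zn, pn in zip(z, p)]
    norm = math.sqrt(sum(a * a for a in p_bar))
    u = [a / norm for a in p_bar]                                             # unit vector
    v = [sum(W[n][j] * u[n] for n in range(n_rows)) for j in range(n_cols)]   # v = W^T u
    return [[W[n][j] - 2.0 * u[n] * v[j] for j in range(n_cols)]
            for n in range(n_rows)]
```

For an orthonormal input this should reproduce $W(i) - \beta\,\bar{p}(i)y^T(i)$ from (6) up to rounding, since the update is a reflection applied to $W(i)$.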


Since the decomposition of the weight matrix involves
the use of numerically well-behaved Householder orthogonal
matrices (see [9] pp. 209-213), OOJA becomes
numerically very stable. The new implementation now has
a computational complexity of $4NP + O(N)$ flops
per iteration.

4. SIMULATION RESULTS

Example 1: In this example, we choose $r(i)$ to be a
sequence of independent jointly-Gaussian random vectors
with covariance matrix

$$C = \begin{pmatrix} 0.9 & 0.4 & 0.7 & 0.3 \\ 0.4 & 0.3 & 0.5 & 0.4 \\ 0.7 & 0.5 & 1.0 & 0.6 \\ 0.3 & 0.4 & 0.6 & 0.9 \end{pmatrix} \quad (8)$$

$P = 2$, $\beta = 0.01$, and as recommended in [5] $W(0) = D$,
where $D_{ij} = \delta(j - i)$. As in [3], we calculate the
ensemble averages of the performance factors

$$\rho(i) = \frac{1}{r_0}\sum_{r=1}^{r_0} \frac{\mathrm{tr}\left(W_r^T(i)E_1E_1^TW_r(i)\right)}{\mathrm{tr}\left(W_r^T(i)E_2E_2^TW_r(i)\right)}, \quad (9)$$

$$\eta(i) = \frac{1}{r_0}\sum_{r=1}^{r_0} \|W_r^T(i)W_r(i) - I\|_F^2, \quad (10)$$

where the number of algorithm runs is $r_0 = 100$, $r$ indicates
that the associated variable depends on the particular
run, $\|\cdot\|_F$ denotes the Frobenius norm, and $E_1$
(respectively $E_2$) is the principal $(N - P)$-dimensional
subspace (respectively minor $P$-dimensional subspace).
Figure 1 compares the performance of OOJA (without
Householder implementation) with (1), (2), and (3).
As we can see, our algorithm behaves better than (1)
and (2), but still suffers from numerical instability.

Example 2: In this example all parameters are
kept the same as in the first example. Figure 2 shows
the performance of the Householder-based OOJA algorithm
as compared to (1), (2), and (3). We can see that the
new implementation is numerically stable.

Example 3: We consider here the same context as
in the previous examples. By reversing the sign of $\beta$,
we now extract the principal $P$-dimensional subspace.
In (9), we replace $E_1$ by $E_2$ and vice versa. As we can
see from Figure 3, our algorithm (without Householder
implementation) is numerically stable and has better
performance than (1), (2), and (3).

5. CONCLUSIONS

In this paper, we proposed an orthogonal OJA (OOJA)
algorithm that can perform both PCA and MCA by
simply switching the sign of the same learning rule.
We gave two fast implementations of OOJA where the
orthogonality of the weight matrix is ensured at each
iteration. OOJA is numerically stable and its computational
complexity is smaller than those reported in
[3] and [5].


6.  APPENDIX 

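Proof of Proposition 1: Using the definition¹ of $y$ we can
write $\bar{p}y^T = \bar{p}r^TW$. By decomposing the observation
vector as:

$$r = WW^Tr + (I - WW^T)r = Wy + p = -\frac{\beta}{\tau}\left(-\frac{\tau}{\beta}Wy\right) + p$$

we can write

$$\bar{p}r^TW = \bar{p}\left[-\frac{\beta}{\tau}\left(-\frac{\tau}{\beta}Wy\right) + p\right]^TW = -\frac{\beta}{\tau}\,\bar{p}\left[-\frac{\tau}{\beta}Wy + (1 + \tau\|y\|^2)p\right]^TW = -\frac{\beta}{\tau}\,\bar{p}\bar{p}^TW,$$

where the second equality comes from the fact that
$p^TW = 0$. Finally, we obtain $W(i+1) = \left(I + \frac{\beta^2}{\tau}\bar{p}(i)\bar{p}(i)^T\right)W(i)$.
To complete the proof we have to show that
$\frac{\beta^2}{\tau} = -\frac{2}{\|\bar{p}\|^2}$, or equivalently $\|\bar{p}\|^2 = -\frac{2\tau}{\beta^2}$.
Using the definition of $\bar{p}$ and the equality $1 + \tau\|y\|^2 = (1 + \beta^2\|p\|^2\|y\|^2)^{-1/2}$, we can write

$$\|\bar{p}\|^2 = \frac{\tau^2}{\beta^2}\|y\|^2 + \frac{\|p\|^2}{1 + \beta^2\|p\|^2\|y\|^2}$$
$$= \frac{\tau^2}{\beta^2}\|y\|^2 + \frac{1}{\beta^2\|y\|^2}\left(1 - \frac{1}{1 + \beta^2\|p\|^2\|y\|^2}\right)$$
$$= \frac{1}{\beta^2\|y\|^2}\left((\tau\|y\|^2)^2 + 1 - (1 + \tau\|y\|^2)^2\right)$$
$$= -\frac{2\tau}{\beta^2}. \qquad \square$$

¹Here, we omit the time index $i$ to simplify the notation.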

Figure 1: Average behaviors for MSA.

Figure 2: Average behaviors for MSA using
Householder-based implementation.

ABSTRACT

A  crucial  step  in  many  signal  processing  applications  is  the 
determination  of  the  effective  rank  of  a  noise  corrupted 
multi-dimensional  signal,  i.e.,  the  dimension  of  the  signal 
subspace.  Standard  techniques  for  rank  estimation,  such  as 
the  minimum  description  length,  often  have  shortcomings 
in  practice,  an  example  being  when  noise  parameters  are 
unknown.  An  alternative  scheme  is  proposed  for  rank 
detection.  From  successive  pairs  of  the  ordered  eigenvalues 
of  the  array  covariance,  a  series  of  statistics  is  formed.  The 
statistics  are  chosen  such  that  their  distributions  for  noise 
eigenvalue  pairs  are  close.  The  actual  distributions  are 
unknown  and  are  estimated  with  the  Bootstrap.  The  rank  is 
then found by a sequential comparison of the estimated distributions
using a Kolmogorov-Smirnov test.

1.  INTRODUCTION 

Many signal processing algorithms, such as direction finding
algorithms, rely on the low-rank structure of a multi-dimensional
signal. The rank typically has an interpretation
as the model order, revealing the number of signals hidden
in noise, or the dimension of a low-order signal subspace.
Therefore, finding the effective rank of a noise-corrupted
signal is a crucial initial step in many applications.

Classical techniques to estimate the rank when the
noise is Gaussian include the minimum description length
(MDL) and Akaike's information theoretic criterion (AIC)
[10], and their subjective counterpart the sphericity test [2].
In the latter, a threshold is set to obtain a desired level of
the test, whereas in the objective MDL and AIC, the actual
threshold is dependent on the data size by asymptotic arguments.
Nevertheless, they all rely on the structure of the
noise eigenvalues of the covariance matrix, and it is
required that the actual spatial noise color is known. If the
noise assumptions are violated, for example, when the
noise has an unknown spatial color, detection performance
is degraded. For noise of unknown color, an alternative to
eigenvalue-based tests is to use properties of canonical correlations
[2], as in [11][12]. However, these schemes put
some restrictions on the structure of the data model, limiting
their applicability.

1. This work was in part supported by the Australian Telecommunications
Cooperative Research Centre (AT-CRC).


To mitigate the problem of slight uncertainties in the noise
model, both w.r.t. possible non-Gaussianity and noise
color, a new technique for rank detection is proposed. The
detection procedure is based on a property of the marginal
distributions of the noise sample eigenvalues. Instead of
relying on parametric assumptions, these distributions are
estimated from the data using the Bootstrap [5]. Based on
these estimates, the distributions of a series of secondary
variables are estimated, on which the actual rank estimation
is performed using a robust Kolmogorov-Smirnov test
[7]. The necessary number of Bootstrap resamples is surprisingly
small, keeping the computational cost at a reasonable
level.

2.  MODELING 

Consider  m-variate  data  according  to  the  linear  model 

$x(n) = A\,s(n) + v(n)$  (1)

where $A$ is a mixture matrix (for example the array steering matrix
in sensor array processing), $s(n)$ is a vector of signals, and $v(n)$
is noise from some possibly unknown distribution. Assuming the signal
and noise are uncorrelated and zero-mean, the array covariance is

$R_x = E[x(n)x^H(n)] = A R_s A^H + R_v$.  (2)

The problem considered is to determine the rank of the signal
part/subspace, i.e., $d = \mathrm{rank}(A R_s A^H)$, based on $N$
observations of the data (1).

If the additive noise is spatially white, $R_v = \sigma^2 I$, the
(population) eigenvalues of (2) are

$\lambda_1 > \ldots > \lambda_d > \lambda_{d+1} = \ldots = \lambda_m = \sigma^2$,  (3)

i.e., the true noise eigenvalues are all equal. However, when
calculated from the sample covariance

$\hat{R}_x = \frac{1}{N} \sum_{n=1}^{N} x(n)x^H(n)$,  (4)

estimated from a finite number $N$ of data snapshots, the ordered
sample eigenvalues are distinct with probability one, i.e.,

$\hat{\lambda}_1 > \ldots > \hat{\lambda}_d > \hat{\lambda}_{d+1} > \ldots > \hat{\lambda}_m > 0$.  (5)
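
As a minimal numerical sketch of (3)-(5), the following computes the sample eigenvalues through (4) for noise-only data. The values of m, N and the noise power, the random seed, and the choice of complex Gaussian noise are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np

def sample_eigenvalues(X):
    """Eigenvalues of the sample covariance (4), sorted in descending order.

    X is an m-by-N array whose columns are the snapshots x(n).
    """
    N = X.shape[1]
    R_hat = X @ X.conj().T / N          # (1/N) * sum_n x(n) x^H(n)
    lam = np.linalg.eigvalsh(R_hat)     # real eigenvalues, ascending order
    return lam[::-1]

rng = np.random.default_rng(0)          # arbitrary seed
m, N, sigma2 = 6, 100, 1.0              # illustrative values only
# Spatially white complex Gaussian noise with power sigma2 per element.
V = np.sqrt(sigma2 / 2) * (rng.standard_normal((m, N))
                           + 1j * rng.standard_normal((m, N)))
lam_hat = sample_eigenvalues(V)
print(lam_hat)                          # distinct values scattered around sigma2
```

Although all population eigenvalues equal the noise power here, the printed sample eigenvalues are distinct and ordered, which is the behaviour expressed by (5).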

The distribution $F_{N\lambda}(\hat\lambda)$ of (5) for a data sample
of $N$ snapshots, either in the form of a probability density function
(PDF) or a cumulative distribution function (CDF), tends to take a very
complex form. The sample eigenvalues are biased (as in (5)) and
mutually correlated. The exact distribution is only known for the
Gaussian case with certain population eigenvalues, and is given in the
form of a series expansion [8]. For the general case, both w.r.t. the
actual source distribution and the population eigenvalues, the
distribution (joint or marginals) may only be available asymptotically,
for large $N$ [1][8]. For small/moderate $N$, corresponding to many
practical applications, the error in the asymptotics may be
substantial. Thus, there is no general ‘ease of use’ form of
$F_{N\lambda}(\hat\lambda)$ available.

Instead of relying on asymptotic results, which are unreliable on
short data records, the detection scheme to be presented in the next
section will be based on an approximate relation between the marginal
distributions of the noise sample eigenvalues. Specifically, numerical
experiments indicate that for the white noise case (3), the marginal
PDFs of the noise sample eigenvalues are approximately related as

$f_{Ni}(\hat\lambda_i) = f_N(\kappa^i \hat\lambda_i)$, $i \ge d+1$  (6)

for some $\kappa$, i.e., the marginals $f_{Ni}(\hat\lambda_i)$,
$i \ge d+1$, are simply scaled versions of the same basic PDF
$f_N(\cdot)$. While there is no claim of the generality of this
approximation, it has been shown to be very precise when the ratio
$N/m$ is, say, five or higher. Also, what is important for detection
based on this property is that the approximation is robust to slightly
colored noise, and practically invariant to non-Gaussianity. Then, even
if the data does not correspond perfectly to the assumed data model,
(6) allows for robust rank detection. An example to illustrate (6) will
be given in Section 4.

3.  DETECTION 

3.1.  Detection  principle 

To indicate how the relation (6) can be used for rank estimation,
assume a number of $m$ independent variables $\eta_i$ having
distributions identical to the marginal distributions of the sample
eigenvalues $\hat\lambda_i$. From the $\eta_i$, $m-1$ secondary
variables $\nu_i$ are formed as the ratios

$\nu_i = \eta_i / \eta_{i+1}$, $i \in [1, m-1]$.  (7)

Then, up to the order of the approximation given in (6), $\nu_i$ for
$i \in [d+1, m-1]$ will have identical distributions, as these $\nu_i$
are invariant to the (possibly unknown) scaling $\kappa$. However,
$\nu_d = \eta_d/\eta_{d+1}$, involving the marginal of the smallest
signal eigenvalue, will tend to larger values than $\nu_{d+1}$. This
forms the basis for rank detection: if the marginals can be captured
from the data $x(n)$, $n \in [1, N]$, the order $d$ can be estimated by
testing for equality among the distributions of $\nu_i$,
$i \in [1, m-1]$.

A  practical  algorithm  to  exploit  this  property  for  rank 
estimation  is  as  follows: 


1.  Use the Bootstrap to first estimate the marginals of the sample
eigenvalues $f_{Ni}(\hat\lambda_i)$, $i \in [1, m]$, and then the
distributions of $\nu_i$, $i \in [1, m-1]$.

2.  Apply the Kolmogorov-Smirnov test [7] to test for pair-wise
equality of the distributions $F_{\nu,i}(\nu_i)$ of $\nu_i$, starting
from the bottom ($F_{\nu,m-2}(\nu_{m-2})$ versus
$F_{\nu,m-1}(\nu_{m-1})$), and stepping up until equality is rejected.

Before  going  into  the  full  details  of  the  scheme,  it  is  neces¬ 
sary  to  establish  how  the  Bootstrap  behaves  when  resam¬ 
pling  data  to  calculate  eigenvalues. 

3.2.  The  Bootstrap  and  eigenvalues 

The Bootstrap is a general tool for estimation of the distribution of
a statistic from a sample of data. In this case the Bootstrap is
employed to estimate $F_{N\lambda}(\hat\lambda)$. The principle of the
Bootstrap is as follows. The original data $x(n)$, $n \in [1, N]$,
i.e.,

$X_N = [x(1), \ldots, x(N)]$,  (8)

is an estimate of the distribution of $x(n)$. Assigning each snapshot a
probability $1/N$, resamples are taken randomly (with replacement) from
$X_N$, giving Bootstrap data

$X^*_N = [x^*(1), \ldots, x^*(N)]$.  (9)

From the Bootstrap resample $X^*_N$, the sample eigenvalues are
calculated (through (4)), giving

$\boldsymbol{\lambda}^* = [\lambda_1^*, \ldots, \lambda_m^*]$  (10)

with $\lambda_1^* > \ldots > \lambda_d^* > \ldots > \lambda_m^*$. The
procedure is repeated a number of $B$ times. Then, the Bootstrap
distribution derived from the $B$ replicates of (10) is a nonparametric
estimate of $F_{N\lambda}(\hat\lambda)$.
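
The resampling procedure of (8)-(10) can be sketched as follows. This is only an illustration under stated assumptions: the function name `bootstrap_eigenvalues`, the optional resample size argument `M`, and the random test data are hypothetical and not part of the paper.

```python
import numpy as np

def bootstrap_eigenvalues(X, B, M=None, rng=None):
    """Return a B-by-m array of Bootstrap eigenvalue vectors, as in (10).

    X : m-by-N data matrix whose columns are the snapshots x(n)
    B : number of Bootstrap resamples
    M : resample size (defaults to N, as in (9))
    """
    rng = np.random.default_rng() if rng is None else rng
    m, N = X.shape
    M = N if M is None else M
    lam_star = np.empty((B, m))
    for b in range(B):
        idx = rng.integers(0, N, size=M)     # with replacement, probability 1/N each
        Xb = X[:, idx]                       # Bootstrap data X*_N
        R_b = Xb @ Xb.conj().T / M           # sample covariance (4) of the resample
        lam_star[b] = np.linalg.eigvalsh(R_b)[::-1]   # descending order
    return lam_star

rng = np.random.default_rng(1)               # arbitrary seed
X = rng.standard_normal((4, 60)) + 1j * rng.standard_normal((4, 60))
lam_star = bootstrap_eigenvalues(X, B=25, rng=rng)
```

Each column of `lam_star` then holds B replicates of one ordered eigenvalue, from which its marginal distribution can be estimated.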

As the sample eigenvalues are highly non-linear functions of the data
sample, results on the Bootstrap w.r.t. linear statistics do not apply.
However, some results on the properties of eigenvalues calculated from
resampled data can be found in [3][4]:

•  For distinct population eigenvalues, the Bootstrap distribution
$F^*_{N\lambda}(\hat\lambda)$ converges asymptotically to
$F_{N\lambda}(\hat\lambda)$.

•  For equal population eigenvalues (such as in the white noise case),
$F^*_{N\lambda}(\hat\lambda)$ does not converge to
$F_{N\lambda}(\hat\lambda)$. However, if resamples are taken of size
$M < N$ from $X_N$, such that $M \to \infty$ as $N \to \infty$, while
$M/N \to 0$, then $F^*_{M\lambda}(\hat\lambda)$ converges weakly to
$F_{M\lambda}(\hat\lambda)$, i.e., the distribution of the eigenvalues
of a sample $x(n)$, $n \in [1, M]$.

From numerical experiments it is easily seen that the major problem
with the Bootstrap is to characterize the dependence between sample
eigenvalues: while the Bootstrap does a good job of capturing the
marginals, the dependence between the sample eigenvalues is not
maintained in the Bootstrap distribution $F^*_{N\lambda}(\hat\lambda)$
for reasonable $N$. This motivates the use of the marginals only. Also,
a full characterization of the joint $m$-dimensional distribution would
require a very large data record ($N$). By only considering the
marginals, a much smaller data size is required. It is also worth
considering resamples of size $M < N$. This relaxes the strong
dependence on the actual data $X_N$ somewhat, which seems to remove
some erratic behavior seen on small sample sizes.

3.3.  Detection  scheme 

The  full  estimation/detection  procedure  is  as  follows: 

1.  Estimate the marginal distributions $f_{Mi}(\hat\lambda_i)$,
$i \in [1, m]$, by taking $B$ resamples of size $M$ from the data
$X_N$. For each resample, calculate the sample eigenvalues
$\boldsymbol{\lambda}^*$ (10).

2.  Estimate the distributions of $\nu_i$, $i \in [1, m-1]$ (7). To do
this, note that in place of the fictitious independent variables
$\eta_i$, $i \in [1, m]$, sample eigenvalues $\lambda_i^*$ from
different resamples $\boldsymbol{\lambda}^*$ can be used (the sample
eigenvalues from one resample are correlated). Thus, form

$\nu_i^* = (\lambda_i^*)_l / (\lambda_{i+1}^*)_k$, $i \in [1, m-1]$  (11)

with $l$ and $k$ being different resamples. Although an arbitrary
number of resamples ($B_f$) of (11) could be taken, it is sensible to
use all $B$ $\boldsymbol{\lambda}^*$ from step 1 in a systematic way.
Estimate the CDFs of $\nu_i$, $i \in [1, m-1]$, by the staircase
approximation

$\hat F_{\nu,i}(x) = \mathrm{number\ of}(\nu_i^* \le x)/B$.  (12)

3.  Determine the test statistics for the one-sided Kolmogorov-Smirnov
(KS) test from the distributions (12)

$T_i = \sup_x \left( \hat F_{\nu,i+1}(x) - \hat F_{\nu,i}(x) \right)$  (13)

for $i \in [1, m-2]$. Under the hypothesis that $F_{\nu,i+1}(x)$ and
$F_{\nu,i}(x)$ are equal, the test statistic $T_i$ is asymptotically
distributed as [7]

$P(\sqrt{B/2}\, T_i < x) \to 1 - \exp(-2x^2)$  (14)

for $x > 0$.

4.  Final step. Determine the rank $d$ from a sequential test on the KS
statistics:

I  Set $i = m - 2$.

II  Define the null hypothesis H: $d = i$, and the alternative
hypothesis K: $d < i$.

III  Set a threshold $\gamma$ based on the tail area of the
distribution (14) of (13) under K [7].

IV  If $T_i > \gamma$ accept H (i.e. reject equality of distributions)
and stop, else set $i = i - 1$ and return to II.

Note  that  in  order  to  enable  a  correct  decision,  the  test  pro¬ 
cedure  requires  there  are  at  least  two  noise  eigenvalues. 
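
Steps 2 to 4 can be sketched in plain Python, taking the B Bootstrap eigenvalue vectors of step 1 as input. Two choices below are assumptions made for illustration and are not specified in the text: the ratios (11) pair each resample with its cyclic successor, and the function returns 0 when equality is never rejected. The function names are hypothetical.

```python
def ratio_samples(lam_star):
    """Form the ratios (11) from a list of B Bootstrap eigenvalue vectors.

    Numerator and denominator come from different resamples; here resample l
    is paired with resample (l + 1) mod B, one systematic pairing among many.
    """
    B, m = len(lam_star), len(lam_star[0])
    return [[lam_star[l][i] / lam_star[(l + 1) % B][i + 1] for l in range(B)]
            for i in range(m - 1)]

def ks_one_sided(a, b):
    """sup_x (F_b(x) - F_a(x)) for the staircase CDFs (12) of samples a and b."""
    def cdf(s, x):
        return sum(v <= x for v in s) / len(s)
    return max(cdf(b, x) - cdf(a, x) for x in sorted(set(a) | set(b)))

def detect_rank(lam_star, gamma):
    """Sequential test of step 4; returns the estimated rank d."""
    nu = ratio_samples(lam_star)              # nu[i - 1] holds the samples of nu_i
    m = len(nu) + 1
    for i in range(m - 2, 0, -1):             # i = m-2, ..., 1
        T_i = ks_one_sided(nu[i - 1], nu[i])  # statistic (13): F_{nu,i+1} - F_{nu,i}
        if T_i > gamma:
            return i                          # equality rejected: accept d = i
    return 0                                  # never rejected (assumed convention)

# Toy input: one dominant eigenvalue, three noise eigenvalues with common scaling.
scales = [1.0, 1.25, 1.5, 1.75]
toy = [[64 * s, 4 * s, 2 * s, 1 * s] for s in scales]
print(detect_rank(toy, gamma=0.5))            # 1
```

In the toy input the ratios of neighbouring noise eigenvalues have identical samples, so only the ratio involving the dominant eigenvalue separates and the sequential test stops at i = 1.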

There are a number of parameters to be tuned/chosen in the scheme.
First, consider the resample size $M$. A smaller $M$ tends to improve
the estimate of $f_{Mi}(\hat\lambda_i)$. However, a small $M$ leads to
a loss in the signal to noise ratio (SNR) detection threshold (i.e.,
the minimum SNR required for reliable rank detection), as the relative
distance between $f_{Md}(\hat\lambda_d)$ and
$f_{M(d+1)}(\hat\lambda_{d+1})$ decreases with a decreasing $M$. For a
data size $N$ of order $O(10^2)$, a reasonable trade-off is
$M = 3N/4$.

The number of Bootstrap resamples $B$ has an impact on the estimated
distributions and is therefore a crucial parameter. Some guidelines on
the impact of the number of Bootstraps $B$ can be found in [5][6].
Unfortunately, no results are given in absolute terms. However, note
that the proposed detection scheme does not require any critical values
to be estimated with high precision. What is important is that the
locations of the distributions of the $\nu_i$ are estimated with
sufficient accuracy for the subsequent KS test to work properly. Thus,
$B$ should be large enough such that the means of $\nu_i^*$ are
reasonably stable on a normalized scale. A coarse first order
approximation of $E[\nu_i]$ gives

k;  -  k:(  >  <15> 


i.e., the stability of the location depends to a large extent on the
location of $\hat\lambda_i$. To arrive at an expression for the
necessary $B$, note that the sample eigenvalues are reasonably close to
Gaussian. The separation (bias) of two sample eigenvalues corresponding
to equal population eigenvalues is roughly two times the standard
error, see Figure 1 (note that this relation holds regardless of $M$).
Now, the standard error of the sample mean of $B$ iid Gaussian
variables is $\sigma/\sqrt{B}$. Thus, the location error of
$f_{Mi}(\hat\lambda_i)$, normalized to the separation of neighbouring
distributions, is of order

$\frac{\sigma/\sqrt{B}}{2\sigma} = \frac{1}{2\sqrt{B}}$.  (16)


As an example, with $B = 25$ the location error is of order 0.1, which
is small enough for reliable detection. Note that there is no point
using too large a $B$, as the error originating from the approximation
(6) then will dominate the ‘randomness’ in $T_i$.

The final parameter to be chosen is the threshold $\gamma$ for the KS
test in Step 4. This threshold can be determined in two ways. First,
$\gamma$ can be set to maintain a desired level of the test at each
sequential stage (as in the sphericity test), based on the distribution
(14) of the test statistic (13) under K (the hypothesis that the
distributions are equal). Alternatively, $\gamma$ can be set for
‘MDL-like consistency’. To see this, note that $T_i \to 1$ rapidly
under H for increasing SNR, or $N$. At the same time, under K, the tail
probability of $T_i$ is small even for modest $\gamma$. Thus, $\gamma$
can be set to provide a probability of detection very close to one,
without much penalty in the SNR threshold. As an example, with
$B = 25$, the 95% level under K is $\gamma = 0.35$. With
$\gamma = 0.7$, the level is 99.9995%.
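
The two quoted thresholds can be checked with a short calculation. The sketch assumes the usual two-sample scaling, in which the limiting law $1 - \exp(-2x^2)$ applies to $\sqrt{B/2}\,T_i$ when each empirical CDF is built from B samples; under that assumption the numbers above are reproduced. The function names are hypothetical.

```python
import math

def ks_threshold(level, B):
    """Threshold gamma such that P(T_i < gamma) = level under equal distributions.

    Assumes the limiting law 1 - exp(-2 x^2) for sqrt(B / 2) * T_i.
    """
    x = math.sqrt(-math.log(1.0 - level) / 2.0)
    return x / math.sqrt(B / 2.0)

def ks_level(gamma, B):
    """Inverse relation: the level reached by a given threshold gamma."""
    return 1.0 - math.exp(-B * gamma ** 2)

print(round(ks_threshold(0.95, 25), 2))   # 0.35
print(ks_level(0.7, 25))                  # about 0.999995
```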
Figure 1. CDFs of a) sample eigenvalues, b) scaled sample eigenvalues,
c) scaled Bootstrap eigenvalues, and d) the test variables
$(\lambda_i^*)_l / (\lambda_{i+1}^*)_k$.

4.  NUMERICAL  EXAMPLES 

The detection scheme relies on the validity of the assumption (6). To
illustrate the principle of the test, data was generated according to
the model (1): a 6-element uniform linear array with half wavelength
element spacing receives $d = 2$ uncorrelated Gaussian signals from
directions [10°, 25°], relative to the array broadside. The signals
were observed in white Gaussian noise with an element SNR of -3 dB.
Figure 1a shows the marginal CDFs of the 6 sample eigenvalues, when
calculated based on $N = 100$ independent array snapshots. Figure 1b
shows the CDFs when the sample eigenvalues have been pre-scaled with
$\kappa^{i-4}$ (relative to eigenvalue number four) as in (6). In this
case, $\kappa = 1.21$, and the scaled noise CDFs are all very close,
with a largest pair-wise separation $|F_{\kappa i} - F_{\kappa(i+1)}|$
of 0.11 for $i > d$.

Similarly, Figure 1c shows the CDFs of scaled ($\kappa = 1.36$)
Bootstrap eigenvalues, estimated from $B = 50$ resamples of size
$M = 75$, taken from one data realization of $N = 100$ snapshots.
Clearly, the Bootstrap eigenvalues are slightly more variable, which is
due both to $M < N$ and to the effective loss in sample size from
resampling. Again, the noise CDFs are close, but with some random
fluctuations due to the limited number $B$. However, note that even
with an infinite number of Bootstraps, there will still be a remaining
error due to the approximation (6) (as in Figure 1b), as well as the
limitation of the Bootstrap itself [3][4]. Finally, the CDFs of the
variables (11), calculated from the $B = 50$ sets of Bootstrap
eigenvalues, are shown in Figure 1d. These are the CDFs on which the KS
test is based. The ‘noise only’ CDFs are close, while the CDF of
$\lambda_2^*/\lambda_3^*$ is the rightmost; with increasing SNR or data
size this CDF moves further to the right. Clearly, the KS test can
easily decide on the correct rank from the separation of the CDFs. Note
that the dashed CDF is due to the two signal eigenvalues.

Figure 2. The probability of correctly estimating the rank ($d = 2$)
versus the SNR, for various spatial noise color: a) Proposed scheme,
b) MDL.

In the ideal case, with white Gaussian noise, the performance of the
proposed scheme is virtually identical to MDL and the sphericity test
(depending on how the threshold is set; for ‘consistency’, or for a
fixed level) in terms of SNR and data size thresholds, and the ability
to resolve closely spaced targets. Instead, the power of the new method
lies in its robustness to unknown noise color. To illustrate, data was
generated according to Figure 1, but varying the spatial noise color.
Specifically, the $kl$th element of the noise covariance matrix in (2)
was $(R_v)_{kl} = \exp(-\alpha|k-l|)$, with $\alpha$ being the
parameter to be varied. For the detection procedure, $B = 25$ resamples
of size $M = 75$ were taken from each original data set of size
$N = 100$. The actual $B$ is at a boundary: a smaller $B$ leads to a
penalty in SNR threshold, whereas a larger $B$ gives no further
improvement. The threshold $\gamma$ was set to 0.7 for ‘consistent’
detection. The performance of the proposed scheme as well as MDL as a
function of the SNR is shown in Figure 2a-b, for
$\alpha \in [0, 0.1, 0.2, 0.3]$. It is seen that the proposed scheme
maintains good detection performance for increasing $\alpha$. There is,
however, a penalty in the low SNR threshold. This is caused by the
distributions of the noise eigenvalues being further separated for an
increasing $\alpha$, leading to a reduction in the SNR margin.
Increasing $\alpha$ beyond 0.3 causes a more substantial degradation as
the approximation (6) is no longer good. The performance of MDL suffers
at comparatively small $\alpha$.
Figure 3. The probability of correctly estimating the rank
( d  =  2)  versus  the  SNR,  for  various  temporal  noise  color:  a) 
Proposed  scheme,  b)  MDL. 
5.  CONCLUSIONS

A new technique for rank estimation has been presented. While giving
performance similar to that of classical well-known techniques under
ideal conditions, the new method, based on the Bootstrap, is robust to
errors in the noise model. The price for robustness is an increase in
the computational complexity. However, as the number of Bootstrap
replications is fairly small, this increase is modest.
ABSTRACT

We introduce a new approach for the detection-estimation problem for
sparse linear antenna arrays comprising $M$ identical sensors whose
positions may be noninteger values (expressed in half-wavelength
units). This approach considers the (noninteger) $M_\alpha$-element
co-array as the most appropriate virtual array to be used in connection
with the augmented covariance matrix. Since the covariance matrices
derived from such virtual arrays are usually very underspecified, we
discuss a maximum-likelihood (ML) completion philosophy to fill in the
missing elements of the partially specified Hermitian covariance
matrix. Next, a transformation of the resulting unstructured ML matrix
results in a sequence of properly structured positive-definite
Hermitian matrices, each with their ($M_\alpha - \mu$) smallest
eigenvalues being equal, appropriate for the candidate number of
sources $\mu$. For each candidate model
($\mu = 1, \ldots, M_\alpha - 1$), we then find the set of
directions-of-arrival (DOA’s) and powers that yield the minimum fitting
error for the specified covariance lags in the neighbourhood of the
MUSIC-initialised DOA’s. Finally, these models describe a hypothesis
with respect to the actual number of sources, and allow us to select
the “best” hypothesis using traditional information criteria (AIC, MDL,
MAP, etc.) that are based on likelihood ratio.

1.  INTRODUCTION 

In our previous papers [5, 3, 2, 4], we introduced a new technique for
detection-estimation of more uncorrelated Gaussian sources $m$ than
sensors $M$ ($m > M$) for the class of integer-spaced arrays. Here, we
present one attempt to extend this approach to the class of
noninteger-spaced nonuniform linear arrays (NLA’s). Since such arrays
generate up to $\frac{1}{2}M(M-1)$ distinct nonzero covariance lags,
they have the potential [8] to estimate a superior number of
uncorrelated Gaussian sources, i.e. for the number of sources in the
range

$M < m \le \frac{1}{2}M(M-1)$.  (1)

For  a  known  number  of  sources  m,  we  previously  in¬ 
troduced  [6]  a  DOA  estimation  technique  capable  of 
handling  these  superior  scenarios.  The  current  prob¬ 
lem  of  detection-estimation  is  more  complicated  since 
we  now  require  both  an  estimation  of  the  number  of 
sources  and  their  DOA’s. 

Naturally,  this  problem  has  a  solution  if  and  only 
if  the  identifiability  conditions  hold,  which  in  this  case 
means  that  the  observed  set  of  covariance  lags  gen¬ 
erated  by  the  NLA  can  be  uniquely  decomposed  into 
some  number  of  signal  dyads  plus  white  noise.  While 
the  nonidentifiability  conditions  for  detection  are  given 
in  [4],  here  we  concentrate  on  identifiable  scenarios 
only;  that  is,  for  the  true  (deterministic)  covariance 
lags  and  the  chosen  virtual  array,  the  partially  specified 
covariance  matrix  has  a  unique  completion  that  corre¬ 
sponds  to  a  mixture  of  m  uncorrelated  plane  waves  in 
white  noise. 

In practice, when the observed specified covariance lags are
stochastic, being produced by a sample $M$-variate covariance matrix,
the feasibility conditions for our type of positive-definite (p.d.)
completion are not guaranteed. Therefore, in order to achieve a p.d.
completion with equalised ($M_\alpha - m$) minimum eigenvalues, even
the specified (measured) covariance lags need to be modified. Clearly,
by not limiting the size of the modification of the specified lags, we
can achieve a p.d. completion with the desired number
($M_\alpha - \mu$) of minimum eigenvalues being equal.

Note that for a Hermitian matrix to represent a mixture of $\mu$
uncorrelated plane waves in noise, the equality of the
($M_\alpha - \mu$) smallest eigenvalues is only a necessary condition
(whereas this is the necessary and sufficient condition for a Toeplitz
matrix). Thus some further modification of the specified covariance
lags is required in order to correctly model the sources, along with an
appropriate completion of the missing (unspecified) covariance lags.

In this way, we finally obtain a number of candidate models, i.e.
$M_\alpha$-variate p.d. Hermitian matrices of the proper structure,
that are now compared with the ML completion discussed below using
traditional information criteria that judge a loss in likelihood ratio
against an overestimated number of sources.


2.  PROBLEM  FORMULATION 


Consider $m$ narrow-band plane-wave signals of power
$p = [p_1, \ldots, p_m]$ impinging upon a nonuniform linear array of
$M$ identical omnidirectional sensors located at positions
$d = [d_1 = 0, d_2, \ldots, d_M]$ measured in half-wavelength units. In
the detection-estimation problem, the number of sources $m$ is unknown.
Adopting the commonly used data model [12], we have

$y(t) = S(\theta)\,x(t) + \eta(t)$ for $t = 1, \ldots, N$  (2)

where

$x(t) = [x_1(t), \ldots, x_m(t)]^T$  (3)

$y(t) = [y_1(t), \ldots, y_M(t)]^T$  (4)

$\eta(t) = [\eta_1(t), \ldots, \eta_M(t)]^T$,  (5)

$x_j(t)$ ($j = 1, \ldots, m$) is the complex signal amplitude of the
$j$th plane wave, and where $y_k(t)$ and $\eta_k(t)$
($k = 1, \ldots, M$) are the sensor output and the noise at the $k$th
sensor respectively. To permit DOA estimation in the superior case
($m > M$), we restrict ourselves to the class of independent (Gaussian)
signal amplitudes $x(t) \in C^{m \times 1}$ such that

$E\{x(t_1)x^H(t_2)\} = \begin{cases} P = \mathrm{diag}[p_1, \ldots, p_m] & \text{for } t_1 = t_2 \\ 0 & \text{for } t_1 \ne t_2 \end{cases}$  (6)

We assume that the additive noise $\eta(t) \in C^{M \times 1}$ is white
and Gaussian:

$E\{\eta(t_1)\eta^H(t_2)\} = \begin{cases} p_0 I_M & \text{for } t_1 = t_2 \\ 0 & \text{for } t_1 \ne t_2 \end{cases}$  (7)

The array manifold matrix is
$S(\theta) = [s(\theta_1), \ldots, s(\theta_m)] \in C^{M \times m}$,
where each constituent “steering vector” $s(\theta_j)$ is defined as

$s(\theta_j) = [1, \exp(i\pi d_2 w_j), \ldots, \exp(i\pi d_M w_j)]^T$  (8)

with $w = \sin\theta \in [-1, 1]$.
According to this model, the $M$-variate spatial covariance matrix

$R = S P S^H + p_0 I_M$  (9)

is p.d. Hermitian. Note that in our (“superior”) case of $m > M$, the
noise-free covariance matrix $S P S^H$ is generally of full rank. Given
$N$ independent samples (“snapshots”), the sufficient statistic for DOA
estimation is the $M$-variate direct data covariance (DDC) matrix

$\hat{R} = \frac{1}{N} \sum_{t=1}^{N} y(t)y^H(t)$.  (10)
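
The model (8)-(10) can be illustrated with a short NumPy sketch. The sensor positions, source directions, powers and noise level below are illustrative assumptions chosen only to exercise the formulas, and the function names are hypothetical.

```python
import numpy as np

def steering_matrix(d, theta_deg):
    """Array manifold S(theta) of (8); positions d in half-wavelength units."""
    d = np.asarray(d, dtype=float)[:, None]                    # M x 1
    w = np.sin(np.deg2rad(np.asarray(theta_deg)))[None, :]     # 1 x m, w = sin(theta)
    return np.exp(1j * np.pi * d * w)                          # M x m

def true_covariance(d, theta_deg, powers, p0):
    """R = S P S^H + p0 I_M, equation (9)."""
    S = steering_matrix(d, theta_deg)
    return S @ np.diag(powers) @ S.conj().T + p0 * np.eye(len(d))

def sample_covariance(Y):
    """Direct data covariance (10) from an M x N snapshot matrix Y."""
    return Y @ Y.conj().T / Y.shape[1]

d = [0.0, 1.0, 4.0, 6.0]                        # illustrative sensor positions
thetas = [-20.0, 0.0, 15.0, 40.0, 60.0]         # five sources, four sensors
R = true_covariance(d, thetas, np.ones(5), p0=0.5)
print(np.linalg.eigvalsh(R))                    # all eigenvalues are at least p0
```

With more sources than sensors, none of the eigenvalues of R sits at the noise floor, which is why the noise subspace of the M-variate matrix cannot be used directly.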

To illustrate our technique, consider the “quasi-integer” [6]
four-element NLA

$d_4 = [0, 1.09, 3.96, 5.93]$  (11)

that may be easily recognised as a slightly perturbed version of the
optimal four-element integer array [10] $d = [0, 1, 4, 6]$. In [6], we
demonstrated that up to six independent sources could be unambiguously
identified by the NLA $d_4$. The co-array of $d_4$ (the sorted set of
nonduplicated position differences) is

$c_4 = [0, 1.09, 1.97, 2.87, 3.96, 4.84, 5.93]$  (12)

and so the augmented $M_\alpha = 7$-variate Hermitian covariance matrix
for the virtual array $c_4$ is extremely underspecified:

(13)

Nevertheless, it is important to understand that for the true
covariance lags $r_0, \ldots, r_{5.93}$, identifiability means that
there exists a single p.d. completion of $H$ with equalised
($M_\alpha - \mu$) minimum eigenvalues for any scenario with $m \le 6$
independent sources.
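
The co-array in (12) follows directly from the sensor positions in (11). A minimal sketch, in which the function name and the rounding to two decimals (used only to merge floating-point duplicates) are assumptions:

```python
def co_array(d, ndigits=2):
    """Sorted set of non-duplicated position differences, including lag 0."""
    lags = {round(abs(a - b), ndigits) for a in d for b in d}
    return sorted(lags)

d4 = [0.0, 1.09, 3.96, 5.93]
c4 = co_array(d4)
print(c4)   # [0.0, 1.09, 1.97, 2.87, 3.96, 4.84, 5.93]
```

The four sensors give six distinct nonzero lags, so the virtual array has seven elements.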

Let $S$ be the set of specified elements $\{p,q\}$, and $\bar{S}$ be
the set of unspecified elements in the initial incomplete augmented
covariance matrix $H$. Suppose for the moment that given the specified
sample covariance lags $\hat r_S$, we somehow generate a set of
candidate p.d. $M_\alpha$-variate Hermitian matrices $H_\mu$
($\mu = 1, \ldots, M_\alpha - 1$) that each correspond to the model of
$\mu$ plane waves in noise. To select the best candidate model, we
calculate the likelihood ratio (LR) for each corresponding $M$-variate
Hermitian matrix

$R_\mu = L H_\mu L^T$  (14)

where $L$ is the $M \times M_\alpha$ binary selection (or incidence)
matrix with $L_{jk}$ equal to unity in the $j$th row and $d_j$th
column, and zero otherwise.
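
For the example array, the selection matrix can be sketched as below. The reading that the specified upper-triangular entries of $H$ are exactly the pairs of real sensors picked out by $L$ is an interpretation; it gives 15 missing lags out of 21, which agrees with the count given in (21). The function names are hypothetical.

```python
def selection_matrix(d, c):
    """M x M_alpha binary matrix: row j has a one at the column of c holding d[j]."""
    return [[1 if abs(dj - ck) < 1e-9 else 0 for ck in c] for dj in d]

def select(L, H):
    """Compute L H L^T for a binary selection matrix L (picks rows and columns of H)."""
    cols = [row.index(1) for row in L]
    return [[H[a][b] for b in cols] for a in cols]

d4 = [0.0, 1.09, 3.96, 5.93]
c4 = [0.0, 1.09, 1.97, 2.87, 3.96, 4.84, 5.93]
L = selection_matrix(d4, c4)

# Upper-triangular entries of H read by L H L^T, i.e. pairs of real sensors.
cols = [row.index(1) for row in L]
specified = [(p, q) for p in cols for q in cols if p < q]
missing = len(c4) * (len(c4) - 1) // 2 - len(specified)
print(cols, missing)    # [0, 1, 4, 6] 15
```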

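If we use the sphericity test [11]

$H_0: E\{R_\mu^{-1/2} \hat R R_\mu^{-1/2}\} = p_0 I_M$  against
$H_1: E\{R_\mu^{-1/2} \hat R R_\mu^{-1/2}\} \ne p_0 I_M$, $p_0 > 0$  (15)

then the LR is

$\gamma(H_\mu) = \frac{\det(R_\mu^{-1}\hat R)}{\left[\frac{1}{M}\mathrm{Tr}(R_\mu^{-1}\hat R)\right]^M}$.  (16)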

Now  information  or  Bayesian  criteria  may  be  used  for 
model  selection,  such  as  the  minimum  description  length 

[7] 

3 

"W  =  arg  min  ,  -log7(TM)  +  -zfiiogN  . 

(17) 

Obviously this approach is optimal only if $H_\mu$ is the ML estimate
of the p.d. Hermitian matrix with equal ($M_\alpha - \mu$) minimum
eigenvalues. Since exact ML estimates of this kind are not yet
available, our problem is to generate a set of above-described
Hermitian matrices $H_\mu$ sufficiently close to the sufficient
statistic $\hat R$ in the ML sense.
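
A sphericity-type likelihood ratio of the form used in (16) can be evaluated as in the sketch below. It assumes the standard form, the determinant of $R_\mu^{-1}\hat R$ divided by the $M$th power of its normalised trace; the function name and the test matrices are hypothetical.

```python
import numpy as np

def sphericity_lr(R_model, R_hat):
    """det(A) / ((1/M) tr(A))^M with A = R_model^{-1} R_hat (assumed form of (16))."""
    M = R_hat.shape[0]
    A = np.linalg.solve(R_model, R_hat)       # R_model^{-1} R_hat without explicit inverse
    _, logdet = np.linalg.slogdet(A)          # log |det A|, numerically stable
    return float(np.exp(logdet - M * np.log(np.trace(A).real / M)))

R_hat = np.diag([1.0, 2.0, 3.0])
print(sphericity_lr(R_hat, R_hat))            # 1.0 for a perfect model
print(sphericity_lr(np.eye(3), R_hat))        # 0.75 for a mismatched model
```

The ratio is the geometric mean of the eigenvalues of $R_\mu^{-1}\hat R$ over their arithmetic mean, raised to the power $M$, so it equals one only when all those eigenvalues coincide and is otherwise smaller.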


3.  “MAXIMUM-LIKELIHOOD” 
POSITIVE-DEFINITE  HERMITIAN 
COMPLETION 

In [6], we introduced several p.d. Hermitian completions, including
maximum-entropy (ME) completion. These completions are used here as an
initialisation step for the following optimisation routine. Let the
general virtual array $d'$ be specified by the virtual sensor positions
$d'_j$ ($j = 1, \ldots, M_\alpha$), then the set of all possible p.d.
Hermitian completions $\mathcal{H}$ may be written as

$\mathcal{H} = \{ z : H(z) = H_0 + \sum_{pq \in \bar S,\ p<q} (\mathrm{Re}\,H_{pq}\,E^+_{pq} + i\,\mathrm{Im}\,H_{pq}\,E^-_{pq}) > 0 \}$  (18)

where 

x=\  1  (19) 

2571  ^  WsIP<9 

$E^+_{pq} = e_p e_q^T + e_q e_p^T$, $E^-_{pq} = e_p e_q^T - e_q e_p^T$  (20)

$e_p = [0, \ldots, 0, 1, 0, \ldots, 0]^T$ is the $M_\alpha$-variate
basis vector with a unit entry in the $p$th position, and $H_0$ is the
initial completion (e.g. ME completion).

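Suppose we label each of the missing lags ($pq \in \bar S$; $p < q$)
from 1 to $\ell$, the total number of missing lags. For nonredundant
NLA geometries such as $d_4$, the number of missing lags is rather
large:

$\ell = \frac{1}{2}(\nu - 1)(\nu - 2)$,  (21)

where $\nu = \frac{1}{2}M(M-1) + 1$. Now, instead of (18), we may write

$\mathcal{H} = \{ z : H(z) = H_0 + \sum_{j=1}^{2\ell} z_j F_j > 0 \}$  (22)

where

$z_j = \begin{cases} \mathrm{Re}\,r_{pq},\ pq \in \bar S;\ p<q & \text{for } j = 1, \ldots, \ell \\ \mathrm{Im}\,r_{pq},\ pq \in \bar S;\ p<q & \text{for } j = \ell+1, \ldots, 2\ell \end{cases}$  (23)

$F_j = \begin{cases} E^+_{pq} & \text{for } j = 1, \ldots, \ell \\ i E^-_{pq} & \text{for } j = \ell+1, \ldots, 2\ell. \end{cases}$  (24)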

For sufficiently small $z_k$ in (22)

$|z_k| < \varepsilon$ for $k = 1, \ldots, 2\ell$  (25)

we may treat the term $\sum_{k=1}^{2\ell} z_k F_k$ as being equal to a
perturbation matrix $\delta H(z)$, and find a first-order expansion for
the eigenvalues [9]. According to the sphericity LR (16), the problem
of ML maximisation is associated with the problem of eigenvalue
equalisation in the matrix

$G(z) = \hat R^{-1/2} [L H(z) L^H] \hat R^{-1/2}$.  (26)

By applying a first-order expansion to the eigenvalues of $G(z)$:

$G(z) = G_0 + \sum_{k=1}^{2\ell} z_k \hat R^{-1/2} L F_k L^H \hat R^{-1/2}$  (27)

we can derive that

$\lambda_g[G(z)] = \lambda_g[G_0] + \sum_{k=1}^{2\ell} z_k\, u_g^{(0)H} \hat R^{-1/2} L F_k L^H \hat R^{-1/2} u_g^{(0)}$  (28)

where $u_g^{(0)}$ ($g = 1, \ldots, M$) is the $g$th eigenvector of the
matrix $G_0$, with corresponding eigenvalue $\lambda_g[G_0]$. Now we
can introduce the ($M \times 2\ell$) matrix

$\mathcal{D}^{(0)} = \left( u_g^{(0)H} \hat R^{-1/2} L F_k L^H \hat R^{-1/2} u_g^{(0)} \right)_{g = 1, \ldots, M;\ k = 1, \ldots, 2\ell}$  (29)
and our search to find sufficiently small perturbations
($|z_k| < \varepsilon$) that minimise the difference between the
($M_\alpha - \mu$) smallest eigenvalues of $G(z)$ may then be
formulated as the following linear programming (LP) problem:

Find $\min(\alpha - \beta)$ subject to  (30)
$\lambda^{(0)} + \mathcal{D}^{(0)} z \le \alpha \mathbf{1}$, $\alpha > 0$  (31)

$\lambda^{(0)} + \mathcal{D}^{(0)} z \ge \beta \mathbf{1}$, $\beta > 0$  (32)

$-\varepsilon \le z_k \le \varepsilon$ for $k = 1, \ldots, 2\ell$  (33)

where $\lambda^{(0)}$ is the vector of noise-subspace eigenvalues:

A(0)  =  [A£_M+1,...,AgjT.  (34) 

and $\mathbf{1} = [1, \ldots, 1]^T$. Let the solution of this LP
problem be $z^{(0)}$, then we define an updated Hermitian matrix

$H^{(1)} = H^{(0)} + \sum_{k=1}^{2\ell} z_k^{(0)} F_k$  (35)

and so by direct decomposition

$G^{(1)} = \hat R^{-1/2} L H^{(1)} L^H \hat R^{-1/2}$  (36)

we may check the validity of the constraints (33), and decrease the
perturbation “step size” $\varepsilon$ if our equalisation step failed
to improve the current differences amongst the noise-subspace
eigenvalues of the matrix $G^{(1)}$. If the validity conditions are
met, then we compute the associated $\mathcal{D}^{(1)}$ and
$\lambda^{(1)}$ and then solve the iterated LP problem. Suppose that
$\kappa$ iterations are required before this procedure essentially
reaches its final stable point.
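
The first-order eigenvalue expansion (28) behind the matrix in (29) can be checked numerically. In this sketch the matrices `G0` and `dG` are small hypothetical stand-ins for $G_0$ and one term $\hat R^{-1/2} L F_k L^H \hat R^{-1/2}$, and the function name is an assumption. The check uses well-separated eigenvalues; the expansion is expected to be less accurate when eigenvalues nearly coincide.

```python
import numpy as np

def sensitivity_matrix(G0, dG_list):
    """Eigenvalues of G0 and the slopes u_g^H dG_k u_g (rows g, columns k)."""
    lam, U = np.linalg.eigh(G0)               # ascending eigenvalues, orthonormal vectors
    D = np.array([[np.real(U[:, g].conj() @ dG @ U[:, g]) for dG in dG_list]
                  for g in range(G0.shape[0])])
    return lam, D

G0 = np.diag([1.0, 2.0, 4.0])                 # toy matrix with distinct eigenvalues
dG = np.array([[0.3, 0.5, 0.0],
               [0.5, -0.2, 0.1],
               [0.0, 0.1, 0.4]])              # toy Hermitian perturbation direction
z = 1e-3                                      # small perturbation size

lam0, D = sensitivity_matrix(G0, [dG])
predicted = lam0 + D[:, 0] * z                # first-order prediction
exact = np.linalg.eigvalsh(G0 + z * dG)       # exact eigenvalues, ascending
print(np.max(np.abs(exact - predicted)))      # small: the error is second order in z
```

Because the predicted eigenvalues are linear in $z$, bounds on their spread become linear inequalities, which is what allows the equalisation step to be posed as an LP.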

Naturally, the global optimality of the overall procedure cannot be
guaranteed, whereas at each step (i.e. locally), the LP routine
provides the optimal solution.

Note that during this first stage of our routine, only the unspecified
(missing) elements of $H$ have been varied, while the specified sample
covariance lags remain the same as for the initial point $H_0$.

Now, during the second stage of the ML maximisation
routine, we modify all covariance lags. Since
small perturbations in the sample covariance lags of
$\hat{R}$ (with respect to the exact values in $R$) lead to
significant fluctuations in the noise-subspace eigenvalues
$\sigma_n$ of the matrix $H$, “inverse perturbations” in $H$ that
equalise up to the ($M_\alpha - m$) smallest eigenvalues should
not involve significant changes to the sample covariance
lags. Effectively, we use the same optimisation routine
(30) here, with the only significant difference that now
all elements (except the diagonals) are varied, i.e.

$$ H^{(\kappa+1)} = H^{(\kappa)} + \sum_{k=1}^{M(M-1)} z_k^{(0)} F_k. \qquad (37) $$


Given that we cannot guarantee the global optimality
of this second optimisation routine either, we may treat
the solution ($H_{ML}$, say) as the unstructured ML estimate
of the $M_\alpha$-variate covariance matrix. Therefore
the probability of obtaining the desired number of
identical minimum eigenvalues in $H_{ML}$ is zero.

For this reason, our third stage involves obtaining
a properly structured ML estimate that corresponds to
a mixture of $\mu$ independent plane waves in noise. The
unstructured ML estimate $H_{ML}$ is used as a sufficient
statistic, and further modification of the unspecified
entries occurs in order to equalise the ($M_\alpha - \mu$) smallest
eigenvalues in this matrix. Obviously, we expect that the
more eigenvalues are to be equalised, the greater the
loss in the LR relative to the ML
estimate $H_{ML}$.

Similarly to the above, we may present this equalisation
routine as

$$ H_\mu^{(j+1)} = H_\mu^{(j)} + \sum_{k=1}^{2\ell} z_k F_k, \qquad H_\mu^{(0)} = H_{ML} \qquad (38) $$

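where $H_\mu^{(j)}$ is the p.d. Hermitian matrix obtained at the
jth iteration of the equalisation routine. As before, by
applying a first-order perturbation expansion for the
eigenvalues of the matrix $H_\mu^{(j+1)}$, we can derive the
following LP problem:

Find min $(\alpha - \beta)$ subject to (39)

$$ \sigma^{(j)} + \mathcal{Y}^{(j)} z < \alpha \mathbf{1}, \quad \alpha > 0 \qquad (40) $$

$$ \sigma^{(j)} + \mathcal{Y}^{(j)} z > \beta \mathbf{1}, \quad \beta > 0 \qquad (41) $$

$$ -\varepsilon < z_k < \varepsilon \quad \text{for } k = 1, \ldots, 2\ell \qquad (42) $$

where

$$ \mathcal{Y}^{(j)} = \left\{ u_i^{(j)H} F_k u_i^{(j)} \right\}_{i = \mu+1, \ldots, M_\alpha;\ k = 1, \ldots, 2\ell}, \qquad (43) $$

$u_i^{(j)}$ is the ith eigenvector of the matrix $H_\mu^{(j)}$, with
associated eigenvalue $\sigma_i^{(j)}$, and $\sigma^{(j)}$ is the vector of
noise-subspace eigenvalues. Step size control of $\varepsilon$ is
implemented in the same fashion as before (33).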

Clearly, the stable point of this third stage ($H_\mu$,
say) would not result in exactly equal noise-subspace
eigenvalues, since (as in the first stage) the specified
entries have not been modified. Of course, it is possible
to use a transformation to reach this final goal.
Such a transformation keeps the eigenvectors of $H_\mu$
invariant, and so the MUSIC-derived DOA estimates
for $\mu$ sources also remain the same. However, due to
the dimension reduction brought about by (14), the
LR (16) would change as a result of such a transformation.
Moreover, even with strictly equalised eigenvalues,
the Hermitian matrix $H_\mu$ does not necessarily
correspond to the desired plane-waves-plus-noise
model.

Thus our fourth and final stage, which considers the
sequence of “ML” hypotheses $\mathcal{H}_\mu$ ($\mu = 1, \ldots, M_\alpha - 1$),
consists of a local ML refinement of the $\mu$ DOA estimates
and associated signal powers in the vicinity of
the MUSIC DOA estimates generated by the covariance
matrix $H_\mu$. This local refinement procedure
is introduced in [1], and involves the specified covariance
lags only. As a result, for each candidate model
$\mu = 1, \ldots, M_\alpha - 1$, we can find the “ML” set of
estimated signal parameters $\{\hat{\theta}_j^{(\mu)}, \hat{p}_j^{(\mu)}\}$ and estimated
white noise power

K  =  <44) 

3= 1 

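that uniquely describes the covariance matrix $R_\mu$ in the
hypothesis (15)

$$ R_\mu = \hat{p}_0^{(\mu)} I_M + \sum_{j=1}^{\mu} \hat{p}_j^{(\mu)} S(\hat{\theta}_j^{(\mu)}) S^H(\hat{\theta}_j^{(\mu)}). \qquad (45) $$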

Obviously, whichever information-theoretic or Bayesian
criterion is used for hypothesis selection, such a selection
uniquely specifies not only the number of sources,
but also the DOA and power estimates.

4.  FINAL  COMMENTS 

Simulation results (not presented here) conducted for
the NLA di for a superior number of sources demonstrate
that the detection performance achieved by the
four-stage algorithm described in this paper is comparable
to that produced by the standard AIC and
MDL criteria for conventional scenarios (with m < M
sources) with the same Cramér-Rao bound. Naturally,
in order to compare detection performance on
conventional and superior scenarios, it is necessary to
introduce significantly different intersource separations
and/or sample sizes; however, the comparable detection
performance in the two cases suggests that the
new detection scheme described here is close to optimum.
An additional justification for this conclusion
is that when our detection-estimation algorithm yields
the true number of superior sources, we obtain a DOA
estimation accuracy close to the corresponding
Cramér-Rao bound.
ABSTRACT

In  this  paper,  we  introduce  the  effective  uses  of  Gerschgorin  radii 
[1-2]  of  the  unitary  transformed  covariance  matrix  for  source 
number  detection.  The  heuristic  approach  applying  a  new 
Gerschgorin  radii  set  developed  from  the  projection  concept, 
overcomes  the  problem  in  cases  of  small  data  samples  and  an 
unknown  noise  model.  The  proposed  method  is  based  on  the 
sample  correlation  coefficient  to  normalize  the  signal 
Gerschgorin  radii  for  source  number  detection.  The  performance 
of the proposed method shows improved detection capabilities
over GDE [1,2] in a Gaussian white noise process.

1.  INTRODUCTION 

Array  processing,  or  more  accurately,  sensor  array  processing,  is 
the  processing  of  the  output  signals  of  an  array  of  sensors  located 
at  different  points  in  space  in  a  wavefield.  The  purpose  of  array 
processing  is  to  extract  useful  information  from  the  received 
signals  such  as  the  number  and  location  of  the  signal  sources,  the 
propagation  velocity  of  waves,  as  well  as  the  spectral  properties 
of  the  signals.  Array  processing  techniques  have  been  employed 
in  various  areas  in  which  very  different  wave  phenomena  occur. 
Common to all these applications, there are, in general, two
essential purposes in array processing: (i) to determine the
number of sources (detection), and (ii) to estimate the locations of
these sources (estimation).

Several  high  resolution  detectors[3-5]  for  direction  of  arrival 
(DOA)  have  been  developed  in  the  field  of  passive  underwater 
and  radar  signal  processing  in  recent  years.  The  primary 
contributions  to  the  field  include  the  MUSIC  method  proposed 
by  Schmidt  [3],  the  Minimum-Norm  method  by  Kumaresan  and 
Tufts  [4],  and  the  ESPRIT  method  by  Roy  et  al.  [5].  It  is  well 
known  that  the  performances  of  these  high  resolution  methods 
largely  depend  on  the  successful  determination  of  the  number  of 
sources.  Thus,  several  methods  [6-11]  have  been  suggested  with 
this purpose in mind. Wax and Kailath [6] brought a statistical
approach to the problem of source number detection, based
on the AIC and the MDL methods, which are generally used for
model selection.

In  general,  the  AIC  and  MDL,  including  their  modified  versions, 
remain  the  most  widely-used  methods  for  estimating  the  source 
number.  Most  of  them  use  the  eigenvalues  to  estimate  source 
number  but  neglect  to  use  the  eigenvectors  as  well. 
Consequently,  Wu  and  Yang  [1]  proposed  a  heuristic  approach 
by  applying  the  Gerschgorin  theorem  to  find  Gerschgorin  radii  of 
the  transformed  covariance  matrix  for  source  number  detection. 


The  heuristic  detection  criterion  is  developed  from  the  concept  of 
eigenvectors'  projection. 

In this paper, a proper similarity transformation of the covariance
matrix is required in order to effectively utilize the sample
correlation coefficient to normalize the signal Gerschgorin radii
for source number detection.

2.  GERSCHGORIN  DISK  METHOD  FOR 
SOURCE  NUMBER  DETECTION 

2.1 Narrowband Model

We first review the narrowband mathematical model for
estimating the number of sources and DOA of signals in a
spatially white noise environment. The model we consider here
consists of an L-dimensional complex data vector x(k) which
represents the data received by an array of L sensors at the kth
snapshot. The data vector is composed of plane-wave incident
narrowband signals, each of angular frequency $\omega_0$, from M distinct
sources embedded in Gaussian noise. Thus, the measured array
data vector, x(k), which is assumed to be composed of M
incoherent directional sources corrupted by additive white noise,
is received at the kth snapshot by L (L > M) sensors and is given
by:

$$ x(k) = \sum_{i=1}^{M} s_i(k)\, a(\omega_i) + n(k) = A(\omega)\, s(k) + n(k), \qquad (1) $$

where $A(\omega) = [a(\omega_1)\ a(\omega_2)\ \ldots\ a(\omega_M)]$ is the direction matrix
composed of the direction vectors (steering vectors) of the signals, and
n(k) is the noise vector, which is assumed to be complex, zero-mean,
and Gaussian. The source vector is $s(k) = [s_1(k), s_2(k), \ldots, s_M(k)]^T$,
where $s_m(k)$ is the amplitude of the mth source and is assumed to
be jointly circular Gaussian and independent of n(k). The exact
form of the steering vector depends on the array configuration.
However, the uniform linear array, apart from being most
commonly used, may also offer advantageous implementation
efficiency for some algorithms. For a propagation wavelength $\eta$,
the distance between two sensors in a uniform linear array must be
$D \le \eta/2$, and the corresponding steering vector is given by

$$ a(\omega_m) = [1, \exp(j\omega_m), \ldots, \exp(j(L-1)\omega_m)]^T, \qquad (2) $$

where $\omega_m = 2\pi D \sin\theta_m / \eta$, D is the
spacing between adjacent elements, and $\theta_m$ is the impinging angle of
the mth source relative to the array broadside, with
$\theta_m \in (-\pi/2, \pi/2)$ for all m. The vectors $a(\omega_m)$, m = 1, 2, ..., M,
corresponding to M different values of $\theta_m$ are assumed to be
linearly independent. This implies that L > M, and rank(A) = M.
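
As a concrete illustration of Eq.(2), the sketch below builds the steering vector of a uniform linear array for a given impinging angle. It is a minimal example, not part of the original method; the function name and the default half-wavelength spacing (D/η = 0.5) are assumptions made for the illustration.

```python
import cmath
import math

def steering_vector(theta_deg, num_sensors, spacing_over_wavelength=0.5):
    """Steering vector [1, exp(j*w), ..., exp(j*(L-1)*w)] of a uniform linear array.

    w = 2*pi*(D/eta)*sin(theta), with theta measured from the array broadside.
    """
    omega = 2.0 * math.pi * spacing_over_wavelength * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * l * omega) for l in range(num_sensors)]

a_broadside = steering_vector(0.0, 4)    # source at broadside: no phase progression
a_endfire = steering_vector(90.0, 4)     # source at endfire: phase step of pi per sensor
```

At broadside every element equals one, while at 90 degrees with half-wavelength spacing the phase advances by π from sensor to sensor, so the elements alternate in sign.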

Note that it follows that x(k) is a complex Gaussian vector with
zero mean and covariance matrix given by

$$ C = E[x(k)\, x(k)^H] = A(\omega)\, C_s\, A^H(\omega) + \sigma_n^2 I, \qquad (3) $$

where $C_s$, which is the covariance matrix of s(k), is assumed to be
non-singular, and $\sigma_n^2$ is the variance of the Gaussian noise.
Superscripts *, T, and H denote conjugate, transpose, and
Hermitian transpose of matrices, respectively.

If N observations have been measured from L sensors, the entire
data set can be placed in an L x N matrix X as:

$$ X = [x(1), x(2), \ldots, x(k), \ldots, x(N)]_{L \times N}. \qquad (4) $$

Each column of X represents a multivariate observation. For the
L-dimensional scatterplot, the columns of X represent N points in
L-dimensional space. Subsequently, the array sampled covariance
matrix in Eq.(3) can also be expressed as:

$$ \hat{C} = \frac{1}{N} X X^H \qquad (5) $$

2.2 Gerschgorin Disk Estimator

To make the Gerschgorin disk theorem effective, Wu et al. [2]
proposed a proper transformation, called the Gerschgorin Disk
Estimator (GDE), for source number detection. The covariance
matrix is first partitioned as:

$$ C = \begin{bmatrix} C_1 & c \\ c^H & c_{LL} \end{bmatrix}, \qquad (6) $$

where $C_1$ is an (L-1)x(L-1) leading principal submatrix of C,
which is obtained by deleting the last row and column of C.
Physically, it can be regarded as the removal of the Lth sensor.
Thus, $C_1$ becomes the reduced covariance matrix of the remaining
(L-1) sensors. The reduced covariance matrix $C_1$ can also be
decomposed by its eigenstructure as $C_1 = U_1 D_1 U_1^H$, where $U_1$ is
an (L-1)x(L-1) unitary matrix formed by the eigenvectors of $C_1$ as
$U_1 = [q'_1\ q'_2\ \ldots\ q'_M\ \ldots\ q'_{L-1}]$, and $D_1$ is the diagonal matrix
constructed from the corresponding eigenvalues as:

$$ D_1 = \mathrm{diag}(\lambda'_1, \lambda'_2, \ldots, \lambda'_{L-1}). \qquad (7) $$

The eigenvalues $\lambda'_1 \ge \lambda'_2 \ge \cdots \ge \lambda'_M \ge \lambda'_{M+1} \ge \cdots \ge \lambda'_{L-1}$ are
shown in descending order. Since $\lambda'_i$ in Eq.(7) are the
eigenvalues of the leading principal submatrix of C, they
satisfy the interlacing property
$\lambda_1 \ge \lambda'_1 \ge \lambda_2 \ge \cdots \ge \lambda_{M+1} \ge \lambda'_{M+1} \ge \cdots \ge \lambda_{L-1} \ge \lambda'_{L-1} \ge \lambda_L$,
where $\lambda_i$ are the eigenvalues of C. The
transformed covariance matrix becomes:

$$ S = U^H C U = \begin{bmatrix} \lambda'_1 & 0 & \cdots & 0 & \rho_1 \\ 0 & \lambda'_2 & \cdots & 0 & \rho_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & \lambda'_{L-1} & \rho_{L-1} \\ \rho_1^* & \rho_2^* & \cdots & \rho_{L-1}^* & c_{LL} \end{bmatrix}, \qquad (8) $$

where

$$ \rho_i = q'^H_i c \qquad (9) $$

for i = 1, 2, ..., L-1.


It is clear that the first (L-1) Gerschgorin disks (i.e. $O_1$, $O_2$, ...,
$O_{L-1}$) possess the Gerschgorin radii:

$$ r_i = |\rho_i| = |q'^H_i c| \qquad (10) $$

for i = 1, 2, ..., L-1. It can be verified that all of the $\rho_i$
values are equal to zero for i = (M+1), (M+2), ..., (L-1), due to the
fact that the noise eigenvectors $q'_i$ are orthogonal to $A_1$, which is
the direction matrix of $C_1$.

Since S is a unitary transformation of C, the two matrices share the
same eigenvalues. Each of the first (L-1) Gerschgorin
disks, $O_i$, has its Gerschgorin center at $c_i = \lambda'_i$ and the
corresponding Gerschgorin radius $r_i = |\rho_i|$, i = 1, 2, ..., (L-1). The
disks with zero radii (i.e. $O_{M+1}$, $O_{M+2}$, ..., $O_{L-1}$) are regarded as
the collection of noise Gerschgorin disks. The remaining disks
(i.e. $O_1$, $O_2$, ..., $O_M$), with non-zero radii and large centers,
are considered to be the source Gerschgorin disks. Hence, we can
determine the number of sources by counting the number of non-zero
Gerschgorin radii in the case of infinite samples. In addition,
we can also use the (L-1) eigenvalues of $C_1$ to determine the number
of sources.
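
The counting argument above can be sketched for a small matrix. The example below is a hedged illustration rather than the authors' code: it applies the standard Gerschgorin definition (center = diagonal entry, radius = sum of the magnitudes of the off-diagonal entries in the row) to a hypothetical 4 x 4 matrix `S` with the arrowhead structure of Eq.(8), corresponding to L = 4 sensors and one source in the infinite-sample case.

```python
def gerschgorin_disks(matrix):
    """Return a (center, radius) pair for every row of a square matrix."""
    disks = []
    for i, row in enumerate(matrix):
        radius = sum(abs(v) for j, v in enumerate(row) if j != i)
        disks.append((row[i], radius))
    return disks

# Hypothetical transformed covariance matrix: diagonal block plus last row/column
S = [
    [5.0, 0.0, 0.0, 1.2 + 0.5j],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.2 - 0.5j, 0.0, 0.0, 2.0],
]

disks = gerschgorin_disks(S)
# Only the first L-1 disks are inspected; non-zero radii mark source disks
signal_count = sum(1 for _, radius in disks[:-1] if radius > 1e-12)
```

With finite samples the noise radii are small but not exactly zero, which is why a threshold on the radii is needed in practice.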

It can be seen that the threshold must be adjustable to varying
numbers of snapshots. Hence, we define a heuristic decision rule
as [2]:

$$ \mathrm{GDE}(k) = r_k - \frac{D(N)}{L-1} \sum_{i=1}^{L-1} r_i > 0, \qquad (11) $$

where k is an integer in the closed interval [1, L-2]. The
adjustable factor, D(N), could be a non-increasing function
(between 1 and 0) when N increases. If GDE(k) is evaluated
from k = 1, the number of sources is determined as k-1 (i.e. M = k-1)
when the first nonpositive value of GDE(k) is reached. This is
due to the fact that any radius below the adjustable
threshold is assigned to the noise collection. Thus, the
above GDE rule may produce problems of underestimation.

3. A NEW GERSCHGORIN RADII BASED
METHOD

Hence, a method capable of reducing the radii of the signal
Gerschgorin disks should help resolve the source number detection
problem.


3.1  Correlation  Coefficients  of  Samples  Space 


In light of these requirements, an effective source number
detection method must select a proper transformation for
maximum reduction of the radii of the signal Gerschgorin disks
and make the noise Gerschgorin disks as remote as possible from
the signal Gerschgorin disks. Therefore, a nonsingular matrix,
$D = \mathrm{diag}(\lambda'_1, \lambda'_2, \ldots, \lambda'_M, \ldots, \lambda'_{L-1}, 1)$, was used in [2] to obtain
small signal Gerschgorin radii for i = 1, 2, ..., M. That method
led to the development of a novel technique, which outperformed GDE
in Gaussian white and nonwhite noise processes and could be
used successfully even when the SNR is near 0 dB. In this paper, we
extend the function of reducing signal Gerschgorin disks by
using a newly developed similarity transformation of the sampled
covariance matrix and its new set of normalized radii of signal
Gerschgorin disks.

As in Eq.(4), if N observations have been measured from L sensors,
the entire data set can also be placed in an L x N matrix X as
$X = [x(1), x(2), \ldots, x(L-1), x(L)]_{L \times N}$. According to the definition of
multiple linear regression [12], the maximum correlation
coefficient is defined as

(12)

for i = 1, 2, ..., L and k = 1, 2, ..., L. Note that $\rho_{ik} = \rho_{ki}$ for all i and
k. The value of $\rho_{ik}$ must be between 0 and +1.

Without  altering  the  true  eigenvalues,  a  proper  transformation  of 
the  covariance  matrix  is  required  in  order  to  effectively  utilize  the 
sample  correlation  coefficient  to  normalize  the  signal  Gerschgorin 
radii  for  source  number  detection. 


3.2  The  Proposed  Method 

In  this  section,  a  new  transformation  kernel  based  on  the  concept 
of  sample  correlation  coefficient  is  proposed  in  order  to  improve 
detection  performance.  Now,  a  novel  transforming  matrix  is 
proposed: 

fi=diag(VcTr  tMi  . . 

=diag(q>i  TV  •HV  -  HJ/-i,l),  (13) 

to give the transformed matrix in Eq.(14), where $\lambda'_i$ are the
eigenvalues of the first (L-1)x(L-1) leading principal submatrix
of C.

The new transformed true covariance matrix becomes:

$$ S' = D^{-1} U^H C U D = D^{-1} \begin{bmatrix} U_1^H C_1 U_1 & U_1^H c \\ c^H U_1 & c_{LL} \end{bmatrix} D \qquad (14) $$


According to the Gerschgorin disk theorem, it is clear that the
first (L-1) Gerschgorin disks (i.e. $O_1$, $O_2$, ..., $O_{L-1}$) contain the
new Gerschgorin radii $r'_i$ for i = 1, 2, ..., M. Since $r'_i$ in Eq.(15)
can be regarded as a correlation coefficient in the sense of Eq.(12),
the values of $r'_i$ are all less than 1, so that $r_i > r'_i$. In other words,
the signal Gerschgorin disks can be made as small
as possible and the noise Gerschgorin disks can be kept as remote
from the signal Gerschgorin disks as possible. Therefore, the
source number can be easily determined by visually counting the
number of signal Gerschgorin disks derived by Eq.(14).
Moreover, when the noise statistics cannot be accurately
estimated, the GDE method fails under a low SNR situation,
whereas the proposed method may not.
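
The effect of a diagonal similarity transformation on the Gerschgorin radii can be illustrated on a small real matrix. The sketch below is an assumption-laden example, not the transformation of Eq.(13): the matrix `S` and the diagonal entries `d` are hypothetical, and the only properties shown are the generic ones, namely that $D^{-1} S D$ rescales entry (i, j) by $d_j / d_i$, so the radii of the first rows shrink when $d_i > 1$ and the last diagonal entry is 1, while the eigenvalues (checked here through the determinant and the unchanged diagonal) are preserved.

```python
def similarity_scale(S, d):
    """Return D^{-1} S D for D = diag(d): entry (i, j) becomes S[i][j] * d[j] / d[i]."""
    n = len(S)
    return [[S[i][j] * d[j] / d[i] for j in range(n)] for i in range(n)]

def row_radii(A):
    """Gerschgorin radius of every row: sum of off-diagonal magnitudes."""
    return [sum(abs(v) for j, v in enumerate(row) if j != i) for i, row in enumerate(A)]

def det3(A):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

S = [[4.0, 0.0, 1.5],
     [0.0, 1.0, 0.2],
     [1.5, 0.2, 3.0]]          # hypothetical arrowhead matrix, L = 3
d = [4.0, 2.0, 1.0]            # hypothetical diagonal of D, last entry fixed to 1

S_new = similarity_scale(S, d)
radii_before = row_radii(S)
radii_after = row_radii(S_new)
```

The radii of the first two rows drop from 1.5 and 0.2 to 0.375 and 0.1, while the radius of the last row grows; this is harmless for detection because only the first L-1 disks are counted.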

For example, in the case of one simulated covariance matrix, the
sensor number is 6 (i.e. L=6) and two sources (i.e. M=2) are
uncorrelated and impinge from -12° and 10° (i.e. DOA=[-12°
10°]). The signal-to-noise ratios are both 2 dB (i.e. SNR=[10
10] dB) and the number of samples chosen is N=100. Its
Gerschgorin disks in terms of Gerschgorin center-and-radius
pairs become {12.11, 0.42}, {7.93, 4.71}, {0.19, 0.18},
{0.09, 0.36}, and {0.08, 0.03}. The results are illustrated in Figure
1(a). Subsequently, the same covariance matrix is transformed by
the suggested unitary transformation as shown in Eq.(14). The
results are illustrated in Figure 1(b). It is now evident that the
Gerschgorin disks form two separate collections. The source
collection contains disks $O_1$ and $O_2$ with small radii (less than 1)
and the noise collection $O_3 \cap O_4 \cap O_5$ with small radii.


Fig.l(a)(b).Gerschgorin  disks  of  the  estimated  covariance 
matrix 
4. SIMULATION RESULTS




A uniform linear array of 8 isotropic sensors, spaced a half
wavelength apart, is used with additive and uncorrelated white noise. The
VGD and GDE methods are used to detect two uncorrelated
sources with SNRs of 6 dB impinging from 0° and 5°, respectively.
After 200 Monte Carlo runs, we compute their relative frequency
of false detection using various numbers of snapshots. The error
detection performance in terms of probabilities is depicted in
Figure 2. It can be seen that the proposed method outperforms
GDE.


Fig.2 Detection performance of the AIC, MDL, GDE, and the
proposed method using simulated data with Gaussian white
noise (SNR=[6 6] dB, DOA=[0° 5°]).


5.  CONCLUSION 

In this paper, GDE performance is improved by using a
newly developed similarity transformation of the covariance matrix
and using its new set of Gerschgorin radii to design the
source number estimators. The proposed method is based
on the sample correlation coefficient to normalize the
signal Gerschgorin radii for source number detection. The
performance of the proposed method shows detection
capabilities superior to GDE in a Gaussian white noise
process, and the method can be used successfully with
measured experimental data.
ABSTRACT

This paper presents further extensions to the multitaper
time-frequency spectrum estimation method developed
by the author. The method uses time-frequency
(TF) concentrated basis functions which diagonalize
the nonstationary spectrum generating operator over
a finite region of the TF plane. Individual spectrograms
computed with these eigenfunctions form direct
TF spectrum estimates, and are combined to form the
multitaper TF spectrum estimate. A method is presented
for adapting the multitaper spectrogram to locally
match frequency modulation in the signal, which
can cause broadening of the spectral estimate. An F-test
for detecting and removing frequency-modulated
tones is also given.

1.  INTRODUCTION 

Thomson’s  multitaper  spectral  estimation  approach  [1] 
is a powerful method for nonparametric spectral estimation.
This method uses a set of orthogonal data tapers
that are maximally concentrated in frequency and
diagonalize the spectral generating operator. These
tapers are used to approximately invert the operator
and estimate the spectrum. The multitaper approach
was first applied to time-frequency (TF) analysis by
a direct extension to the nonstationary case through a
sliding-window framework [2], in which spectrograms
are computed with each of the tapers and combined
to form an estimate of the TF spectrum. A multitaper
TF spectrum was constructed using spectrograms
computed with Hermite windows [3], which had previously
been shown to maximize a TF concentration
measure [4]. This method was extended to include a
means of reducing artifacts using a TF mask [5]. More
recently, a multitaper method for TF analysis was presented
by this author [6] that diagonalized the nonstationary
spectral generating operator, formally extending
Thomson’s approach to TF. Subsequent work by
the author gave bias and variance measures for the
estimated TF spectrum, presented an adaptive procedure
to reduce the bias of the individual spectrograms, and
derived other properties of the eigenfunctions and the
resulting TF spectral estimate [7, 8].

This work was supported by the National Science Foundation

In this paper, a method is presented for adapting
the multitaper spectrogram to locally match frequency
modulation in the signal, which can cause broadening
of the spectral estimate. Frequency modulation (FM)
in the signal will degrade the resolution and accuracy
of the multitaper spectrogram due to well-known spectral
broadening effects. One common way of alleviating
the effects of the spectral broadening is to match
the spectrogram to the FM by frequency-modulating
the window. This approach works perfectly well when
there is only one FM rate in the signal, as is the case
with chirped sonar and radar. However, in multicomponent
signals such as speech, biological, and mechanical
signals, there can be multiple FM rates present
at any given time. To accurately analyze these types
of signals, it is necessary to locally adapt the multitaper
spectrogram to the FM at a given TF region.
This paper presents a method for performing this local
adaptation. An F-test for detecting and removing
frequency-modulated tones is also given.


2.  BACKGROUND:  MULTITAPER 
TIME-FREQUENCY  SPECTROGRAMS 

This approach to TF spectral estimation is based on a
straightforward extension of the spectral representation
theorem for stationary processes [9], and is equivalent
to a linear time-varying (LTV) filter model. Define the
signal s(t) as the output of a white-noise-driven LTV
filter. The signal can then be written as:

$$ s(t) = \int H(t,\omega)\, e^{j\omega t}\, dZ(\omega), \qquad (1) $$

where $H(t,\omega)$ is defined as the Fourier transform of the
LTV filter $h(t, t-\tau)$ [10]. The TF spectrum is defined
by:

$$ P(t,\omega) = |H(t,\omega)|^2. \qquad (2) $$

This formulation for a TF spectrum is of the same general
form as Priestley’s evolutionary spectrum [9]; however,
$H(t,\omega)$ is not constrained to be slowly-varying.

Given a signal s(t), an estimate of $P(t,\omega)$ is desired;
however, direct inversion of equation (1) is impossible.
A rough estimate of the time-varying frequency content
of s(t) may be obtained by computing its short-time
Fourier transform (STFT):

$$ S_s(t,\omega) = \int s(\tau)\, g(t-\tau)\, e^{-j\omega\tau}\, d\tau, \qquad (3) $$

where g(t) is a rectangular window of length T. A
relationship between the STFT and $H(t,\omega)$ is obtained
by replacing s(t) by its TF spectral formulation:

$$ S_s(t,\omega) = \int\!\!\int H(\tau,\theta)\, g(t-\tau)\, e^{-j(\omega-\theta)\tau}\, dZ(\theta)\, d\tau. \qquad (4) $$
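
A discrete-time analogue of Eq.(3) makes the role of the rectangular window explicit. The sketch below is a minimal illustration under stated assumptions, not the author's implementation: the signal is a sampled real sinusoid, the window is rectangular and symmetric about the analysis instant with a hypothetical half-width of 16 samples, and the function name `stft_at` is invented for the example.

```python
import cmath
import math

def stft_at(signal, center, half_width, omega):
    """Discrete STFT sample: sum over m of s(m) g(center - m) exp(-j*omega*m).

    g is a rectangular window equal to 1 for |center - m| <= half_width.
    """
    start = max(0, center - half_width)
    stop = min(len(signal), center + half_width + 1)
    return sum(signal[m] * cmath.exp(-1j * omega * m) for m in range(start, stop))

# Real tone at 0.125 cycles per sample (bin 8 of a 64-point frequency grid)
omega0 = 2.0 * math.pi * 0.125
signal = [math.cos(omega0 * n) for n in range(128)]

freqs = [2.0 * math.pi * k / 64 for k in range(33)]
mags = [abs(stft_at(signal, 64, 16, w)) for w in freqs]
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
```

The magnitude peaks at the bin of the tone, and the width of that peak is set by the window length, which is the resolution limit the multitaper construction works around.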

To solve for the time-varying spectrum $H(t,\theta)$, the
STFT operator $g(t-\tau)e^{-j\omega\tau}$ must be inverted. This
inversion is an inherently ill-posed problem. Instead,
the inverse solution is approximated by regularizing
it to some region $R(t,\omega)$ in the TF plane, much as
Thomson regularized the spectral inversion to a bandwidth
W in his multitaper approach [1]. For simplicity
throughout, $R(t,\omega)$ is defined to be a square TF region
of dimension $\Delta T \times \Delta W$; however, the results readily
generalize to arbitrary regions.

In the case of spectral estimation, the operator is
square and Toeplitz; its regularized inverse is found
through an eigenvector decomposition. Such is not the
case in the TF problem; the STFT operator is neither
full rank nor square. This operator is diagonalized using
a Singular Value Decomposition, giving left and
right eigenvectors $u(\tau)$ and $V(t,\omega)$ and the associated
eigen (singular) values $\lambda$:

$$ g(t-\tau)\, e^{-j\omega\tau} = \sum_k \lambda_k\, u_k(\tau)\, V_k^*(t,\omega). \qquad (5) $$

The eigenvectors $u(\tau)$ and $V(t,\omega)$ form an STFT pair:

$$ V(t,\omega) = \int u(\tau)\, g(t-\tau)\, e^{-j\omega\tau}\, d\tau. \qquad (6) $$


The SVD relationship between $u(\tau)$ and $V(t,\omega)$ is obtained
by applying the STFT operator to $V(t,\omega)$, computing
the integrals only over $\Delta T \times \Delta W$:

$$ \lambda u(\tau) = \int_{\Delta T} \int_{\Delta W} V(t,\omega)\, g(t-\tau)\, e^{j\omega\tau}\, d\omega\, dt. \qquad (7) $$

The inverse STFT computed over all $(t,\omega)$ also holds.
This equation can be reduced to a standard eigenvector
equation by substituting for $V(t,\omega)$. The eigenvalue
equation for $u(\tau)$ is then:

$$ \lambda u(\tau) = \int 2\Delta W\, \mathrm{sinc}(\Delta W(\tau - s))\, f(\tau, s)\, u(s)\, ds, \qquad (8) $$

where

$$ f(\tau, s) = \int_{\Delta T} g(t-s)\, g(t-\tau)\, dt. \qquad (9) $$

$u(\tau)$ can be computed using standard eigenvalue solution
methods. As has been discussed elsewhere, the
eigenvectors are concentrated in TF and doubly orthogonal,
both over the entire TF plane and over $\Delta T \times \Delta W$.
These properties are critical for the estimation method.

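Next, $H(t,\omega)$ is estimated regularized to the rectangular
region $\Delta T \times \Delta W$ by projecting it onto $\Delta T \times \Delta W$
in the vicinity of $(t,\omega)$ using the kth left eigenvector
$u_k(t)$:

$$ H_k(t,\omega) = \lambda_k^{-1/2} \int_{\Delta T} \int_{\Delta W} H(\tau,\theta)\, u_k(t-\tau)\, e^{-j(\omega-\theta)\tau}\, dZ(\theta)\, d\tau. \qquad (10) $$

$H_k$ is thus a direct, but unobservable, projection of
$H(t,\omega)$ onto $\Delta T \times \Delta W$.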

These expansion coefficients are then estimated using
the STFT of s(t) computed using $u_k(t)$:

$$ S_k(t,\omega) = \int\!\!\int H(\tau,\theta)\, u_k(t-\tau)\, e^{-j(\omega-\theta)\tau}\, dZ(\theta)\, d\tau, \qquad (11) $$

i.e., the kth eigenspectrum $S_k(t,\omega)$ is a projection of
$H(t,\omega)$ onto the kth left eigenvector $u_k(t)$, estimating
$H_k(t,\omega)$ over $\Delta T \times \Delta W$. When s(t) is a stationary
white noise process, it follows that

$$ E[|S_k(t,\omega)|^2] = |H(t,\omega)|^2 = P(t,\omega). \qquad (12) $$

Thus, the individual eigenspectra are direct estimates
of $P(t,\omega)$, and are unbiased when the spectrum is white.

Next, $H(t,\omega)$ is estimated over $\Delta T \times \Delta W$ using the
right eigenvectors $V_k(t,\omega)$ weighted by the projections
of $H(t,\omega)$ onto $u_k(t)$, i.e., the kth spectrogram:

K 

u>)  =  ^2  Vk{t-  t,u>  -  w)Sk(t,w),  (13) 

*i=i 
where $K \approx \Delta T \Delta W$. Choosing $\Delta T \Delta W$ too small will
result in estimates with poor bias and variance properties.
The magnitude-square of $\hat{H}(\tau,\theta|t,\omega)$ is an estimate
of $P(t,\omega)$ over $\Delta T \times \Delta W$. This estimate is a $\chi^2$
random variable with two degrees of freedom (except
for DC and Nyquist) with variance $P^2(t,\omega)$. The variance
of this estimate can be reduced by averaging over
$\Delta T \times \Delta W$ and invoking the orthogonality of $V_k(t,\omega)$:

$$ \hat{P}(t,\omega) = \frac{1}{\Delta T \Delta W} \int_{\Delta T} \int_{\Delta W} |\hat{H}(\tau,\theta|t,\omega)|^2\, d\tau\, d\theta = \frac{1}{\Delta T \Delta W} \sum_{k=1}^{K} \lambda_k |S_k(t,\omega)|^2. \qquad (14) $$

The average of K direct estimates is a $\chi^2$ random variable
with 2K degrees of freedom; hence, the variance
of this estimate is $P^2(t,\omega)/K$. If $\Delta T$ is chosen to be a
fixed proportion of the window length T, then this estimator
is consistent for fixed $\Delta W$. Note that the form
of this estimator differs slightly from that presented
previously [6, 7, 8] in the weighting by the eigenvalues.
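
The variance reduction obtained by averaging K direct estimates can be checked with a small Monte Carlo experiment. The sketch below is an idealised illustration, not the estimator of Eq.(14) itself: it assumes that each eigenspectrum sample is an independent circular complex Gaussian variable with power P and that all K estimates are weighted equally (eigenvalues close to one); the seed, the value of P, and K are arbitrary choices.

```python
import random

def direct_estimate(rng, power):
    """|S|^2 for a circular complex Gaussian S with E|S|^2 = power (chi-square, 2 dof)."""
    sigma = (power / 2.0) ** 0.5
    re = rng.gauss(0.0, sigma)
    im = rng.gauss(0.0, sigma)
    return re * re + im * im

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)          # fixed seed so the run is repeatable
P, K, trials = 2.0, 8, 20000

single = [direct_estimate(rng, P) for _ in range(trials)]
averaged = [sum(direct_estimate(rng, P) for _ in range(K)) / K for _ in range(trials)]

var_single = sample_variance(single)     # expected to be close to P**2
var_avg = sample_variance(averaged)      # expected to be close to P**2 / K
```

Under these assumptions the single-estimate variance is close to $P^2$ and the averaged variance is close to $P^2/K$, in line with the 2K degrees of freedom quoted above.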


3.  LOCALLY  STATIONARY  PROCESSES 

The estimate for $P(t,\omega)$ given in equation (14) is unbiased
for white noise. For the estimate to be unbiased
for signals other than white noise, it is only necessary
that $P(t,\omega)$ be locally white in TF, since the estimate
is regularized to $\Delta T \times \Delta W$. A similar requirement is
seen in the stationary case [1], wherein the spectrum
is assumed to be smoothly varying so that it is approximately
white over $\Delta W$. A class of stochastic processes
known as locally stationary processes [12] satisfy
the requirement of being smoothly varying in TF, and
can be used to describe a wide variety of nonstationary
signals. Locally stationary processes are stochastic
processes with covariance functions of the form

$$ R(t_1,t_2) = E[s(t_1)s^*(t_2)] = g\!\left(\frac{t_1+t_2}{2}\right) f(t_1-t_2), \qquad (15) $$

where $g(\cdot)$ is a nonnegative function and $f(\cdot)$ is a valid
covariance function; that is, f(t) possesses a nonnegative
Fourier transform $F(\omega)$. Through a change of
variables, the symmetric form of the covariance function
is seen to be:

$$ R_s(t,\tau) = E[s(t+\tau/2)\, s^*(t-\tau/2)] = g(t)\, f(\tau). \qquad (16) $$

The TF spectrum is thus given by [11]:

$$ P_s(t,\omega) = g(t)\, F(\omega). \qquad (17) $$

For locally stationary s(t), $P_s(t,\omega)$ will be approximately
constant over $\Delta T \times \Delta W$, and equation (12)
will still hold.


The class of processes with such nonnegative TF
spectra is easily extended to include a wider range of
nonstationary processes [13]. Let $s(t)$ be a locally stationary
process with covariance function $R_s(t,\tau)$ and
corresponding TF spectrum $P_s(t,\omega)$. Then the linearly
frequency modulated signal $s(t)e^{j\beta t^2/2}$ will have covariance
$R_s(t,\tau)e^{j\beta t\tau}$ and corresponding nonnegative
TF spectrum $P_s(t,\omega-\beta t)$. More generally, let
$x(t) = s(t)e^{j\phi(t)}$, where $s(t)$ is locally stationary with symmetric
covariance function $R_s(t,\tau)$ from equation (16).
Then the covariance of $x(t)$ is

$$R_x(t,\tau) = g(t)f(\tau)\,e^{j(\phi(t+\tau/2)-\phi(t-\tau/2))}. \qquad (18)$$

By making use of the principle of stationary phase [14],
it can be shown [13] that the TF spectrum of $x(t)$ is
given by:

$$P_x(t,\omega) = g(t)F(\omega-\phi'(t)) = P_s(t,\omega-\phi'(t)). \qquad (19)$$

Thus, a frequency modulated locally stationary (FMLS)
process will have a TF spectrum equal to that of the
locally stationary process centered around the instantaneous
frequency of the FM. The generalization can
be taken one step further to define a composite FMLS
process, consisting of a sum of statistically independent
FMLS processes. The composite signal will also have
a nonnegative TF spectrum equal to the sum of the
spectra of the individual processes.
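For the linear FM case the frequency shift can be verified directly, since the Fourier transform over the lag of $R_s(t,\tau)e^{j\beta t\tau}$ is an exact shift of $F(\omega)$ by $\beta t$. The sketch below assumes, purely for illustration, a Gaussian envelope $g$, a Gaussian covariance $f$, and a chirp rate of 1.5; none of these values come from the text, and `tf_spectrum` is a hypothetical helper that approximates the lag transform with a Riemann sum.

```python
import numpy as np

def tf_spectrum(cov, t, omegas, tau):
    """Fourier transform over lag tau of a symmetric covariance R(t, tau)."""
    dtau = tau[1] - tau[0]
    kernel = np.exp(-1j * np.outer(omegas, tau))       # e^{-j omega tau}
    return (kernel @ cov(t, tau)).real * dtau          # Riemann sum over the lag

g = lambda t: np.exp(-t**2 / 8.0)                      # nonnegative envelope (assumed)
f = lambda tau: np.exp(-tau**2 / 2.0)                  # valid covariance (assumed)
F = lambda w: np.sqrt(2 * np.pi) * np.exp(-w**2 / 2.0) # Fourier transform of f
beta = 1.5                                             # chirp rate (assumed)

R_s = lambda t, tau: g(t) * f(tau)                              # symmetric form, eq. (16)
R_x = lambda t, tau: R_s(t, tau) * np.exp(1j * beta * t * tau)  # linear FM covariance

tau = np.linspace(-12, 12, 4801)
omegas = np.linspace(-4, 6, 101)
t0 = 1.0
P_s = tf_spectrum(R_s, t0, omegas, tau)   # should match g(t0) F(omega)
P_x = tf_spectrum(R_x, t0, omegas, tau)   # should match g(t0) F(omega - beta t0)
```

The general $\phi(t)$ case relies on the stationary-phase argument cited in the text and is not reproduced by this check.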

However, when $s(t)$ is an FMLS process, $P(t,\omega)$
will most certainly not be constant over $\Delta T \times \Delta W$,
and equation (12) will fail to be valid. In this case, the
smoothing region $\Delta T \times \Delta W$ must be oriented to match
the FM of the signal. This reorientation is equivalent
to matching the spectrogram window to the FM of the
signal. This matching can be accomplished by using
a frequency modulated window in the original STFT
computation. However, in signals with multiple FM
rates, as in a composite FMLS signal, this adaptation
must be performed locally in TF, as discussed next.

4.  LOCALLY  MATCHED  MULTITAPER 
SPECTROGRAMS 

To locally demodulate the spectrograms, it is first necessary
to construct a reliable estimate of the local FM,
which is denoted by $\beta(t,\omega)$. Letting the TF dependence
be implicit, $\beta$ can be estimated by computing a
local covariance of the multitaper spectrogram normalized
by the time spread:
$\langle(\tau-\bar{t})(\Omega-\bar{\omega})\rangle / \langle(\tau-\bar{t})^2\rangle$, where
$\bar{t}$ and $\bar{\omega}$ are the local average time and frequency, respectively;
their dependence on $t$ and $\omega$ is implied. The
covariance is computed by integrating over a finite region
of the TF plane $\Delta T \times \Delta W$ as a two-dimensional
sliding window to provide an estimate of $\beta$ as a function
of $t$ and $\omega$:

$$\hat{\beta}(t,\omega) = \frac{\int_{\Delta T}\int_{\Delta W} (\tau-t-\bar{t})(\Omega-\omega-\bar{\omega})\,\hat{P}(\tau,\Omega)\,d\tau\,d\Omega}{\int_{\Delta T}\int_{\Delta W} (\tau-t-\bar{t})^2\,\hat{P}(\tau,\Omega)\,d\tau\,d\Omega}. \qquad (20)$$

$\bar{t}$ and $\bar{\omega}$ are computed similarly. Integrating over a
larger region will provide better variance properties at
the expense of possible bias due to multiple signal components
with differing FM rates lying within the area
of integration.
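As a rough illustration of this normalized local covariance, the sketch below estimates the FM rate of a single TF patch. It is a simplified stand-in, not the sliding-window implementation: the patch, its grid, the ridge width, and the function name `local_chirp_rate` are all assumptions made for the example, and the local averages are taken as spectrum-weighted centroids of the patch.

```python
import numpy as np

def local_chirp_rate(P, t, w):
    """Estimate the FM rate of a TF patch P[i, j] = P(t[i], w[j]).

    Uses the spectrum as a weight: beta = <(t - tbar)(w - wbar)> / <(t - tbar)^2>.
    """
    T, W = np.meshgrid(t, w, indexing="ij")
    total = P.sum()
    tbar = (T * P).sum() / total            # local average time
    wbar = (W * P).sum() / total            # local average frequency
    cov_tw = ((T - tbar) * (W - wbar) * P).sum()
    var_t = ((T - tbar) ** 2 * P).sum()
    return cov_tw / var_t

# Synthetic patch: a narrow ridge along omega = beta_true * t (assumed test signal).
t = np.linspace(-1.0, 1.0, 81)
w = np.linspace(-3.0, 3.0, 241)
T, W = np.meshgrid(t, w, indexing="ij")
beta_true = 2.0
P = np.exp(-(W - beta_true * T) ** 2 / (2 * 0.1 ** 2))
beta_hat = local_chirp_rate(P, t, w)
```

With a single ridge the estimate recovers the slope; a patch containing two ridges with different slopes would return a compromise value, which is the bias the text warns about.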

Once $\beta(t,\omega)$ has been estimated, each STFT $S_k(t,\omega)$
is dechirped by locally convolving it with the Fourier
transform of a chirp at the local rate $\beta$:

$$S_k^{\beta}(t,\omega) = \int S_k(t,\omega-\theta)\,e^{j\theta^2/(2\beta(t,\omega))}\,d\theta. \qquad (21)$$

This convolution is shift-variant; at each frequency, a
new $\beta$ must be used. This convolution is equivalent
to matching the STFT to the local chirp rate. While
this convolution at first would appear to be an $O(N^2)$
operation, it can actually be implemented much more
efficiently. The equivalent chirp in the time domain
is of length $T$, the length of the STFT window. The
Fourier transform of this finite-length chirp will then
have bandwidth $\beta T$. Thus, if the average bandwidth
of the various FM components is $M = \beta T$ bins, an
STFT with $N$ frequency samples can be dechirped with
only $NM$ multiplies per time slice, comparable to the
computational complexity of the STFT itself. Once all
of the $S_k(t,\omega)$ are dechirped, the multitaper estimate
is constructed as usual.
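The effect of matching can be seen in the simpler, global form mentioned earlier, where a frequency modulated window is applied in the time domain before the transform. The sketch below is that time-domain equivalent for a single time slice, not the shift-variant frequency-domain convolution; the window length, carrier bin, chirp rate, and rectangular window are assumptions chosen so the result is easy to read.

```python
import numpy as np

N = 256
n = np.arange(N) - N // 2                  # time index centered on the window
omega0 = 2 * np.pi * 32 / N                # carrier on FFT bin 32 (assumed)
beta = 2 * np.pi * 40 / N**2               # sweeps about 40 bins across the window (assumed)
x = np.exp(1j * (omega0 * n + 0.5 * beta * n**2))   # linear FM tone

window = np.ones(N)                        # rectangular window for simplicity
raw = np.fft.fft(window * x)               # unmatched transform: energy smeared over the sweep
matched = np.fft.fft(window * np.exp(-0.5j * beta * n**2) * x)   # FM-matched window

peak_raw = np.abs(raw).max()
peak_matched = np.abs(matched).max()
peak_bin = int(np.abs(matched).argmax())
```

After matching, the chirp collapses onto a single bin with the full coherent gain $N$, whereas the unmatched transform spreads the same energy over roughly $\beta T$ bins.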

5. F-TEST FOR
FREQUENCY-MODULATED TONES

The validity of the multitaper estimate rests on the
assumption that the TF spectrum is smoothly varying
over $\Delta T \times \Delta W$. This assumption is violated when
spectral lines (FM or otherwise) are present in the signal.
In this case, it is necessary to estimate the tones
and remove them from the signal. Ordinarily, estimating
a tone with unknown FM would be extremely difficult.
This task is made easier, however, by the local
matching described above. Once the individual STFTs
$S_k(t,\omega)$ have been adapted to local FM, any frequency
modulated tones in the signal will behave exactly as a
stationary tone would behave in a non-adapted STFT.
As a result, an F-test for the existence of any FM tones
in the TF spectrum can be defined by directly extending
Thomson's approach in the stationary case. The
expected value of the $k$th dechirped STFT for an FM
tone $\mu e^{j\phi(t)}$ with instantaneous frequency $\omega = \phi'(t)$ is:

$$E[S_k(t,\omega)] = \mu\,U_k(0). \qquad (22)$$


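The mean can then be estimated via regression:

$$\hat{\mu}(t,\omega) = \frac{\sum_{k=1}^{K} U_k^*(0)\,S_k(t,\omega)}{\sum_{k=1}^{K} |U_k(0)|^2}. \qquad (23)$$

The variance of this estimate is equal to the background
TF spectrum minus the spectral line, which is:

$$\hat{P}(t,\omega) = \sum_{k=1}^{K} \left|S_k(t,\omega) - \hat{\mu}(t,\omega)\,U_k(0)\right|^2. \qquad (24)$$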


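The F-test at time $t$ is then given by the ratio of the
power of the spectral line and that of the background
spectrum:

$$F(t,\omega) = \frac{(K-1)\,|\hat{\mu}(t,\omega)|^2 \sum_{k=1}^{K} |U_k(0)|^2}{\sum_{k=1}^{K} \left|S_k(t,\omega) - \hat{\mu}(t,\omega)\,U_k(0)\right|^2}. \qquad (25)$$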

Under the null hypothesis, the test quantity at a single
time is the ratio of two $\chi^2$ random variables with 2 and
$2(K-1)$ degrees of freedom. For a signal of length $T$
and an STFT of order $N$, there will be $T/N$ independent
blocks of data. Thus, the final F-test will be a ratio
of $\chi^2$ random variables with $2T/N$ and $2(K-1)T/N$
degrees of freedom, integrated along the contour specified
by $\omega = \phi'(t)$:

$$F(\phi'(t)) = \frac{(K-1) \sum_{t} |\hat{\mu}(t,\phi'(t))|^2 \sum_{k=1}^{K} |U_k(0)|^2}{\sum_{t} \sum_{k=1}^{K} \left|S_k(t,\phi'(t)) - \hat{\mu}(t,\phi'(t))\,U_k(0)\right|^2}. \qquad (26)$$
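The single-time regression and F statistic can be sketched for the stationary-tone case, which is what the FM case reduces to after matching. This is a hedged illustration only: orthonormal sine tapers are used as a convenient stand-in for the windows of the paper, the tapers are real so the conjugate in the regression is dropped, and the signal length, tone frequency, amplitude, and function names are assumptions.

```python
import numpy as np

def sine_tapers(N, K):
    """K orthonormal sine tapers of length N (a stand-in for the paper's windows)."""
    n = np.arange(1, N + 1)
    k = np.arange(1, K + 1)[:, None]
    return np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * k * n / (N + 1))

def tone_f_test(x, freq, K=5):
    """Return (mu_hat, F) for a tone at `freq` (cycles/sample)."""
    N = len(x)
    v = sine_tapers(N, K)
    n = np.arange(N)
    S = v @ (x * np.exp(-2j * np.pi * freq * n))   # eigencoefficients S_k(freq)
    U0 = v.sum(axis=1)                              # taper transforms at zero offset, U_k(0)
    mu_hat = (U0 @ S) / (U0 @ U0)                   # regression estimate of the tone amplitude
    resid = np.sum(np.abs(S - mu_hat * U0) ** 2)    # background power after removing the line
    F = (K - 1) * np.abs(mu_hat) ** 2 * (U0 @ U0) / resid
    return mu_hat, F

rng = np.random.default_rng(1)
N, f0, mu = 256, 0.2, 1.0                           # assumed test signal
noise = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
x = mu * np.exp(2j * np.pi * f0 * np.arange(N)) + noise
mu_hat, F_on = tone_f_test(x, f0)                   # at the tone frequency
_, F_off = tone_f_test(x, 0.37)                     # away from the tone
```

At the tone frequency the statistic is large and the amplitude estimate is close to the true value; away from it the statistic behaves like a ratio of noise powers. Summing numerator and denominator along a contour before taking the ratio gives the integrated form.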

If the F-test achieves the specified confidence level,
the tone should be removed by subtracting it from the
STFTs prior to forming the TF spectrum, then added
into the representation as an impulse:

$$\hat{P}(t,\omega) = |\hat{\mu}(t,\omega)|^2\,\delta(\omega-\phi'(t)) + \sum_{k=1}^{K} \left|S_k(t,\omega) - \hat{\mu}(t,\omega)\,U_k(\omega-\phi'(t))\right|^2. \qquad (27)$$

Matching the STFTs to the local FM greatly simplifies
the F-test. With no matching, the STFT of an FM
tone will be spread according to the sweep rate, and
will thus have a functional form dependent on $\beta$. After
matching, the FM tone will have the same response as
a stationary tone in an unmatched STFT. Thus, the
expression for $\hat{\mu}$ in equation (23) can be used for all
FM rates. The procedure for testing for an FM tone
is then a four-step process: compute the test statistic
$F(t,\omega)$ over time and frequency; find candidate contours
$\omega(t) = \phi'(t)$ in $F(t,\omega)$; compute $F(\phi'(t))$; and
test its significance.


ABSTRACT

We consider the problem of constructing an optimal
reduced-rank subspace for parameter estimation, in models
where the data is a non-linear function of the parameters.
The solution which minimizes mean-squared error
is a compromise between the prior distribution and
the measurement model, reducing to the Karhunen-Loeve
Transform when only the prior is considered.
The measurement model determines which parameters
the measured data is less sensitive to, and which are
therefore less estimatable. Our approach obtains parameterizations
in which the influence of these parameters
is reduced, so that limited resources may be allocated
to more estimatable features. We apply it to the
problem of estimating index-of-refraction profiles from
sea-surface clutter data.

1.  INTRODUCTION 

In this paper we will consider the problem of constructing
a reduced-dimension subspace in which to
search for parameter estimates $\hat{\theta}$. Non-linear models
of the form $y = L(\theta) + n$ are considered, where the
measured data $y$ depends on the parameter set $\theta$
through the non-linear model $L(\cdot)$, and is corrupted by
additive noise $n$. We will discuss this problem in the
specific case of estimating the tropospheric index of refraction
profile from clutter returns received from ship-based
microwave radars [1]. In the "refractivity from
clutter" (RFC) problem, the data $y$ consists of clutter
returns across range, and the description of propagation
through the refractivity profile yields the non-linear
model.

We  ask  the  following  question:  what  is  the  opti¬ 
mal  reduced-rank  basis  for  searching  for  estimates  of 

This  work  was  supported  by  SPAWAR  Systems  Center,  San 
Diego,  under  contract  No.  N66001-97-D-5028.  Presented  at  the 
10th  IEEE  Workshop  on  Statistical  Signal  and  Array  Processing, 
Pocono  Manor,  Pennsylvania,  August  14-16,  2000. 


the  parameter  set?  From  an  engineering  standpoint, 
estimating  the  full  refractivity  profile  would  require  a 
search  through  a  large-dimensional  parameter  space, 
and  would  be  too  computationally  slow  for  real-time 
estimation  of  a  dynamically  varying  profile.  From  a 
modeling standpoint, we are interested in which reduced
parameterizations one should be estimating.

The Karhunen-Loeve Transform (KLT) describes
the optimal reduced-rank linear subspace for minimizing
compression or representation error [2], by considering
the prior statistical distribution of the parameters.
The subspace is constructed from the dominant eigenvectors
of the prior covariance matrix of the parameter
vector, $R_{\theta\theta} = E[\theta\theta^\dagger]$ (with the mean of $\theta$ subtracted
out). The limitation of the KLT is that it does not incorporate
the estimation problem: what parameterizations
can be estimated from the data with the smallest
estimation error? If one were to consider estimation error
alone, then one would build the reduced-rank search
space from the model $L(\cdot)$, ignoring the prior. But the
resulting parameter basis functions might not represent
well the natural distribution of the parameters. In
the RFC example, profiles that are built from such a
basis will not necessarily look like natural, typically observed
index-of-refraction profiles (for an example, see
Figure 1).

The optimal basis, in an MSE sense, is a compromise
between the two considerations of estimation and representation
error. What is this basis? In the case of linear
models, in which $y$ is a linear function of $\theta$ plus additive
noise $n$, the problem has been investigated
and solved in two contexts. Examining Wiener filters,
in the form $R_{\theta y}R_{yy}^{-1}$, Scharf found that the optimal
(minimum mean-square error) reduced-rank Wiener filter
is given by truncating the singular-value decomposition
(SVD) of $R_{\theta y}R_{yy}^{-1/2}$, to give
$\mathrm{trunc}\{R_{\theta y}R_{yy}^{-1/2}\}\,R_{yy}^{-1/2}$
(see [3], p. 330, and [4]). More recently, Hua et al.
suggested the generalized KLT (GKLT), constructed
from the dominant eigenvectors of $R_{\theta y}R_{yy}^{-1}R_{y\theta}$ (see [5, 6]).

Figure 1: A typical tropospheric index of refraction
profile, with a tri-linear shape characterized in part by
base height, duct height, and M-deficit. (Plot title:
Index of Refractivity Vertical Profile.)
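The truncated-SVD construction for the linear case can be sketched in a few lines. The example below is only an illustration under assumed values: the model matrix `H`, the prior covariance, the noise level, and the helper names are all hypothetical, and real-valued quantities are used so that transposes stand in for Hermitian transposes.

```python
import numpy as np

def inv_sqrt(R):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(R)
    return (vecs / np.sqrt(vals)) @ vecs.T

def reduced_rank_wiener(R_ty, R_yy, r):
    """Rank-r filter trunc_r(R_ty R_yy^{-1/2}) R_yy^{-1/2}."""
    Ryy_mh = inv_sqrt(R_yy)
    U, s, Vt = np.linalg.svd(R_ty @ Ryy_mh, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r] @ Ryy_mh

def mse(W, R_tt, R_ty, R_yy):
    """Trace of E[(theta - W y)(theta - W y)^T]."""
    return np.trace(R_tt - W @ R_ty.T - R_ty @ W.T + W @ R_yy @ W.T)

rng = np.random.default_rng(0)
p, m = 6, 10
H = rng.standard_normal((m, p))            # hypothetical linear model matrix
A = rng.standard_normal((p, p))
R_tt = A @ A.T + 0.1 * np.eye(p)           # assumed prior covariance of theta
R_yy = H @ R_tt @ H.T + 0.5 * np.eye(m)    # data covariance with white noise
R_ty = R_tt @ H.T                          # cross-covariance of theta and y

W_full = R_ty @ np.linalg.inv(R_yy)        # full-rank Wiener filter
errors = [mse(reduced_rank_wiener(R_ty, R_yy, r), R_tt, R_ty, R_yy)
          for r in range(1, p + 1)]
```

The error list is non-increasing in the rank, and keeping all singular values recovers the full-rank Wiener filter.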

We are considering the same problem, but in the
context of non-linear models. Furthermore, in the RFC
case we will discuss, a closed-form analytical model for
$L(\theta)$ is not available; the clutter return $y$ that would
result from a given profile $\theta$ must be computed numerically.
How then do we find the optimal reduced-rank
parameter basis?

2.  ORTHOGONALITY  CONDITIONS 

Generally, one seeks a solution $\hat{\theta}$ that maximizes
some objective function $C$:

$$\max_{\theta}\, C(y, L(\theta)) \rightarrow \hat{\theta}. \qquad (1)$$

For example, for a MAP (maximum a posteriori) estimator,
$C$ is the posterior probability density
function of $\theta$ given $y$. In this work we are seeking to identify
linear, reduced-rank parameterizations in the form
$\theta_r = U_r b$. Here $U_r$ is a "tall" matrix with orthonormal
columns, i.e. $U_r^\dagger U_r = I$. The problem is then
reduced to searching over candidate values of $b$:

$$\max_{b}\, C(y, L(U_r b)) \rightarrow \hat{\theta}_r = U_r \hat{b}, \qquad (2)$$

and the basic question is, how do we choose $U_r$?


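Some useful results can be obtained by assuming the
following two axioms: both the full-rank and reduced-rank
estimators are uncorrelated with (orthogonal to)
the error of the full-rank estimator:

$$\text{(A)}\ \ E\left[(\theta-\hat{\theta})\,\hat{\theta}^\dagger\right] = 0; \quad \text{and} \quad \text{(B)}\ \ E\left[(\theta-\hat{\theta})\,\hat{\theta}_r^\dagger\right] = 0. \qquad (3)$$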

The first condition is strictly true for the conditional
mean (CM) estimator, which is also the Minimum Mean-Squared
Error (MMSE) estimator, and for the Linear
MMSE estimator (Wiener filter). It can be shown that
the second condition is strictly true if $\hat{\theta}$ is constructed
from the MMSE estimator or LMMSE estimator,
and $\hat{\theta}_r$ is constructed from the same type of estimator
of the reduced coordinates $U_r^\dagger\theta$. This condition basically excludes $\hat{\theta}_r$ from
bringing in side information about $\theta$ that is not present
in $\hat{\theta}$. (In simple terms, we don't have the situation
where $\hat{\theta}$ is a poor estimator, while $\hat{\theta}_r$ is simultaneously
based on a good estimator.)

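A consequence of this condition is that the error
correlation of $\hat{\theta}_r$ is greater than that of $\hat{\theta}$:

$$Q_r = Q + E\left[(\hat{\theta}-\hat{\theta}_r)(\hat{\theta}-\hat{\theta}_r)^\dagger\right] \geq Q, \qquad (4)$$

where $Q = E[(\theta-\hat{\theta})(\theta-\hat{\theta})^\dagger]$. If we seek the reduced-rank
estimator that minimizes the residual MSE (trace
of $E[(\theta-\hat{\theta}_r)(\theta-\hat{\theta}_r)^\dagger]$), it can be shown that the error
correlation can be rewritten as

$$Q_r = Q + (I-P_r)\,R_{\hat{\theta}\hat{\theta}}\,(I-P_r), \qquad (5)$$

where $R_{\hat{\theta}\hat{\theta}} = E[\hat{\theta}\hat{\theta}^\dagger]$ is the estimator correlation, and
$P_r = U_r U_r^\dagger$ is the projection onto the reduced-rank
subspace. Using the same argument as that taken for
the KLT, the reduced-rank subspace is then constructed
from the dominant eigenvectors of $R_{\hat{\theta}\hat{\theta}}$. It should be
noted that in the case of the linear model, this result reduces
to the "Generalized KLT" discussed in [3, 4, 5, 6];
i.e. $R_{\hat{\theta}\hat{\theta}}$ becomes $R_{\theta y}R_{yy}^{-1}R_{y\theta}$.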
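The step from the two orthogonality conditions to the decomposition of the reduced-rank error correlation can be written out as follows. This is a sketch of the standard argument, offered as one way to fill in "it can be shown"; it assumes that the MSE-minimizing reduced-rank estimator within the subspace is the projection $\hat{\theta}_r = P_r\hat{\theta}$, which the text does not state explicitly.

```latex
\begin{align*}
Q_r &= E\big[(\theta-\hat\theta_r)(\theta-\hat\theta_r)^\dagger\big] \\
    &= E\big[\big((\theta-\hat\theta)+(\hat\theta-\hat\theta_r)\big)
             \big((\theta-\hat\theta)+(\hat\theta-\hat\theta_r)\big)^\dagger\big] \\
    &= Q + E\big[(\hat\theta-\hat\theta_r)(\hat\theta-\hat\theta_r)^\dagger\big]
       \qquad \text{(cross terms vanish by (A) and (B))} \\
    &= Q + (I-P_r)\,E\big[\hat\theta\hat\theta^\dagger\big]\,(I-P_r)
       \qquad \text{(assuming } \hat\theta_r = P_r\hat\theta\text{)} \\
\operatorname{tr} Q_r &= \operatorname{tr} Q
       + \operatorname{tr} R_{\hat\theta\hat\theta}
       - \operatorname{tr}\big(U_r^\dagger R_{\hat\theta\hat\theta} U_r\big).
\end{align*}
```

The last line is minimized by maximizing the final trace term, which is achieved when the columns of $U_r$ are the dominant eigenvectors of the estimator correlation.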

This solution is intuitively pleasing: to find a reduced-rank
subspace to search for parameter estimates, search
the subspace where the full-rank estimates naturally
tend to lie. Also, note that a consequence of the first
orthogonality condition is that the a priori covariance
of the parameter vector $\theta = (\theta-\hat{\theta}) + \hat{\theta}$ can be decomposed
into the correlation of the error $(\theta-\hat{\theta})$, and
the correlation of the full-rank estimator $\hat{\theta}$. This observation
can be written in the form of a "Pythagorean
Theorem":

$$R_{\theta\theta} = Q + R_{\hat{\theta}\hat{\theta}} \quad \rightarrow \quad R_{\hat{\theta}\hat{\theta}} = R_{\theta\theta} - Q. \qquad (6)$$

In this formulation, we should take the dominant eigenvectors
of the difference between the a priori covariance
and the full-rank error correlation, which reduces to the
KLT in the limit that the error correlation becomes
small.
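A small numerical comparison may help make the difference from the KLT concrete. The sketch below uses a linear-Gaussian toy problem, where the estimator correlation has the closed form given for the linear model; the model matrix `H`, the prior covariance, the noise level, and the helper names are assumptions for illustration only.

```python
import numpy as np

def top_eigvecs(R, r):
    """Orthonormal basis of the r dominant eigenvectors of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(R)
    return vecs[:, np.argsort(vals)[::-1][:r]]

def reduced_rank_mse(U_r, Q, R_est):
    """Trace of Q + (I - P_r) R_est (I - P_r), with P_r = U_r U_r^T."""
    P_perp = np.eye(len(Q)) - U_r @ U_r.T
    return np.trace(Q + P_perp @ R_est @ P_perp)

rng = np.random.default_rng(2)
p, m, r = 8, 5, 3
H = rng.standard_normal((m, p))                        # hypothetical linear model
R_tt = np.diag(np.linspace(4.0, 0.5, p))               # assumed prior covariance
R_yy = H @ R_tt @ H.T + 0.2 * np.eye(m)                # data covariance with white noise
R_est = R_tt @ H.T @ np.linalg.solve(R_yy, H @ R_tt)   # estimator correlation (linear case)
Q = R_tt - R_est                                       # full-rank error correlation

mse_klt = reduced_rank_mse(top_eigvecs(R_tt, r), Q, R_est)   # subspace from the prior
mse_est = reduced_rank_mse(top_eigvecs(R_est, r), Q, R_est)  # subspace from the estimator
```

In this toy setting the estimator-based subspace never does worse than the prior-based one, and the two coincide as the error correlation shrinks.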

3.  CONSTRUCTING  THE  SUBSPACE  IN 
PRACTICE 

Estimating the covariance matrix $R_{\hat{\theta}\hat{\theta}}$ over the full
parameter space may be computationally intensive in
practice, limited by the computation time of the propagation
model $L(\cdot)$, over a set of values of $\theta$ (either
grid points or realizations). Recently, closed-form expressions
were obtained for the Fisher information matrix
in the RFC estimation problem [7], which could
in principle be used to approximate the full-rank error
correlation in Equation 6, i.e. $Q \approx J^{-1}$. However,
this approach is infeasible if the dimension of the initial
parameter vector $\theta$ is too high, since the multi-dimensional
numerical differentiation for the estimate
of $J$ requires the evaluation of the nonlinear function
$L(\cdot)$ on a number of grid points that increases quadratically
with the dimension.

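An alternate approach taken here is to obtain samples
for a sample covariance matrix estimate of $R_{\hat{\theta}\hat{\theta}}$,
where each sample is an approximate conditional-mean
estimate $\hat{\theta}$, formulated as follows:

$$\hat{\theta} = \int d\theta\, \theta f(\theta \mid y) = \frac{\int d\theta\, \theta f(\theta,y)}{\int d\theta\, f(\theta,y)} = \frac{\int d\theta\, \theta f(y \mid \theta) f(\theta)}{\int d\theta\, f(y \mid \theta) f(\theta)} \approx \frac{\sum_i \theta_i f(y \mid \theta_i)}{\sum_i f(y \mid \theta_i)} = \sum_i w(y,\theta_i)\,\theta_i, \qquad (7)$$

where $\theta_i$ are samples drawn from the prior $f(\theta)$, i.e.
historical data, and $w(y,\theta_i)$ is a normalized weighting
factor, proportional to the likelihood. So an estimate is
obtained by averaging over historical profiles that are
weighted by their likelihood of producing the data $y$.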
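The weighted average can be sketched as follows. This is a minimal illustration, not the RFC implementation: it assumes additive Gaussian noise of known standard deviation, replaces the propagation model by a trivial toy function, and draws the "historical" samples from a standard normal prior so that the exact conditional mean is known. The names `conditional_mean` and `forward` are hypothetical.

```python
import numpy as np

def conditional_mean(y, samples, forward, sigma):
    """Likelihood-weighted average of prior samples (Gaussian noise assumed)."""
    resid = y - np.array([forward(th) for th in samples])
    loglik = -0.5 * np.sum(resid.reshape(len(samples), -1) ** 2, axis=1) / sigma**2
    w = np.exp(loglik - loglik.max())       # subtract the max for numerical stability
    w /= w.sum()                            # normalized weights w(y, theta_i)
    return w @ samples, w

rng = np.random.default_rng(3)
samples = rng.standard_normal(200_000)      # stand-in for historical profiles
forward = lambda th: th                     # toy model L(theta) = theta
theta_hat, w = conditional_mean(1.0, samples, forward, sigma=0.5)
```

For this toy problem the exact conditional mean is $y/(1+\sigma^2) = 0.8$, which the weighted sum approaches as the number of samples grows.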

4.  APPLICATION:  ESTIMATION  OF 
TROPOSPHERIC  REFRACTIVITY 
PROFILES 

To evaluate this approach to rank-reduction, we
used profiles from the VOCAR data set, taken at three
sites off the coast of southern California in 1993 [1].
The most straightforward way to apply the KLT approach
is to simply take the profiles of M-values (modified
refractivity) over a uniform height grid, concatenate
them into the columns of a data matrix $\Theta$, and
use the resulting sample covariance $R_{EOF}$ to generate
dominant eigenvectors/EOFs (empirical orthogonal
functions), and to generate new random
profiles for analysis. Unfortunately, a Gaussian random
model with covariance $R_{EOF}$ fails to reproduce
the characteristic tri-linear shape of observed profiles
(Figure 1; the second linear segment is responsible for
the downward refraction that causes ducting behavior).
In particular, the height of the duct (height of the first
two segments in the tri-linear profile) may vary considerably,
and averaging over an ensemble of observed
profiles tends to suppress the key feature of the duct; it
is "washed-out" in the sample mean (not shown here).
In addition, profiles synthesized from the sample mean
and $R_{EOF}$ tend to have many mini-ducts over the entire
height range, features not observed in real data.

Figure 2: An "octo-linear" fit of a profile, consisting of
eight linear segments. (Axis label: Mod. Refractivity.)

To formulate a random model that synthesizes realistic
profiles, and at the same time formulate an initial
profile parameterization, we fit each historical profile
to a profile consisting of eight linear segments (i.e., an
"octo-linear" profile), as shown in Figure 2. This procedure
fits the profile to a length-17 parameter vector
$\theta$, corresponding to the heights of the eight segments,
the widths of the eight segments (or M-deficits), and
the M-value at zero height (sea level).

The key characteristic of this fit is that it is feature-based:
referring to Figures 1 and 2, the top of the second
segment was generally chosen to correspond to the
middle of the duct (the first two segments accounting
for base-height), and the top of the fourth segment was
chosen to correspond to the top of the duct (the first
four segments accounting for duct height). (For the results
shown here, the fit was obtained manually.) To
reduce a spurious source of variance in these parameters,
the historical profiles were edited to remove those
for which the main duct feature was not identifiable
(such as profiles that looked basically linear, with no
apparent duct).

Figure 3: The first four dominant eigenvectors of the
prior covariance, $R_{\theta\theta}$. Each panel contains three plots,
the profiles corresponding to (1) the mean (dashed),
(2) and (3) the mean ± the eigenvector (scaled by the
same constant in all four panels). (Panel titles recovered
from the figure: Prior: eigenvector [1]; Prior: eigenvector [2];
Estim: eigenvector [1]; Estim: eigenvector [2].)

As might be expected, profiles synthesized from a
multivariate Gaussian model on feature-based parameters,
rather than on the raw profiles, are more realistic
in terms of reproducing the gross shape of a typical
profile, including the main duct. To ensure positivity,
a log-normal model was used on the heights (the
appropriateness of which was verified by inspection of
histograms from real data). The multivariate-normal
model was then applied to the vector of log(heights)
and M-deficits.
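A feature-based random model of this kind can be sketched as below. The example is illustrative only: the "historical" heights and M-deficits are made-up random numbers rather than VOCAR fits, the sea-level M-value in the length-17 vector is left out for brevity, and the function names are hypothetical.

```python
import numpy as np

def fit_feature_model(heights, deficits):
    """Fit a multivariate normal to [log(heights), M-deficits]."""
    z = np.hstack([np.log(heights), deficits])
    return z.mean(axis=0), np.cov(z, rowvar=False)

def sample_profiles(mean, cov, n_heights, n, rng):
    """Draw n synthetic parameter vectors; heights are positive by construction."""
    z = rng.multivariate_normal(mean, cov, size=n)
    return np.exp(z[:, :n_heights]), z[:, n_heights:]

rng = np.random.default_rng(4)
# Made-up stand-in for historical fits: 500 profiles, 8 heights and 8 M-deficits.
heights = rng.lognormal(mean=3.0, sigma=0.4, size=(500, 8))
deficits = rng.normal(loc=-5.0, scale=2.0, size=(500, 8))
mean, cov = fit_feature_model(heights, deficits)
new_heights, new_deficits = sample_profiles(mean, cov, 8, 1000, rng)
```

Modeling the logarithm of the heights guarantees that every synthesized segment height is positive, which a Gaussian model on the raw heights would not.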

Interestingly, the resulting mean, the dashed line
in the panels of Figure 3, looks very tri-linear. The
influence of the dominant eigenvectors of the prior covariance
$R_{\theta\theta}$ is depicted by the solid lines in Figure 3.
The first eigenvector corresponds to increasing
base-height while decreasing M-deficit in the tri-linear
model. The second has a lot of energy going
into shrinking and expanding the length of the top segment.
This by itself is a strict degeneracy: scaling of
the length of the final segment has no effect on the profile
and no effect on the clutter measurements used to
estimate the profile; it is only an artifact of the initial
parameterization scheme. Furthermore, for the measurement
method presumed for this study, measurement
of sea-surface clutter strength across range, variations
in the top half of the profile constitute an
effective estimation degeneracy, since they have little
effect on the ducting behavior and measured surface
clutter, and are therefore difficult to estimate.

Figure 4: The first four dominant eigenvectors of the
sample estimator covariance, $R_{\hat{\theta}\hat{\theta}}$.

We used a sample covariance approach to approximate
the estimator covariance $R_{\hat{\theta}\hat{\theta}}$ of Equation 5. A
likelihood function $f(y \mid \theta)$ is easily obtainable, as a function
of the propagation loss $L(\theta)$ from the transmitter
to the sea surface, across range (where the dimension
of $y$ and $L$ is the number of range cells). The problem
is that the PE (parabolic equation) numerical propagation
of the field is time intensive, severely limiting the
number of parameter values $\theta_i$ at which the propagation
loss can be evaluated.

To generate samples for a sample covariance, we
computed approximate conditional-mean based estimates
$\hat{\theta}$, based on the weighted sum of Equation 7. In practice,
direct implementation of Equation 7 failed, since
the number of samples $\theta_i$ (10,000) was too small to
adequately sample the likelihood function, forcing one
weight $w_i$ to be unity, and the rest to be zero. This
effect was ameliorated by increasing the standard deviation
of the likelihood function by a factor of 35.
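The collapse of the weights and the effect of widening the likelihood can be illustrated with a toy problem. Everything in the sketch below is an assumption made for illustration: a scalar parameter, a trivial model that repeats the parameter in every range cell, Gaussian noise, and the effective sample size $1/\sum_i w_i^2$ as a convenient diagnostic (the text does not use this diagnostic). Only the sample count of 10,000 and the inflation factor of 35 are taken from the text.

```python
import numpy as np

def normalized_weights(y, predictions, sigma, inflate=1.0):
    """Weights proportional to a Gaussian likelihood with std sigma * inflate."""
    loglik = -0.5 * np.sum((y - predictions) ** 2, axis=1) / (sigma * inflate) ** 2
    w = np.exp(loglik - loglik.max())       # subtract the max for numerical stability
    return w / w.sum()

def effective_sample_size(w):
    """1 / sum(w^2): close to 1 when a single weight dominates."""
    return 1.0 / np.sum(w ** 2)

rng = np.random.default_rng(5)
n_samples, n_cells, sigma = 10_000, 200, 0.01
thetas = rng.standard_normal(n_samples)               # toy scalar parameter samples
predictions = thetas[:, None] * np.ones(n_cells)      # toy model: same value in every range cell
y = 0.5 + sigma * rng.standard_normal(n_cells)        # data generated from theta = 0.5

ess_raw = effective_sample_size(normalized_weights(y, predictions, sigma))
ess_inflated = effective_sample_size(
    normalized_weights(y, predictions, sigma, inflate=35.0))
```

With many range cells the raw likelihood is so narrow that only a handful of samples carry weight; inflating its standard deviation spreads the weight over many more samples, at the cost of a broader, less sharply conditioned average.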
Figure 5: The mean-square-error (MSE) for the MAP
estimator  over  a  grid  (1)  based  on  the  prior  covariance 
(KLT)  and  (2)  based  on  the  estimator  covariance. 


The  weighted  sum  can  be  interpreted  as  summing 
over  different  profiles  that  reproduce  well  the  observed 
data.  This  in  turn  has  the  effect  of  averaging  over,  or 
“washing  out” ,  variations  corresponding  to  the  degen¬ 
eracies  discussed  above,  which  have  less  impact  on  the 
measurement,  and  which  are  therefore  less  estimatable. 

The sample covariance of the resulting estimates
has the eigenvectors shown in Figure 4. These eigenvectors
are qualitatively preferable to those of the prior
covariance, in terms of their physical interpretations:
the first corresponds to increasing duct height while
decreasing M-deficit, and the second to increasing base
height with increasing M-deficit. Note that the second
eigenvector of the prior covariance, in Figure 3 (with
energy going into scaling the top segment), is here most
closely approximated by the fourth eigenvector. So the
energy going into this degenerate, non-estimatable feature
has been reduced.

To  quantitatively  compare  this  parameterization  with 
that  of  the  KLT,  two  grids  consisting  of  6000  points 
were  constructed  from  the  dominant  three  eigenvectors 
of  the  prior  and  estimator  covariance,  respectively.  The 
number  of  grid  points  was  determined  by  the  relative 
energy  of  the  eigenvectors,  as  reflected  by  the  eigenval¬ 
ues;  25  x  16  x  15  and  40  x  15  x  10  grids  were  chosen  for 
the  prior  and  estimator  covariance  eigenvectors,  respec¬ 
tively.  The  mean-square-error  decreases  when  MAP  es¬ 
timates  are  found  over  the  grid  based  on  the  estimator 
covariance;  see  Figure  5. 


5.  CONCLUSIONS 

In  this  paper  we  have  discussed  the  problem  of  de¬ 
scribing  the  lower-dimensional  parameterization  of  an 
unknown  parameter  set  that  is  optimal  in  the  sense 
of  minimizing  mean-squared  error.  This  description, 
in  terms  of  a  reduced-rank  subspace,  depends  on  both 
the  measurement  model  by  which  the  data  depends  on 
the  parameters  and  on  the  a  priori  distribution  of  the 
parameters.  It  can  be  viewed  as  a  generalization  of  the 
Karhunen-Loeve Transform, which considers only the
prior.  The  initial  parameterization  and  the  nature  of 
the  measurement  model  may  contain  parameters  which 
are  degenerate  in  the  sense  that  they  have  less  impact 
on  the  measured  data.  The  aim  of  the  approach  pre¬ 
sented  in  this  paper  is  to  seek  parameterizations  in 
which  the  strength  of  these  parameters  is  decreased,  so 
that  the  reduced-dimension  parameterization  empha¬ 
sizes  more  estimatable  features.  We  have  evaluated 
this  procedure  for  the  application  of  estimating  index 
of  refraction  profiles  from  clutter  returns,  where  it  pro¬ 
duces  more  physically  meaningful  reduced-rank  basis 
functions,  and  decreases  mean-squared-error  relative  to 
the  KLT  basis. 
ABSTRACT

We  consider  the  problem  of  jointly  estimating 
the  number  as  well  as  the  parameters  of  two- 
dimensional  sinusoidal  signals,  observed  in  the 
presence  of  an  additive  white  Gaussian  noise 
field.  Existing  solutions  to  this  problem  are 
based  on  model  order  selection  rules,  derived 
for  the  parallel  one-dimensional  problem.  These 
criteria  are  then  adapted  to  the  two-dimensional 
problem  using  heuristic  arguments.  Employ¬ 
ing  asymptotic  considerations,  we  derive  in  this 
paper  a  maximum  a-posteriori  (MAP)  model 
order  selection  criterion  for  jointly  estimating 
the  parameters  of  the  two-dimensional  sinusoids 
and  their  number. 

1.  INTRODUCTION 

From  the  2-D  Wold-like  decomposition  we  have  that 
any  2-D  regular  and  homogeneous  discrete  random  field 
can  be  represented  as  a  sum  of  two  mutually  orthog¬ 
onal  components:  a  purely-indeterministic  field  and  a 
deterministic  one.  The  purely-indeterministic  compo¬ 
nent  has  a  unique  white  innovations  driven  moving  av¬ 
erage  representation.  The  deterministic  component  is 
further  orthogonally  decomposed  into  a  harmonic  field 
and  a  countable  number  of  mutually  orthogonal  evanes¬ 
cent  fields.  In  this  paper  we  consider  a  special  case  of 
the  foregoing  general  problem.  More  specifically,  we 
consider  the  problem  of  jointly  estimating  the  num¬ 
ber  as  well  as  the  parameters  of  the  sinusoidal  signals 
comprising  the  harmonic  component  of  the  field,  in  the 
presence  of  the  purely-indeterministic  component,  as¬ 
sumed  here  to  be  a  white  noise  field. 

A  solution  to  this  problem  is  an  essential  compo¬ 
nent  in  many  image  processing  and  multimedia  data 
processing  applications.  For  example,  in  indexing  and 
retrieval systems of multimedia data that employ the
textural  information  in  the  imagery  components  of  the 
data,  e.g.,  [7],  the  identification  of  similar  textured  sur¬ 
faces  as  being  such,  is  highly  sensitive  to  errors  in  es¬ 
timating  the  orders  of  the  models  of  the  deterministic 
components  of  the  textures.  More  specifically,  in  this 
approach  the  2-D  Wold  decomposition  based  paramet¬ 
ric  model  of  each  textured  segment  of  the  image  also 
serves  as  the  index  of  this  segment.  Therefore  an  accu¬ 
rate  and  robust  procedure  for  estimating  the  orders  as 
well  as  the  parameters  of  the  models  of  the  determin¬ 
istic  components  of  the  textures  is  an  essential  compo¬ 
nent  in  any  such  indexing  and  retrieval  system.  Simi¬ 
lar  requirements  are  posed  by  parametric  content-based 
image  coding  and  representation  methods. 

The same type of problem, i.e., joint estimation of the model order and
parameters for a sum of 2-D sinusoidal signals observed in additive
noise, naturally arises in processing 2-D SAR data. In this problem,
however, the observed random field is complex valued: for each
scatterer one frequency parameter corresponds to the range information,
while the second frequency parameter is the Doppler. The complex valued
amplitude of each such exponential is proportional to the radar cross
section of the target.

Many algorithms have been devised to estimate the parameters of
sinusoids observed in additive white Gaussian noise. Most of the
algorithms assume that the number of sinusoids is known a-priori.
However, this assumption does not always hold in practice. Hence, in
the past two decades the model order selection problem has received
considerable attention. In general, model order selection rules are
based (directly or indirectly) on three popular criteria: the Akaike
information criterion (AIC), the minimum description length (MDL) and
the maximum a-posteriori probability (MAP) criterion. All these
criteria have a common form in that they comprise two terms: a data
term and a penalty term, where the data term is the log-likelihood
function evaluated for the assumed model. However, most of the papers
dedicated to this problem discuss the model order selection problem for
various models of one-dimensional signals, while the problem of
modeling multidimensional fields has received considerably less
attention. Djuric [1] proposed a MAP order selection rule for 1-D
sinusoids observed in additive white noise. Kavalieris and Hannan [4]
prove the strong consistency of a criterion that indirectly employs the
MDL principle. In this framework the observation noise is modeled as an
autoregression of an unknown order. In the special case where the noise
process in [4] is assumed to be a white noise process, the resulting
criterion is identical to the MAP criterion derived in [1]. Stoica et
al. [5] proposed the cross-validation selection rule and demonstrated
its asymptotic equivalence to the Generalized Akaike Information
Criterion (GAIC). In [6] this criterion is applied to the 2-D problem
as well, where the penalty term is proportional to the total number of
unknown parameters, exactly as in the 1-D case. In this paper we derive
a MAP model order selection criterion for jointly estimating the number
and the parameters of two-dimensional sinusoids observed in additive
white noise.

The paper is organized as follows. In Section 2 we define our
notations, while in Section 3 we formally define the MAP model order
selection problem. The MAP model order selection criterion is derived
in Section 4. Finally, in Section 5 we provide some numerical examples
and Monte-Carlo simulations to better illustrate the performance of the
proposed criterion.

2.  NOTATIONS  AND  DEFINITIONS 

The considered random field is composed of a harmonic field embedded
in Gaussian noise. Let $\{y(n,m)\}$, where $(n,m) \in U$ and
$U = \{(n,m) \mid 0 \le n \le S-1,\ 0 \le m \le T-1\}$, be the observed
$S \times T$ real valued data field. The elements of $y(n,m)$ may be
represented as

y(n,  m)  =  h(n,  m)  +  u(n,  m).  (1) 

The field $\{u(n,m)\}$ is the 2-D zero mean, Gaussian white noise field
with variance $\sigma^2$. The field $\{h(n,m)\}$ is the harmonic random
field

$$ h(n,m) = \sum_{i=1}^{k} \left[ C_i \cos(n\omega_i + m\nu_i) + G_i \sin(n\omega_i + m\nu_i) \right] \tag{2} $$

where $k$ denotes the number of sinusoidal components in the data
model, and $(\omega_i, \nu_i)$ is the spatial frequency of the $i$th
component. The $C_i$'s and $G_i$'s are the amplitudes of the sinusoidal
components in the observed realization.


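Let us define the following matrix notations:

$$ \mathbf{y} = [\, y(0,0), \ldots, y(0,T-1), y(1,0), \ldots, y(S-1,T-1) \,]^T \tag{3} $$

The vectors $\mathbf{u}$ and $\mathbf{h}$ are similarly defined.
Rewriting (1) we have $\mathbf{y} = \mathbf{h} + \mathbf{u}$. Let
$\boldsymbol{\Lambda}$ denote the covariance matrix of $\mathbf{y}$.
Thus

$$ \boldsymbol{\Lambda} = \sigma^2 \mathbf{I}_{ST \times ST} \tag{4} $$

where $\mathbf{I}_{ST \times ST}$ is an $ST \times ST$ identity matrix.
Hence, $|\boldsymbol{\Lambda}| = \sigma^{2ST}$. Also define

$$ \mathbf{a} = [\, C_1, G_1, C_2, G_2, \ldots, C_k, G_k \,]^T . \tag{5} $$
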

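Let

$$ \mathbf{A}_i = \left[\, e^{j[0\omega_i + 0\nu_i]}, \ldots, e^{j[0\omega_i + (T-1)\nu_i]}, \ldots, e^{j[(S-1)\omega_i + (T-1)\nu_i]} \,\right]^T \tag{6} $$

and let us define the following $ST \times 2k$ matrix

$$ \mathbf{D} = [\, \mathrm{Re}(\mathbf{A}_1), \mathrm{Im}(\mathbf{A}_1), \mathrm{Re}(\mathbf{A}_2), \mathrm{Im}(\mathbf{A}_2), \ldots, \mathrm{Re}(\mathbf{A}_k), \mathrm{Im}(\mathbf{A}_k) \,] \tag{7} $$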

Using  the  foregoing  notations  we  have  that 

y  =  Da  +  u.  (8) 

In the following it is assumed that the matrix $\mathbf{D}^T\mathbf{D}$
is full rank.
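
As a rough illustration of the notation in (3)-(8), the following NumPy sketch builds the matrix D of (7) for a given set of spatial frequencies and checks numerically that the stacked harmonic field of (2) equals Da. The function name `build_D`, the field size and the frequency and amplitude values are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def build_D(S, T, freqs):
    """Return the ST x 2k matrix D of (7) for frequencies [(w_i, v_i), ...]."""
    # Row-major stacking as in (3): (0,0), ..., (0,T-1), (1,0), ..., (S-1,T-1).
    n, m = np.meshgrid(np.arange(S), np.arange(T), indexing="ij")
    n, m = n.ravel(), m.ravel()
    cols = []
    for w, v in freqs:
        A_i = np.exp(1j * (n * w + m * v))   # the vector A_i of (6)
        cols.extend([A_i.real, A_i.imag])    # Re(A_i), Im(A_i)
    return np.column_stack(cols)

# Hypothetical example values.
S, T = 8, 6
freqs = [(2 * np.pi * 0.11, 2 * np.pi * 0.20),
         (2 * np.pi * 0.31, 2 * np.pi * 0.07)]
a = np.array([1.0, 0.5, -0.7, 2.0])          # [C_1, G_1, C_2, G_2] as in (5)

D = build_D(S, T, freqs)

# Direct evaluation of (2) on the S x T grid, then stacked as in (3).
nn, mm = np.meshgrid(np.arange(S), np.arange(T), indexing="ij")
h = np.zeros((S, T))
for i, (w, v) in enumerate(freqs):
    h += a[2 * i] * np.cos(nn * w + mm * v) + a[2 * i + 1] * np.sin(nn * w + mm * v)
h_vec = h.ravel()

# D^T D should have full rank 2k for distinct frequencies.
full_rank = np.linalg.matrix_rank(D.T @ D) == D.shape[1]
```

With this stacking, `D @ a` reproduces `h_vec`, which is the noise-free part of (8).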


3.  MAP  MODEL  ORDER  SELECTION 
CRITERION 


Let $p(k)$ be the a-priori probability that there exist $k$ sinusoidal
components in the observed field. It is assumed that there are $Q$
competing models, where $Q > M$ ($M$ being the actual number of
sinusoidal components), and that each model is equiprobable. That is,

$$ p(k) = \frac{1}{Q}, \quad k \in Z_Q \tag{9} $$

where $Z_Q = \{0, 1, 2, \ldots, Q\}$. The MAP estimate of $M$ is the
value of $k$ that maximizes the a-posteriori probability
$p(k|\mathbf{y})$, where $k \in Z_Q$. More specifically,

$$
\begin{aligned}
\hat{M}_{MAP} &= \arg\max_{k \in Z_Q} \left\{ p(k|\mathbf{y}) \right\} \\
&= \arg\max_{k \in Z_Q} \left\{ \frac{p(\mathbf{y}|k)\,p(k)}{p(\mathbf{y})} \right\} \\
&= \arg\max_{k \in Z_Q} \left\{ p(\mathbf{y}|k) \right\} \\
&= \arg\max_{k \in Z_Q} \left\{ \ln p(\mathbf{y}|k) \right\}
\end{aligned} \tag{10}
$$

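where $p(\mathbf{y}|k)$ denotes the conditional probability of
$\mathbf{y}$ given that there are $k$ sinusoidal components in the
data.

Let

$$ \mathbf{W} = [\, \omega_1, \omega_2, \ldots, \omega_k, \nu_1, \nu_2, \ldots, \nu_k \,]^T . \tag{11} $$

Also let $\mathcal{R}_+$ denote the positive real line, let
$\mathcal{A}_k = \mathcal{R}^{2k}$, and let
$\Omega_k = ([0, 2\pi))^{2k}$. Thus, we have that
$\sigma \in \mathcal{R}_+$, $\mathbf{a} \in \mathcal{A}_k$, and
$\mathbf{W} \in \Omega_k$. Using these notations the conditional
probability density $p(\mathbf{y}|k)$ is expressed by

$$ p(\mathbf{y}|k) = \int_{\Omega_k} \int_{\mathcal{R}_+} \int_{\mathcal{A}_k} p(\mathbf{y}|k,\mathbf{W},\sigma,\mathbf{a})\, p(\mathbf{W},\sigma,\mathbf{a}|k)\, d\mathbf{a}\, d\sigma\, d\mathbf{W} \tag{12} $$

where $p(\mathbf{W},\sigma,\mathbf{a}|k)$ is the a-priori probability
of $\mathbf{W}$, $\sigma$ and $\mathbf{a}$ given there exist $k$
sinusoidal components in the observed data.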

4.  DERIVATION  OF  THE  CRITERION 
4.1.  Priors  Selection 

Inspecting (10) and (12) we conclude that finding $\hat{M}_{MAP}$ using
the observed data only requires that some assumptions be made regarding
the prior distribution of the model parameters,
$p(\mathbf{W},\sigma,\mathbf{a}|k)$. Clearly our goal is to derive a
model selection rule that will be based on a non-informative prior
about the parameters. In other words, the selected prior should be
chosen such that it represents the lack of a-priori knowledge of the
values of the problem parameters before the data is observed. (See,
e.g., [2] for a detailed discussion of the problem of choosing
non-informative priors.)

Clearly,

$$ p(\mathbf{W},\sigma,\mathbf{a}|k) = p(\sigma,\mathbf{a}|\mathbf{W},k)\, p(\mathbf{W}|k). \tag{13} $$

Since the sinusoidal frequencies are assumed independent of each other
(i.e., that they are not harmonically related), the lack of a-priori
knowledge of the frequencies is modeled by assuming the frequencies
$(\omega_i, \nu_i)$ to be uniformly distributed on $\Omega_k$. Thus,

$$ p(\mathbf{W}|k) = \frac{1}{(2\pi)^{2k}} . \tag{14} $$

Note that since the probability of $\omega_i$ being equal to
$\omega_j$ for some $i \ne j$ is zero (and similarly for $\nu_i$ being
equal to $\nu_j$), we assume in the following that for all $i \ne j$,
$\omega_i \ne \omega_j$ (and similarly $\nu_i \ne \nu_j$). Hence the
following derivation of the model order selection criterion holds
almost everywhere in the problem probability space, i.e., except for a
set of models of probability measure zero.

Given that $\mathbf{W}$ and $k$ are known, $\mathbf{D}$ is also known
and the observation model (8) becomes a linear regression model where
the observations are subject to a zero mean white Gaussian observation
noise with variance $\sigma^2$, such that $\mathbf{a}$, $\sigma$ are
unknown. For this problem it is shown in [2] that in the space defined
by $\mathbf{a}$ and $\ln\sigma$ the shape of the likelihood function
surface is "data translated", i.e., it is invariant to translations
that result from the different values these parameters assume in
different realizations of the observed data. Hence the idea that little
is known a-priori relative to the information contained in the observed
data is expressed by choosing a prior distribution such that
$p(\ln\sigma, \mathbf{a}|\mathbf{W},k)$ is locally uniform, or
equivalently that

$$ p(\sigma,\mathbf{a}|\mathbf{W},k) \propto \sigma^{-1} . \tag{15} $$

Substituting (14) and (15) into (13) we have that the desired
non-informative prior is given by

$$ p(\mathbf{W},\sigma,\mathbf{a}|k) \propto \frac{1}{(2\pi)^{2k}}\, \sigma^{-1} . \tag{16} $$

4.2.  Evaluation  of  the  a-Posteriori  Distribution 

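In this subsection we derive an approximate expression for the
a-posteriori probability distribution $p(\mathbf{y}|k)$ given in (12).
Since the noise field $\{u(n,m)\}$ is Gaussian we have using (4) and
(8)

$$ p(\mathbf{y}|k,\mathbf{W},\sigma,\mathbf{a}) = p(\mathbf{u}|\sigma) = (2\pi\sigma^2)^{-\frac{ST}{2}} \exp\left\{ -\frac{1}{2\sigma^2} (\mathbf{y}-\mathbf{D}\mathbf{a})^T(\mathbf{y}-\mathbf{D}\mathbf{a}) \right\}. \tag{17} $$

Let $\hat{\mathbf{a}} = (\mathbf{D}^T\mathbf{D})^{-1}\mathbf{D}^T\mathbf{y}$
and let $\mathbf{P}^{\perp}$ denote the projection matrix defined by

$$ \mathbf{P}^{\perp} = \mathbf{I} - \mathbf{D}(\mathbf{D}^T\mathbf{D})^{-1}\mathbf{D}^T . \tag{18} $$

Using these notations we have that

$$ (\mathbf{y}-\mathbf{D}\mathbf{a})^T(\mathbf{y}-\mathbf{D}\mathbf{a}) = \mathbf{y}^T\mathbf{P}^{\perp}\mathbf{y} + (\mathbf{a}-\hat{\mathbf{a}})^T\mathbf{D}^T\mathbf{D}(\mathbf{a}-\hat{\mathbf{a}}). \tag{19} $$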
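
The decomposition in (19) is the usual split of a least-squares residual into a part orthogonal to the columns of D and a part inside their span. The short NumPy check below illustrates it under assumed values: a random full-column-rank matrix stands in for D, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
ST, two_k = 40, 4                        # hypothetical sizes (ST samples, 2k amplitudes)
D = rng.standard_normal((ST, two_k))     # stand-in for D; full column rank almost surely
y = rng.standard_normal(ST)
a = rng.standard_normal(two_k)           # an arbitrary amplitude vector

DtD = D.T @ D
a_hat = np.linalg.solve(DtD, D.T @ y)                   # least-squares estimate of a
P_perp = np.eye(ST) - D @ np.linalg.solve(DtD, D.T)     # projection matrix of (18)

lhs = (y - D @ a) @ (y - D @ a)                         # left-hand side of (19)
rhs = y @ P_perp @ y + (a - a_hat) @ DtD @ (a - a_hat)  # right-hand side of (19)
```

The two sides agree to numerical precision for any choice of `a`, which is what allows the integral over a in the next step to be evaluated as a Gaussian integral.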

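Applying the prior (16) and evaluating the marginal distribution we
have

$$
\begin{aligned}
p(\mathbf{y},\mathbf{W},\sigma|k) &= \int_{\mathcal{A}_k} p(\mathbf{y}|k,\mathbf{W},\sigma,\mathbf{a})\, p(\mathbf{W},\sigma,\mathbf{a}|k)\, d\mathbf{a} \\
&\propto \int_{\mathcal{A}_k} (2\pi\sigma^2)^{-\frac{ST}{2}} \exp\left\{ -\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{D}\mathbf{a})^T(\mathbf{y}-\mathbf{D}\mathbf{a}) \right\} \frac{1}{(2\pi)^{2k}\sigma}\, d\mathbf{a} \\
&= (2\pi\sigma^2)^{-\frac{ST}{2}} \frac{1}{(2\pi)^{2k}\sigma} \exp\left\{ -\frac{1}{2\sigma^2}\mathbf{y}^T\mathbf{P}^{\perp}\mathbf{y} \right\} \int_{\mathcal{A}_k} \exp\left\{ -\frac{1}{2\sigma^2}(\mathbf{a}-\hat{\mathbf{a}})^T\mathbf{D}^T\mathbf{D}(\mathbf{a}-\hat{\mathbf{a}}) \right\} d\mathbf{a} \\
&= (2\pi\sigma^2)^{-\frac{ST}{2}} \frac{1}{(2\pi)^{2k}\sigma} \exp\left\{ -\frac{1}{2\sigma^2}\mathbf{y}^T\mathbf{P}^{\perp}\mathbf{y} \right\} \frac{(\sqrt{2\pi}\,\sigma)^{2k}}{|\mathbf{D}^T\mathbf{D}|^{1/2}} .
\end{aligned} \tag{20}
$$

Next, we evaluate $p(\mathbf{y},\mathbf{W}|k)$. Substituting (20) we
have

$$
\begin{aligned}
p(\mathbf{y},\mathbf{W}|k) &= \int_{\mathcal{R}_+} p(\mathbf{y},\mathbf{W},\sigma|k)\, d\sigma \\
&\propto 2^{-2k-1}\, \pi^{-\frac{ST+2k}{2}}\, \Gamma\!\left( \frac{ST-2k}{2} \right) |\mathbf{D}^T\mathbf{D}|^{-\frac{1}{2}} (\mathbf{y}^T\mathbf{P}^{\perp}\mathbf{y})^{-\frac{ST-2k}{2}}
\end{aligned} \tag{21}
$$

where $\Gamma(\cdot)$ is the standard Gamma function (see, e.g., [2]
for the integration result).

Finally, to obtain an expression for the conditional probability
$p(\mathbf{y}|k)$ we have to evaluate

$$ p(\mathbf{y}|k) = \int_{\Omega_k} p(\mathbf{y},\mathbf{W}|k)\, d\mathbf{W}. \tag{22} $$

Since a direct analytic solution to this integration problem does not
exist, we derive an approximate solution, employing the Laplace
integration method (see, e.g., [3]). Following [3], p. 71, we first
expand $\ln p(\mathbf{y},\mathbf{W}|k)$ into a Taylor series about
$\hat{\mathbf{W}}$, where $\hat{\mathbf{W}}$ denotes the ML estimate of
$\mathbf{W}$. Since $\hat{\mathbf{W}}$ is a maximum point of the
likelihood function, the first order derivatives of
$\frac{1}{ST}\ln p(\mathbf{y},\mathbf{W}|k)$ at this point vanish.
Omitting from the expansion terms of order higher than two, we have

$$
\begin{aligned}
p(\mathbf{y},\mathbf{W}|k) &= \exp\left\{ ST\, \frac{1}{ST} \ln p(\mathbf{y},\mathbf{W}|k) \right\} \\
&\approx \exp\left\{ \ln p(\mathbf{y},\hat{\mathbf{W}}|k) - \frac{ST}{2} (\mathbf{W}-\hat{\mathbf{W}})^T \mathbf{H}_{ML} (\mathbf{W}-\hat{\mathbf{W}}) \right\}
\end{aligned} \tag{23}
$$

where

$$
\mathbf{H}_{ML} = -\frac{1}{ST}
\begin{bmatrix}
\frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_1^2} & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_1 \partial W_2} & \cdots & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_1 \partial W_{2k}} \\
\frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_2 \partial W_1} & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_2^2} & \cdots & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_2 \partial W_{2k}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_{2k} \partial W_1} & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_{2k} \partial W_2} & \cdots & \frac{\partial^2 \ln p(\mathbf{y},\mathbf{W}|k)}{\partial W_{2k}^2}
\end{bmatrix}_{\mathbf{W}=\hat{\mathbf{W}}} \tag{24}
$$

is the Hessian matrix of $-\frac{1}{ST}\ln p(\mathbf{y},\mathbf{W}|k)$
evaluated at $\mathbf{W} = \hat{\mathbf{W}}$. As $\hat{\mathbf{W}}$ is
a maximum point of $\ln p(\mathbf{y},\mathbf{W}|k)$,
$\mathbf{H}_{ML}$ is positive definite. Since
$\ln p(\mathbf{y},\mathbf{W}|k)$ is assumed sufficiently smooth at
$\hat{\mathbf{W}}$, $\mathbf{H}_{ML}$ is symmetric.

Substituting (23) into (22) and employing the Laplace asymptotic
approximation we have that as $ST \to \infty$

$$
p(\mathbf{y}|k) = \int_{\Omega_k} \exp\left\{ ST\, \frac{1}{ST} \ln p(\mathbf{y},\mathbf{W}|k) \right\} d\mathbf{W} \approx p(\mathbf{y},\hat{\mathbf{W}}|k)\, (2\pi)^k\, |\mathbf{H}_{ML}|^{-\frac{1}{2}} (ST)^{-k} . \tag{25}
$$

Lemma 1

$$ |\mathbf{H}_{ML}| = O(S^{2k}T^{2k}) . \tag{26} $$

Proof: See [9].

Substituting (21) and (26) into (25) we have

$$
p(\mathbf{y}|k) \propto 2^{-k-1}\, \pi^{-\frac{ST}{2}}\, \Gamma\!\left( \frac{ST-2k}{2} \right) |\hat{\mathbf{D}}^T\hat{\mathbf{D}}|^{-\frac{1}{2}} (\mathbf{y}^T\hat{\mathbf{P}}^{\perp}\mathbf{y})^{-\frac{ST-2k}{2}}\, S^{-2k}\, T^{-2k} \tag{27}
$$

where $\hat{\mathbf{D}}$ and $\hat{\mathbf{P}}^{\perp}$ are the
matrices $\mathbf{D}$ and $\mathbf{P}^{\perp}$, respectively, with
$\mathbf{W}$ substituted by its ML estimate, $\hat{\mathbf{W}}$. It is
possible to further simplify (27) by observing that
$|\hat{\mathbf{D}}^T\hat{\mathbf{D}}| = O(S^{2k}T^{2k})$ (see [9]).
Furthermore, employing the asymptotic properties of the Gamma function
(see, e.g., [8], p. 31) we have that as $ST \to \infty$,

$$ \Gamma\!\left( \frac{ST-2k}{2} \right) \approx \sqrt{2\pi} \left( \frac{ST-2k}{2} \right)^{\frac{ST-2k-1}{2}} e^{-\frac{ST-2k}{2}} . $$

Substituting these approximations into (27), and omitting terms that
are independent of $k$, the final form of the model order selection
criterion can be readily established:

$$
\begin{aligned}
\hat{M}_{MAP} &= \arg\min_{k \in Z_Q} \left\{ -\ln p(\mathbf{y}|k) \right\} \\
&= \arg\min_{k \in Z_Q} \left\{ \frac{ST-2k}{2} \ln(\mathbf{y}^T\hat{\mathbf{P}}^{\perp}\mathbf{y}) + \frac{1}{2} \ln|\hat{\mathbf{D}}^T\hat{\mathbf{D}}| + k \ln\frac{ST}{2} + 2k \ln ST + (k+1)\ln 2 \right\}
\end{aligned} \tag{28}
$$

$$
= \arg\min_{k \in Z_Q} \left\{ \frac{ST-2k}{2} \ln(\mathbf{y}^T\hat{\mathbf{P}}^{\perp}\mathbf{y}) + 4k \ln ST \right\} . \tag{29}
$$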
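
To make the final criterion (29) concrete, the sketch below evaluates its objective for a nested family of candidate models and picks the minimizer. It is a minimal illustration, not the authors' procedure: the ML frequency search is replaced by a fixed, ordered list of candidate frequencies (the true ones followed by two spurious ones), and the names `build_D` and `map_criterion`, the noise level and the frequency values are assumptions of this example.

```python
import numpy as np

def build_D(S, T, freqs):
    """ST x 2k regression matrix with columns cos/sin of each candidate frequency."""
    n, m = np.meshgrid(np.arange(S), np.arange(T), indexing="ij")
    n, m = n.ravel(), m.ravel()
    cols = []
    for w, v in freqs:
        cols += [np.cos(n * w + m * v), np.sin(n * w + m * v)]
    return np.column_stack(cols) if cols else np.zeros((S * T, 0))

def map_criterion(y, S, T, freqs):
    """Objective of (29) for a model with the given candidate frequencies."""
    k = len(freqs)
    if k:
        D = build_D(S, T, freqs)
        a_hat, *_ = np.linalg.lstsq(D, y, rcond=None)
        resid = y - D @ a_hat
    else:
        resid = y
    rss = resid @ resid                      # equals y^T P_perp y
    ST = S * T
    return 0.5 * (ST - 2 * k) * np.log(rss) + 4 * k * np.log(ST)

rng = np.random.default_rng(1)
S = T = 32
true = [(2 * np.pi * 0.155, 2 * np.pi * 0.253),
        (2 * np.pi * 0.112, 2 * np.pi * 0.201)]
extra = [(2 * np.pi * 0.30, 2 * np.pi * 0.40),   # spurious candidates (assumed)
         (2 * np.pi * 0.05, 2 * np.pi * 0.45)]

# Two components with C_i = G_i = 1 in unit-variance white Gaussian noise.
y = build_D(S, T, true) @ np.ones(4) + rng.standard_normal(S * T)

candidates = true + extra
scores = [map_criterion(y, S, T, candidates[:k]) for k in range(len(candidates) + 1)]
k_map = int(np.argmin(scores))
```

In a full implementation the frequencies of each candidate model would be re-estimated by maximum likelihood before the objective is evaluated; fixing them here only keeps the example short.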


5.  NUMERICAL  RESULTS 

To illustrate the performance of the proposed model order selection
rule we present some numerical examples. In the examples below, the
data field was generated with four equiamplitude sinusoidal components,
and we define


SNR;  =  10  log 


Cf  +  G2 
The noise is a white Gaussian noise field with variance $\sigma^2$,
which is chosen to yield the desired signal to noise ratio. In these
experiments the signal to noise ratio of each component, SNR$_i$,
varies in the range of -15 dB to -5 dB, in steps of 1 dB. For each SNR,
100 Monte-Carlo experiments are performed. The data field dimensions
are 32 x 32. The frequencies of the sinusoidal components are
$(-2\pi \cdot 0.155,\ 2\pi \cdot 0.253)$,
$(-2\pi \cdot 0.155,\ 2\pi \cdot 0.296)$,
$(-2\pi \cdot 0.112,\ 2\pi \cdot 0.274)$,
$(2\pi \cdot 0.112,\ 2\pi \cdot 0.201)$. Their amplitudes are given by
$C_i = G_i = 1$, $i = 1, \ldots, 4$. The performance results of the
proposed MAP selection criterion are summarized in Table 1 for various
values of SNR$_i$. For comparison, the performance results of the GAIC
criterion [6] are listed as well. To further illustrate the performance
of the proposed MAP model order selection criterion, the probabilities
of correct model order selection for the two criteria are depicted in
Fig. 1. The simulation results demonstrate that even for modest
dimensions of the observed field and relatively low SNRs, i.e., as low
as -9 dB, the error rates of both the MAP and the GAIC model order
selection criteria are very low. The performance of the MAP rule is
shown to be better than that of the GAIC for lower SNRs. Furthermore,
the results indicate that for the lower SNR range, not only is the
probability of correct model order selection by the MAP criterion
higher, but the magnitude of the error is also much smaller than in the
case of the GAIC model order estimate.


SNR_i     Criterion   k=1   k=2   k=3   k=4
-15 dB    MAP          29    34    29     8
          GAIC         94     6     0     0
-14 dB    MAP           4    27    45    24
          GAIC         86    12     2     0
-13 dB    MAP           3    13    46    38
          GAIC         57    33     8     2
-12 dB    MAP           0     3    18    79
          GAIC         22    23    27    28
-11 dB    MAP           0     0     7    93
          GAIC          2     6    21    71
-10 dB    MAP           0     0     4    96
          GAIC          0     0     8    92
-9 dB     MAP           0     0     0   100
          GAIC          0     0     0   100

Table 1: Performance comparison of MAP and GAIC criteria for various
values of SNR_i. Each entry is the number of the 100 Monte-Carlo runs
in which the given order k was selected.

Figure 1: Probabilities of correct model order selection. The solid and
the dashed lines represent the MAP and the GAIC performance curves,
respectively.

6. REFERENCES

[1] P. M. Djuric, "A Model Selection Rule for Sinusoids in White
Gaussian Noise," IEEE Trans. Signal Process., vol. 44, pp. 1744-1751,
1996.

[2] G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical
Analysis, New York: Wiley, 1992.

[3] N. G. De Bruijn, Asymptotic Methods in Analysis, 3rd edition,
Amsterdam: North-Holland Publishing Co., 1970.

[4] L. Kavalieris and E. J. Hannan, "Determining the Number of Terms in
a Trigonometric Regression," J. Time Series Anal., vol. 15,
pp. 613-625, 1994.

[5] P. Stoica, P. Eykhoff, P. Janssen and T. Söderström,
"Model-Structure Selection by Cross-Validation," Int. J. Control,
vol. 43, pp. 1841-1878, 1986.

[6] J. Li and P. Stoica, "Efficient Mixed-Spectrum Estimation with
Application to Target Feature Extraction," IEEE Trans. Signal Process.,
vol. 44, pp. 281-295, 1996.

[7] R. Stoica, J. Zerubia and J. M. Francos, "The Two-Dimensional Wold
Decomposition for Segmentation and Indexing in Image Libraries," Int.
Conf. Acoust., Speech, Signal Processing, Seattle, 1998.

[8] E. D. Rainville, Special Functions, MacMillan, New York, 1967.

[9] M. Kliger and J. M. Francos, "MAP Model Order Selection Criterion
for 2-D Sinusoids in Noise," in preparation.




OPTIMUM LINEAR PERIODICALLY TIME-VARYING FILTER


Dong  Wei 

Center  for  Telecommunications  and  Information  Networking 
Department  of  Electrical  and  Computer  Engineering,  Drexel  University 
Philadelphia,  PA  19104  U.S.A. 

E-mails:  wei@ece.drexel.edu 


ABSTRACT 

We study the optimum (in the minimum mean-square error sense) linear
periodically time-varying deconvolution filter of finite size. We show
that the filter can be in the form of a lapped transform or multirate
filterbank, and it includes the FIR Wiener filter as a special case. We
demonstrate that the proposed filter always possesses a gain over the
Wiener filter.

1.  INTRODUCTION 

Consider the discrete-time model

$$ x[n] = (s * h)[n] + w[n] \tag{1} $$

$$ \phantom{x[n]} = \sum_{m=0}^{N-1} h[m]\, s[n-m] + w[n] \tag{2} $$

where $s[n]$ is the original signal, $h[n]$ is a known linear
time-invariant (LTI) system with $N$ taps, $w[n]$ is the additive
noise, $x[n]$ is the observed data, and the symbol $*$ denotes
convolution. We assume that both $s[n]$ and $w[n]$ are zero-mean,
wide-sense stationary, second-order random processes, their
second-order statistics are known, and they are uncorrelated, i.e.,

$$ E\{ s[n]\, w^*[k] \} = 0 \tag{3} $$

for any $n$ and $k$. Such a model has been widely used in signal
processing applications such as filtering, smoothing, prediction, noise
canceling, and deconvolution, just to name a few.

The goal is to estimate the signal $s[n]$ from the noisy, filtered
data $x[n]$. An LTI finite impulse response (FIR) filter $f[n]$ can be
applied to $x[n]$. The resulting estimate of $s[n]$ is given by

$$ \hat{s}[n] = (x * f)[n] \tag{4} $$

$$ \phantom{\hat{s}[n]} = \sum_{m=0}^{K-1} f[m]\, x[n-m] \tag{5} $$

where $K$ is the length of $f[n]$. The FIR Wiener deconvolution filter
is the optimum LTI FIR system (denoted by the vector
$\mathbf{f}_{opt}$) in the minimum mean-square error (MMSE) sense:

$$ \mathbf{f}_{opt} = \arg\min_{\mathbf{f}} E\{ |s[n] - \hat{s}[n]|^2 \} \tag{6} $$

where

$$ \mathbf{f} = [\, f[0] \;\; f[1] \;\; \ldots \;\; f[K-1] \,]^T \tag{7} $$

with the symbol $T$ denoting matrix transpose.

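We now reconsider the optimality of the Wiener filter in (6) from a
different viewpoint. The filtering operation in (4) can be expressed as

$$ \hat{\mathbf{s}}[n] = \mathbf{F}_{LTI}\, \mathbf{x}[n] \tag{8} $$

where

$$ \hat{\mathbf{s}}[n] = [\, \hat{s}[n] \;\; \hat{s}[n-1] \;\; \ldots \;\; \hat{s}[n-M+1] \,]^T, \tag{9} $$

$$ \mathbf{x}[n] = [\, x[n] \;\; x[n-1] \;\; \ldots \;\; x[n-L+1] \,]^T, \tag{10} $$

$\mathbf{F}_{LTI}$ is an $M \times L$ matrix given by

$$
\mathbf{F}_{LTI} =
\begin{bmatrix}
\mathbf{f}^T & 0 & 0 & \cdots & 0 \\
0 & \mathbf{f}^T & 0 & \cdots & 0 \\
 & & \ddots & & \\
0 & 0 & \cdots & 0 & \mathbf{f}^T
\end{bmatrix} \tag{11}
$$

and $L = K + M - 1$.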
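
The block form (8)-(11) can be checked numerically. The sketch below builds the banded matrix of (11) for an arbitrary filter and confirms that one matrix-vector product returns M consecutive outputs of the ordinary convolution (5). The helper name `build_F_lti`, the filter taps and the block size are assumptions chosen for illustration.

```python
import numpy as np

def build_F_lti(f, M):
    """M x L matrix of (11): row i holds f^T shifted right by i columns, L = K + M - 1."""
    K = len(f)
    F = np.zeros((M, K + M - 1))
    for i in range(M):
        F[i, i:i + K] = f
    return F

rng = np.random.default_rng(0)
f = np.array([0.5, -0.2, 0.1])       # hypothetical K = 3 tap filter
M = 4
F = build_F_lti(f, M)
L = F.shape[1]

x = rng.standard_normal(50)
n = 20                               # any index with n - L + 1 >= 0
x_vec = x[n - np.arange(L)]          # [x[n], x[n-1], ..., x[n-L+1]]^T as in (10)
s_block = F @ x_vec                  # estimates for times n, n-1, ..., n-M+1

s_conv = np.convolve(x, f)[:len(x)]  # s_hat[t] = sum_m f[m] x[t-m]
expected = s_conv[n - np.arange(M)]
```

Replacing the banded matrix by an unconstrained M x L matrix is exactly the generalization to the lapped transform introduced next.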

The linear lapped transform [1]

$$ \hat{\mathbf{s}}[n] = \mathbf{F}\, \mathbf{x}[n] \tag{12} $$

where $\mathbf{F}$ is an $M \times L$ constant matrix, is a more
general version of linear filtering than (8). We require that
$M \le L$. When $M = L$, the linear lapped transform reduces to a
linear block transform.

A few interesting questions arise. Does the Wiener filter result in the
optimum (in the MMSE sense) estimate $\hat{\mathbf{s}}[n]$? If it does
not, how can we do better and what is the best estimate?

In this paper, we answer these questions.
2. MINIMUM MEAN SQUARE ERROR
LINEAR PERIODICALLY TIME-VARYING
FILTERING


2.1. Some Basics

Since

$$ \hat{\mathbf{s}}[n - lM] = \mathbf{F}\, \mathbf{x}[n - lM] \tag{13} $$

for any integer $l$, such a linear lapped transform is in general a
generic LPTV filter with period $M$. The LPTV filter can be implemented
by means of an $M$-channel multirate filterbank [2].

When $M = 1$, the LPTV filter reduces to an LTI filter with $L$ taps.
For $M > 1$, the LPTV filter reduces to an LTI filter if and only if
the $M \times L$ matrix $\mathbf{F}$ has the banded structure of
$\mathbf{F}_{LTI}$ in (11). This implies that any LTI filter of length
up to $L - M + 1$ is a special case of the LPTV filter characterized by
$\mathbf{F}$. Therefore, the optimum LPTV filter of size $M \times L$
always possesses a gain over the Wiener filter of length $L - M + 1$.
Such a gain results from processing the data blocks more flexibly than
LTI filtering does. For filtering, the two filters require $L$ and
$L - M + 1$ multiplications per data sample, respectively. When $M$ is
small compared to $L$, their computational complexities are comparable.

2.2.  The  Optimum  Filter 

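The model in (1) can be expressed in the vector form:

$$ \mathbf{x}[n] = \mathbf{H}\, \mathbf{s}[n] + \mathbf{w}[n] \tag{14} $$

where $\mathbf{H}$ is an $L \times (L + N - 1)$ matrix given by

$$
\mathbf{H} =
\begin{bmatrix}
\mathbf{h}^T & 0 & 0 & \cdots & 0 \\
0 & \mathbf{h}^T & 0 & \cdots & 0 \\
 & & \ddots & & \\
0 & 0 & \cdots & 0 & \mathbf{h}^T
\end{bmatrix}, \tag{15}
$$

$$ \mathbf{h} = [\, h[0] \;\; h[1] \;\; \ldots \;\; h[N-1] \,]^T, \tag{16} $$

$$ \mathbf{s}[n] = [\, s[n] \;\; s[n-1] \;\; \ldots \;\; s[n-L-N+2] \,]^T, \tag{17} $$

and

$$ \mathbf{w}[n] = [\, w[n] \;\; w[n-1] \;\; \ldots \;\; w[n-L+1] \,]^T. \tag{18} $$
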

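Define the estimation error in the $n$th block as

$$ \mathbf{e}[n] = \hat{\mathbf{s}}[n] - \mathbf{A}\, \mathbf{s}[n] \tag{19} $$

where

$$ \mathbf{A} = [\, \mathbf{I}_M \;\; \mathbf{0}_{M \times (L-M+N-1)} \,], \tag{20} $$

and

$$ \mathbf{e}[n] = \mathbf{F}(\mathbf{H}\mathbf{s}[n] + \mathbf{w}[n]) - \mathbf{A}\mathbf{s}[n] \tag{21} $$

$$ \phantom{\mathbf{e}[n]} = (\mathbf{F}\mathbf{H} - \mathbf{A})\mathbf{s}[n] + \mathbf{F}\mathbf{w}[n]. \tag{22} $$

We attempt to design the optimum $\mathbf{F}$ to minimize the
mean-square error (MSE) in the block $\hat{\mathbf{s}}[n]$, which is
given by

$$ J = \frac{1}{M} E\{ \mathbf{e}^H[n]\, \mathbf{e}[n] \}. \tag{23} $$

It follows that

$$ J = \frac{1}{M} E\{ \| (\mathbf{F}\mathbf{H} - \mathbf{A})\mathbf{s}[n] + \mathbf{F}\mathbf{w}[n] \|^2 \} \tag{24} $$

$$ = \frac{1}{M} E\{ \mathrm{tr}[ ((\mathbf{F}\mathbf{H} - \mathbf{A})\mathbf{s}[n] + \mathbf{F}\mathbf{w}[n]) ((\mathbf{F}\mathbf{H} - \mathbf{A})\mathbf{s}[n] + \mathbf{F}\mathbf{w}[n])^H ] \} \tag{25} $$

$$ = \frac{1}{M} \mathrm{tr}[ (\mathbf{F}\mathbf{H} - \mathbf{A}) \mathbf{R}_s (\mathbf{F}\mathbf{H} - \mathbf{A})^H + \mathbf{F}\mathbf{R}_w\mathbf{F}^H ] \tag{26} $$

$$ = \frac{1}{M} \mathrm{tr}[ \mathbf{F}(\mathbf{H}\mathbf{R}_s\mathbf{H}^H + \mathbf{R}_w)\mathbf{F}^H + \mathbf{A}\mathbf{R}_s\mathbf{A}^H - \mathbf{F}\mathbf{H}\mathbf{R}_s\mathbf{A}^H - \mathbf{A}\mathbf{R}_s\mathbf{H}^H\mathbf{F}^H ] \tag{27} $$

where

$$ \mathbf{R}_s = E\{ \mathbf{s}[n]\, \mathbf{s}^H[n] \}, \tag{28} $$

$$ \mathbf{R}_w = E\{ \mathbf{w}[n]\, \mathbf{w}^H[n] \}. \tag{29} $$

Setting

$$ \left. \frac{\partial J}{\partial \mathbf{F}} \right|_{\mathbf{F} = \mathbf{F}_{opt}} = \mathbf{0}_{M \times L}, \tag{30} $$

we obtain the matrix form of the Wiener-Hopf equations:

$$ \mathbf{F}_{opt}(\mathbf{H}\mathbf{R}_s\mathbf{H}^H + \mathbf{R}_w) = \mathbf{A}\mathbf{R}_s\mathbf{H}^H. \tag{31} $$

Therefore, the optimum LPTV filter is

$$ \mathbf{F}_{opt} = \mathbf{A}\mathbf{R}_s\mathbf{H}^H (\mathbf{H}\mathbf{R}_s\mathbf{H}^H + \mathbf{R}_w)^{-1} \tag{32} $$

and the resulting minimum MSE is

$$ J_{LPTV,min} = \sigma_s^2 - \frac{1}{M} \mathrm{tr}[ \mathbf{A}\mathbf{R}_s\mathbf{H}^H (\mathbf{H}\mathbf{R}_s\mathbf{H}^H + \mathbf{R}_w)^{-1} \mathbf{H}\mathbf{R}_s\mathbf{A}^H ]. \tag{33} $$

The optimum filter can be viewed as the extension of the FIR Wiener
filter to LPTV systems. Indeed, when $M = 1$, $\mathbf{F}_{opt}$
reduces to the FIR Wiener filter with $L$ taps. On the other hand, for
$D = 1, 2, \ldots, M$, the $D$th row of the filtering matrix
$\mathbf{F}_{opt}$ is the MMSE FIR filter for estimating $s[n - D + 1]$
from the data set $\{x[m] : -\infty < m \le n\}$.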

2.3.  When  Is  There  No  Gain? 

In general, the performance of the optimum LPTV filter is better than
the performance of the optimum LTI filter in the sense that

$$ J_{LPTV,min} \le J_{LTI,min} \tag{34} $$

where the equality holds if and only if

•  the signal $s[n]$ is a white noise process, i.e.,

$$ E\{ s[n]\, s^*[n+l] \} = \sigma_s^2\, \delta[l], \tag{35} $$

•  the noise $w[n]$ is a white noise process, i.e.,

$$ E\{ w[n]\, w^*[n+l] \} = \sigma_w^2\, \delta[l], \tag{36} $$

and

•  the LTI system $h[n]$ has only one tap, i.e., $N = 1$.

2.4.  Asymptotic  Performance  Analysis 

We assume that $s[n]$ and $w[n]$ are both regular processes with
rational power spectra.

Let $f_D[n]$ denote the causal, infinite impulse response (IIR) Wiener
deconvolution filter for estimating $s[n - D]$ from the data set
$\{x[m] : -\infty < m \le n\}$, where $D \ge 0$ indicates a delay. The
performance of $f_D[n]$ shall be used in our analysis of the asymptotic
performance of the proposed optimum LPTV filter. The transfer function
of $f_D[n]$ is given by

$$ F_D(z) = \frac{1}{\sigma_0^2 Q(z)} \left[ \frac{z^{-D} P_s(z) H^*(1/z^*)}{Q^*(1/z^*)} \right]_+ \tag{37} $$

where $Q(z)$ is the monic, minimum-phase factor determined by the
spectral factorization of the power spectrum of $x[n]$:

$$ P_x(z) = H(z) H^*(1/z^*) P_s(z) + P_w(z) \tag{38} $$

$$ \phantom{P_x(z)} = \sigma_0^2\, Q(z)\, Q^*(1/z^*) \tag{39} $$

and the subscript "+" is used to indicate the "positive-time part" of
the sequence whose z-transform is contained within the brackets. The
resulting MSE is

$$ J^{(D)}_{IIR,min} = r_s[0] - \sum_{l=0}^{\infty} \sum_{n=0}^{N-1} f_D[l]\, h[n]\, r_s[D - n - l] \tag{40} $$

or

$$ J^{(D)}_{IIR,min} = \frac{1}{2\pi} \int_{-\pi}^{\pi} P_s(e^{j\omega}) \left[ 1 - F_D(e^{j\omega}) H(e^{j\omega}) e^{j\omega D} \right] d\omega. \tag{41} $$

The derivations of the optimum filter $f_D[n]$ and its associated MSE
are given in the Appendix.

In general, increasing the delay $D$ leads to a smaller
$J^{(D)}_{IIR,min}$. Asymptotically, $f_\infty[n]$ is the non-causal
IIR Wiener filter.

The performance of estimating the signal block
$\mathbf{A}\mathbf{s}[n]$ using the filtering matrix $\mathbf{F}$ can
be improved if more observed data are processed, or equivalently, the
parameter $L$ is increased. As $L$ tends to infinity, the MMSE FIR
filter converges to the MMSE causal IIR filter. Therefore,

$$ J_{LTI,min} \ge \lim_{L \to \infty} J_{LTI,min} \tag{42} $$

$$ \phantom{J_{LTI,min}} = J^{(0)}_{IIR,min}. \tag{43} $$

If $M$ is fixed and $L$ tends to infinity, then the $D$th row of the
optimum filtering matrix $\mathbf{F}_{opt}$ converges to $f_D[n]$.
Therefore,

$$ J_{LPTV,min} \ge \lim_{L \to \infty} J_{LPTV,min} \tag{44} $$

$$ \phantom{J_{LPTV,min}} = \frac{1}{M} \sum_{D=0}^{M-1} J^{(D)}_{IIR,min}. \tag{45} $$

If both $M$ and $L$ approach infinity with $K = L - M + 1$ fixed, then

$$ \lim_{L \to \infty,\, M \to \infty} J_{LPTV,min} = \lim_{M \to \infty} \frac{1}{M} \sum_{D=0}^{M-1} J^{(D)}_{IIR,min} \tag{46} $$

which corresponds to the MSE of the non-causal IIR Wiener filter.

In summary, the optimum LPTV filter outperforms the Wiener filter
asymptotically.


3. CONCLUSION

We have presented the MMSE linear periodically time-varying
deconvolution filter. The proposed filter outperforms its linear
time-invariant counterpart at the expense of an increase in
computational complexity and delay.

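APPENDIX

We now prove (37) and (40).

According to the model given in (1), we first whiten the process
$x[n]$ to obtain a unit-variance white noise process:

$$ y[n] = (b * x)[n] \tag{48} $$

where the whitening filter $b[n]$ is given by

$$ B(z) = \frac{1}{\sigma_0 Q(z)} \tag{49} $$

which is causal and stable. Next, we obtain the estimate of $s[n - D]$
by filtering $y[n]$ with a causal IIR filter $g[n]$:

$$ \hat{s}[n - D] = \sum_{m=0}^{\infty} g[m]\, y[n - m]. \tag{50} $$

To minimize $E\{|s[n - D] - \hat{s}[n - D]|^2\}$ with respect to
$g[n]$, we use the orthogonality principle to obtain the Wiener-Hopf
equations

$$ E\{ (s[n - D] - \hat{s}[n - D])\, y^*[n - k] \} = 0 \tag{51} $$

for $0 \le k < \infty$, or equivalently,

$$ r_{sy}[k - D] = \sum_{m=0}^{\infty} g[m]\, r_y[k - m] \tag{52} $$

$$ \phantom{r_{sy}[k - D]} = g[k] \tag{53} $$

for $0 \le k < \infty$. Therefore,

$$ G(z) = [\, z^{-D} P_{sy}(z) \,]_+ . \tag{54} $$

Since

$$ r_{sy}[k] = E\{ s[n]\, y^*[n - k] \} \tag{55} $$

$$ = E\left\{ s[n] \left( \sum_{l=0}^{\infty} b[l]\, x[n - k - l] \right)^{*} \right\} \tag{56} $$

$$ = \sum_{l=0}^{\infty} b^*[l]\, r_{sx}[k + l] \tag{57} $$

$$ = \sum_{l=0}^{\infty} b^*[l]\, E\left\{ s[n + k + l] \left( \sum_{m=0}^{N-1} h[m]\, s[n - m] + w[n] \right)^{*} \right\} \tag{58} $$

$$ = \sum_{l=0}^{\infty} \sum_{m=0}^{N-1} b^*[l]\, h^*[m]\, r_s[k + l + m], \tag{59} $$

which implies that

$$ P_{sy}(z) = B^*(1/z^*)\, H^*(1/z^*)\, P_s(z) \tag{60} $$

$$ \phantom{P_{sy}(z)} = \frac{H^*(1/z^*)\, P_s(z)}{\sigma_0\, Q^*(1/z^*)}, \tag{61} $$

the causal IIR Wiener deconvolution filter for estimating $s[n - D]$
from the data set $\{x[m] : -\infty < m \le n\}$ is given by

$$ F_D(z) = B(z)\, G(z) \tag{62} $$

$$ \phantom{F_D(z)} = \frac{1}{\sigma_0 Q(z)} \left[ \frac{z^{-D} H^*(1/z^*)\, P_s(z)}{\sigma_0\, Q^*(1/z^*)} \right]_+ . \tag{63} $$

Since

$$ r_{sx}[k] = E\left\{ s[m] \left( \sum_{n=0}^{N-1} h[n]\, s[m - k - n] \right)^{*} \right\} + E\{ s[m]\, w^*[m - k] \} \tag{64} $$

$$ \phantom{r_{sx}[k]} = \sum_{n=0}^{N-1} h^*[n]\, r_s[k + n], \tag{65} $$

the resulting MSE is

$$ J^{(D)}_{IIR,min} = E\{ (s[n - D] - \hat{s}[n - D])\, s^*[n - D] \} \tag{66} $$

$$ = r_s[0] - E\left\{ \sum_{l=0}^{\infty} f_D[l]\, x[n - l]\, s^*[n - D] \right\} \tag{67} $$

$$ = r_s[0] - \sum_{l=0}^{\infty} f_D[l]\, r^*_{sx}[l - D] \tag{68} $$

$$ = r_s[0] - \sum_{l=0}^{\infty} \sum_{n=0}^{N-1} f_D[l]\, h[n]\, r^*_s[l - D + n] \tag{69} $$

$$ = r_s[0] - \sum_{l=0}^{\infty} \sum_{n=0}^{N-1} f_D[l]\, h[n]\, r_s[D - l - n]. \tag{70} $$
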
;=o  n=0 
Abstract

In this paper, fast techniques for invariant subspace separation with
applications to the DOA and the harmonic retrieval problems are
presented. The main feature of these techniques is that they are
computationally efficient, as they can be implemented in parallel and
can be transformed into matrix inverse-free algorithms. The basic
operations used are the QR factorization and matrix multiplication.
Specifically, two types of methods are developed. The first method uses
Newton-like iteration and is quadratically convergent. The second
method can be developed to have convergence of any prescribed order.
Using these approximations, the minimum norm solution for the DOA and
the harmonic retrieval problems for the projection of the least squares
weight onto the signal subspace of the data is obtained simply, without
performing any SVD. Some of the developed methods are also examined on
several test problems.

1.  Introduction 

The estimation of projections onto a selected set of invariant
subspaces of data and covariance matrices is a common requirement in
the development of high resolution methods. This situation arises in
adaptive processing of sensor array data or sums of sinusoids, where
the estimation of the number of strong signals present in a given set
of data and of the projections onto signal and noise subspaces is
essential. Subspace based methods for frequency estimation rely on a
low rank system model that is obtained by organizing the observed data
samples into vectors. MUSIC and ESPRIT based estimators are then
obtained using this vector model.

Projection of the least-squares weight vector onto a subspace of
reduced dimension is an established technique for reducing the number
of adaptive degrees of freedom used by an adaptive sensor array. The
main problem is that conventional algorithms for subspace estimation
based upon eigenvalue decomposition (EVD) or singular value
decomposition (SVD) are both expensive to compute and difficult to make
recursive or implement in parallel. In contrast, algorithms based on
the QR factorization have established pipelinable architectures.

Since many signal processing applications (e.g. projection beamforming,
MUSIC) do not explicitly utilize the full set of signal eigenvalues,
diagonalizing the covariance matrix of the data is not necessarily
advantageous and is not required. Various alternatives were proposed by
several authors. Kay and Shaw [1] suggested the use of polynomials and
rational functions of the sample covariance matrix for approximating
the signal subspace. In [2], Tufts and Melissinos used Lanczos and
power-type methods to approximate the signal subspace. Karhunen and
Joutsensalo [3] approximated the signal subspace using the discrete
Fourier and Cosine transforms. Ermolaev and Gershman [4] used powers of
the sample covariance matrix based on Krylov subspaces to approximate
the noise subspace when the number of impinging signals and a threshold
which separates the signal and noise eigenvalues are known a priori. In
this work, we assume that a rough estimate of a threshold is known. For
useful articles and books, the reader is referred to [5], [6]-[8] and
the references therein.

The proposed algorithms could prove useful if a threshold that
separates the noise and signal eigenvalues is known. This
threshold can, in some cases, be obtained by tracking subspaces
where the largest eigenvalue of the current noise subspace, the
smallest eigenvalue of the current signal subspace, or the power
level of the noise floor is known. In these cases the proposed
algorithms can help speed up the computation for the final
estimation of the subspaces. Another application is when the rank
of the signal subspace is known.

2.  Data  Model 

The N samples of a scalar-valued signal y(n) are assumed to be
the sum of M complex-valued sinusoids in additive zero-mean white
Gaussian noise:

$$x_k(n) = a_k e^{j(w_k n + \phi_k)}, \quad k = 1, 2, \ldots, M,$$

$$y(n) = \sum_{k=1}^{M} x_k(n) + v(n), \quad n = 1, 2, \ldots, N. \tag{1}$$

Here $a_k > 0$ is the amplitude, the frequencies
$w_1, \ldots, w_M$ are assumed to be distinct parameters, and the
phases $\phi_k$ are assumed to be uniformly distributed on
$[0, 2\pi]$ and mutually independent. The noise $v(n)$ is assumed
to be independent of the phases and to satisfy

$$E\{v(n)v^*(n-k)\} = \sigma_v^2\,\delta(k), \tag{2}$$

where $(\cdot)^*$ denotes the complex conjugate and
$\delta(\cdot)$ is the Kronecker delta function. A low-rank
matrix representation of the problem is obtained by collecting
$L > M$ received samples in a column vector

$$\mathbf{y}(n) = [\,y(n)\;\; y(n+1)\;\; \cdots\;\; y(n+L-1)\,]^T, \tag{3a}$$

where  (.)T  denotes  the  matrix  transpose. 

The notation $\mathbf{x}(n)$ will denote the vector

$$\mathbf{x}(n) = [\,x_1(n)\;\; x_2(n)\;\; \cdots\;\; x_M(n)\,]^T. \tag{3b}$$

Hence $\mathbf{y}(n)$ can be written as

$$\mathbf{y}(n) = V(w)\,\mathbf{x}(n) + \mathbf{v}(n), \quad n = 1, \ldots, N-L+1, \tag{4}$$

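where the additive noise vector, $\mathbf{v}(n)$, is defined
similarly to $\mathbf{y}(n)$ in (3a) and $V(w)$ is an
$L \times M$ Vandermonde matrix given by

$$V(w) = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{jw_1} & e^{jw_2} & \cdots & e^{jw_M} \\ \vdots & \vdots & & \vdots \\ e^{j(L-1)w_1} & e^{j(L-1)w_2} & \cdots & e^{j(L-1)w_M} \end{bmatrix}. \tag{5}$$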

The argument w is omitted in the sequel when not required. The
covariance matrix, $R_y$, of the received windowed sequence is

$$R_y = E\{\mathbf{y}(n)\mathbf{y}^*(n)\} = VDV^* + \sigma_v^2 I_L, \tag{6}$$

where the covariance matrix
$D = \mathrm{diag}(a_1^2, \ldots, a_M^2)$ is diagonal. The matrix
$I_L$ is the identity matrix of size L. Note that the signal part
of the covariance is

$$R_x = V\,E\{\mathbf{x}(n)\mathbf{x}^*(n)\}\,V^* = VDV^*. \tag{7}$$

A similar formulation can also be obtained for the direction of
arrival (DOA) problem, except that in that case the matrix D is
not necessarily diagonal.
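
As an illustration of the data model (1)-(6), the following Python sketch draws a sinusoids-in-noise sequence, stacks the length-L windows and forms a sample covariance estimate. It is a minimal sketch rather than the authors' code: the function names, the use of NumPy and the circular complex Gaussian noise generator are assumptions made for the example.

```python
import numpy as np

def simulate_sinusoids(amps, freqs, N, noise_var, rng):
    """Draw y(n) = sum_k a_k exp(j(w_k n + phi_k)) + v(n) for n = 1..N."""
    amps = np.asarray(amps, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    n = np.arange(1, N + 1)
    # Phases are independent and uniform on [0, 2*pi]
    phases = rng.uniform(0.0, 2.0 * np.pi, size=amps.size)
    signal = (amps[:, None]
              * np.exp(1j * (freqs[:, None] * n + phases[:, None]))).sum(axis=0)
    # Assumed noise model: circular complex white Gaussian with variance noise_var
    noise = np.sqrt(noise_var / 2.0) * (rng.standard_normal(N)
                                        + 1j * rng.standard_normal(N))
    return signal + noise

def sample_covariance(y, L):
    """Stack the windows [y(n) ... y(n+L-1)]^T and average their outer products."""
    N = y.size
    Y = np.stack([y[n:n + L] for n in range(N - L + 1)], axis=1)  # L x (N-L+1)
    return Y @ Y.conj().T / Y.shape[1]

def vandermonde(freqs, L):
    """L x M matrix with entries exp(j * l * w_k), l = 0..L-1, as in (5)."""
    return np.exp(1j * np.outer(np.arange(L), np.asarray(freqs, dtype=float)))
```

With the noise variance set to zero, the sample covariance of M distinct sinusoids has rank M, which is the low-rank structure exploited in the rest of the paper.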

In this paper, it is shown that if a threshold that separates the
signal and noise eigenvalues, or the dimension of the signal
subspace, is known a priori, the subspace estimate can be
obtained using the QR factorization of a large power of the
covariance matrix.

3.  Invariant  Subspace  Computation 

Let A be a Hermitian matrix with eigenvalues $\lambda_j$, and let
$P_0$ and $P_1$ denote the orthogonal projections onto the
invariant subspaces corresponding to eigenvalues inside and
outside the interval $(-|b|, |b|)$, where b is a nonzero number.
An elegant method for computing those invariant subspaces is
presented next. Consider the sequence of matrices defined by

$$S_k = (b^k I_L - A^k)(b^k I_L + A^k)^{-1}. \tag{8}$$

Then the eigenvalues of $S_k$, given by
$\{(b^k - \lambda_j^k)/(b^k + \lambda_j^k)\}_{j=1}^{L}$, converge
to 1 or -1 as $k \to \infty$. Thus $S_k$ is bounded for all
sufficiently large k. It can be shown that the sequence $S_k$
converges to a matrix S satisfying $S^2 = I_L$ and $SA = AS$.
Moreover, S and A have the same invariant subspaces inside and
outside a circle of radius $|b|$ centered at the origin. If (8)
is computed directly using powers of the matrix A, over- and
underflow will occur.


Since the sample covariance matrix is generally positive
semidefinite, we will apply this iteration to the shifted matrix.
A fast implementation for computing the limit of the sequence,
which also avoids the problem of over- and underflow, is given
next.

Algorithm 1:

$$S_0 = R_y - bI_L,$$

$$S_{k+1} = \{(I_L + S_k)^r - (I_L - S_k)^r\}\{(I_L + S_k)^r + (I_L - S_k)^r\}^{-1}. \tag{9}$$

It can be shown that $S_k$ satisfies the following elegant error
formula

$$(S_{k+1} + S)^{-1}(S_{k+1} - S) = \{(S_k + S)^{-1}(S_k - S)\}^{r} = \{(S_0 + S)^{-1}(S_0 - S)\}^{r^{k+1}}. \tag{10}$$

This method can be made to converge at any desired rate by
choosing an appropriate r. From several numerical experiments, it
was observed that K = 5 iterations are suitable for r = 2, while
K = 3 suffices for r = 3. Once the desired convergence is
obtained, the signal subspace projection is computed as
$P_s = (I_L + S_K)/2$ and the noise subspace projection is
approximated as $P_n = (I_L - S_K)/2$.
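
Iteration (9) and the two projections can be sketched in a few lines of NumPy. This is a hedged illustration: the function name is hypothetical, the inverse is applied through a linear solve, and the projections are taken as $(I_L \pm S_K)/2$, which follows from the limit S being the matrix sign function of $R_y - bI_L$.

```python
import numpy as np

def sign_iteration(R, b, r=2, K=5):
    """Run K steps of iteration (9) starting from S_0 = R - b*I."""
    L = R.shape[0]
    I = np.eye(L)
    S = R - b * I
    for _ in range(K):
        plus = np.linalg.matrix_power(I + S, r)
        minus = np.linalg.matrix_power(I - S, r)
        # S_{k+1} = (plus - minus)(plus + minus)^{-1}; solve the transposed
        # system instead of forming the inverse explicitly
        S = np.linalg.solve((plus + minus).T, (plus - minus).T).T
    return S

def subspace_projections(S):
    """Signal and noise projections from the (approximate) sign matrix S."""
    I = np.eye(S.shape[0])
    return (I + S) / 2, (I - S) / 2
```

Eigenvalues of R above the threshold b are driven to +1 and those below it to -1, so `(I + S) / 2` approximates the projection onto the signal subspace.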

The  next  results  provide  quadratically  convergent 
methods  for  subspace  computation.  The  significance 
of  the  next  theorem  is  that  it  computes  the  projection 
matrix  for  the  subspaces  whose  eigenvalues  fall  between 
two  numbers  a  and  b. 

Theorem 1. Let $X_0 = R_y$ be an $L \times L$ nonsingular matrix
and let $0 < a < b$ be two positive numbers. Let $X_k$ be
generated using

$$X_{k+1} = X_k - (2X_k - (a+b)I_L)^{-1}(X_k^2 - (a+b)X_k + ab\,I_L), \tag{11a}$$

where $I_L$ is the $L \times L$ identity matrix. Then $X_k$
converges quadratically to $S = aQ_1 + bQ_2$, where $Q_1$ and
$Q_2$ are the projections onto the span of all eigenvectors of
$R_y$ whose corresponding eigenvalues lie in the half-planes
containing a and b, respectively, bounded by the line which
perpendicularly bisects the segment between a and b. Moreover,
$Q_1 = (bI_L - S)/(b-a)$, $Q_2 = (S - aI_L)/(b-a)$, and $X_k$
satisfies the following error formula

$$(X_{k+1} + S)^{-1}(X_{k+1} - S) = \{(X_k + S)^{-1}(X_k - S)\}^2. \tag{11b}$$

It should be stated that the above result holds true for any two
numbers $a \neq b$. In this case, if $a + b = 0$ with
$a \neq 0$, then the subspace decomposition reduces to computing
the projections onto the subspaces spanned by the eigenvectors
with eigenvalues having positive and negative real parts,
respectively. Specifically, if $a = -b = 1$, the matrix S reduces
to the matrix sign function of $X_0$.
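
The two-point iteration of Theorem 1 can be tried numerically. The sketch below assumes that the recursion is the Newton step for $f(X) = (X - aI_L)(X - bI_L)$, written in the algebraically equivalent form $X_{k+1} = (2X_k - (a+b)I_L)^{-1}(X_k^2 - ab\,I_L)$; the function name and the fixed iteration count are choices made for the example only.

```python
import numpy as np

def two_point_newton(X0, a, b, iters=20):
    """Newton iteration for (X - aI)(X - bI) = 0 started at X0."""
    I = np.eye(X0.shape[0])
    X = X0.copy()
    for _ in range(iters):
        # X <- (2X - (a+b)I)^{-1} (X^2 - ab I); breaks down only if an
        # eigenvalue sits exactly on the bisecting line Re(z) = (a+b)/2
        X = np.linalg.solve(2 * X - (a + b) * I, X @ X - a * b * I)
    return X

def split_projections(S, a, b):
    """Recover Q1 and Q2 from the limit S = a*Q1 + b*Q2."""
    I = np.eye(S.shape[0])
    return (b * I - S) / (b - a), (S - a * I) / (b - a)
```

Each eigenvalue of the starting matrix moves to whichever of a or b lies on its side of the bisecting line, so $Q_1$ and $Q_2$ follow from the limit by simple scaling.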

When a threshold b which separates the signal and noise
eigenvalues is known a priori, the suggested approach will be
very effective in extracting the signal and noise subspaces. More
generally, one can derive a stable and quadratically convergent
algorithm for computing the invariant subspace of the matrix A in
the half-plane with boundary determined by the line which
perpendicularly bisects the line segment between z = 0 and
z = 2b.

Theorem 2. Let A be a nonsingular matrix of size L and let
$b \neq 0$ be a complex number. For $k = 1, 2, \ldots$, compute

$$Z_{k+1} = \tfrac{1}{2}\, Z_k (Z_k - bI_L)^{-1} Z_k, \tag{12a}$$

with $Z_1 = A$. Then the sequence $Z_k$ converges to $2bZ$, where
Z is the projection onto the subspace spanned by all eigenvectors
whose eigenvalues are in the right half-plane with boundary
determined by the line which perpendicularly bisects the line
segment between z = 0 and z = 2b.

The quadratic convergence of this algorithm can be seen from the
error formula, which can be shown to be

$$(Z_{k+1} - 2bZ)\,Z_{k+1}^{-1} = \{(Z_k - 2bZ)\,Z_k^{-1}\}^2. \tag{12b}$$

Note that the matrix inverse in (12a) can be avoided by utilizing
the Schulz iteration [9].
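
A direct NumPy transcription of recursion (12a) is given below as a hedged sketch. The function name and iteration count are assumptions, and the inverse is applied with a linear solve rather than with the Schulz iteration mentioned above.

```python
import numpy as np

def half_plane_projector(A, b, iters=12):
    """Iterate Z <- 0.5 * Z (Z - bI)^{-1} Z from Z_1 = A; return Z / (2b)."""
    I = np.eye(A.shape[0])
    Z = A.copy()
    for _ in range(iters):
        # Scalar view: z -> z^2 / (2(z - b)), whose fixed points are 0 and 2b
        Z = 0.5 * Z @ np.linalg.solve(Z - b * I, Z)
    return Z / (2 * b)
```

For a Hermitian input, eigenvalues larger than b are mapped to 2b and the others to 0, so the returned matrix approximates the projection onto the eigenvectors whose eigenvalues exceed b.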

The main disadvantage of (9) and (12) is that they require the
computation of a matrix inverse. In the following result, an
implementation of (9) which avoids matrix inverse computation is
given.

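Theorem 3. Let b be a threshold which separates the signal and
noise eigenvalues of the positive definite matrix $R_y$. Let
$S_k$ be a sequence generated as follows:

$$S_0 = R_y - bI_L,$$

$$\begin{bmatrix} (I_L + S_k)^r \\ (I_L - S_k)^r \end{bmatrix} = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} \begin{bmatrix} R_k \\ 0 \end{bmatrix}, \quad k = 0, 1, 2, \ldots, K,$$

$$S_{k+1} = (Q_{11} + Q_{21})(Q_{11} - Q_{21})^*. \tag{13}$$

Then $S_k$ converges to $P_{\lambda > b} - P_{\lambda < b}$, where
$P_{\lambda > b}$ and $P_{\lambda < b}$ denote the orthogonal
projections onto the invariant subspaces of $R_y$ associated with
the eigenvalues above and below b.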


Note that the middle step in Equation (13) involves a QR
decomposition. This provides an rth-order convergent algorithm
for computing the projections onto the invariant subspaces to the
left and right of the line z = b. Once S is computed accurately,
the eigenspaces can be obtained from the QR factorization of
$I_L - S$, i.e., if $I_L - S = QR$, then

$$Q^*(R_y - bI_L)Q = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix},$$

where all eigenvalues of $A_1$ are inside the interval
$[-|b|, |b|]$ and those of $A_2$ are outside that interval. This
process can be repeated if necessary on the smaller matrices
$A_1$ and $A_2$. Initial tests of this algorithm have shown that
this implementation is stable and convergent even when the matrix
A has an eigenvalue as small in magnitude as $10^{-13}$.

We should note that in Iteration (13), orthogonal projections are
obtained using only matrix multiplication and the QR
factorization. This method can be made to converge at any desired
rate by choosing an appropriate r.
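
The inverse-free recursion of Theorem 3 can be sketched with NumPy's reduced QR factorization. The sketch assumes that the final step of (13) reads $S_{k+1} = (Q_{11} + Q_{21})(Q_{11} - Q_{21})^*$, which for Hermitian $S_k$ equals $(A^2 - B^2)(A^2 + B^2)^{-1}$ with $A = (I_L + S_k)^r$ and $B = (I_L - S_k)^r$; the function name and default parameters are illustrative only.

```python
import numpy as np

def inverse_free_sign(R, b, r=2, K=5):
    """QR-based sign iteration that uses no explicit matrix inverse."""
    L = R.shape[0]
    I = np.eye(L)
    S = R - b * I
    for _ in range(K):
        stacked = np.vstack([np.linalg.matrix_power(I + S, r),
                             np.linalg.matrix_power(I - S, r)])
        # Reduced QR: Q is 2L x L, its two L x L blocks are Q11 and Q21
        Q, _ = np.linalg.qr(stacked)
        Q11, Q21 = Q[:L], Q[L:]
        S = (Q11 + Q21) @ (Q11 - Q21).conj().T
    return S
```

Only matrix products and one QR factorization per step are needed; the column-sign ambiguity of the QR factors cancels in the product, so the result does not depend on the QR normalization.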


Algorithm  2: 

Using an analogous derivation, we obtained another inverse-free
implementation of (13) for Hermitian matrices, which is given as
follows:

$$P_0 = R_y - bI_L,$$

$$\begin{bmatrix} P_k^r \\ (I_L - P_k)^r \end{bmatrix} = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} \begin{bmatrix} R_k \\ 0 \end{bmatrix}, \quad k = 0, 1, 2, \ldots, K,$$

$$P_{k+1} = Q_{11}(Q_{11} - Q_{21})^*.$$

Then $P_k$ converges to an orthogonal projection. Let
$P_{k+1} = QR$ be a QR factorization; then $Q^*AQ$ is block
diagonal. This algorithm indicates that projections onto
half-planes can be obtained using only matrix multiplication and
QR factorization.

4.  Estimation  of  a  Threshold 

The performance of estimators based on the approximations given
in the previous section depends mainly on the accuracy of the
threshold that separates the signal and noise eigenvalues, or on
a priori knowledge of the dimension of the signal subspace.

Since $R_y$ is Hermitian, it has the eigendecomposition
$R_y = \sum_{i=1}^{L} \lambda_i u_i u_i^*$, where $\lambda_i$ and
$u_i$ are the ith eigenvalue and the corresponding ith
eigenvector. For convenience, it is assumed that the eigenvalues
are sorted in decreasing order so that
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M > \lambda_{M+1} = \cdots = \lambda_L = \sigma_v^2$,
with corresponding eigenvectors $\{u_i\}_{i=1}^{L}$. The
eigenvectors $\{u_i\}_{i=1}^{M}$ are usually called the signal
vectors and the eigenvectors $\{u_i\}_{i=M+1}^{L}$ are called the
noise vectors. If the average of the signal eigenvalues is
denoted by $\lambda_s$, then one can show that
$\mathrm{trace}(R_y)/L$ is a good estimate of the threshold
provided that L is sufficiently large. The main requirement for
this threshold is
$\sigma_v^2 < \mathrm{trace}(R_y)/L < \lambda_M$, which holds
provided

$$\frac{L-M}{M}\,(\lambda_M - \sigma_v^2) > \lambda_s - \lambda_M. \tag{15}$$

Note that in this inequality the only parameter that can be
varied is L. Clearly, if L is much larger than M so that
$(L-M)/M \gg 1$, then the above inequality will hold.

Although this threshold is very simple to compute, it holds only
for the theoretical covariance matrix, i.e., when all noise
eigenvalues are equal. Another observation is that (15) holds for
smaller L if the spread of the signal eigenvalues is small, so
that the difference $\lambda_s - \lambda_M$ is small, or if
$\lambda_M - \sigma_v^2$ is large. Both of these cases lead to a
smaller L for (15) to hold.
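
The trace-based threshold and its validity condition are easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, and the check works on the theoretical eigenvalue profile (M signal eigenvalues plus L - M noise eigenvalues equal to the noise variance).

```python
import numpy as np

def trace_threshold(R):
    """Threshold estimate trace(R) / L for an L x L covariance matrix."""
    return np.trace(R).real / R.shape[0]

def threshold_is_valid(signal_eigs, noise_var, L):
    """Check noise_var < trace(R)/L < lambda_M for the theoretical covariance."""
    lam = np.sort(np.asarray(signal_eigs, dtype=float))[::-1]
    M = lam.size
    thr = (lam.sum() + (L - M) * noise_var) / L
    return bool(noise_var < thr < lam[-1])
```

For example, with signal eigenvalues (10, 8), unit noise variance and L = 10 the estimate is 2.6, which separates the two groups, whereas a widely spread pair such as (100, 2) violates the condition at the same L.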

Note that for M = 2, (15) reduces to

$$(L-2)(\lambda_2 - \sigma_v^2) > \lambda_1 - \lambda_2.$$

Also, in the hypothetical case in which all signal eigenvalues
are equal, the above threshold is always accurate for any
L > M.

When $\lambda_s - \lambda_M$ is large, one can use a sharper
estimate of the threshold based on

$$\mu = \frac{\sum_{i=1}^{L} \sqrt{\lambda_i}}{L} = \frac{T}{L}.$$

This estimate can be computed from the covariance matrix, but the
computation is very lengthy and complicated even when L is small.
For example, when L = 2 the value of $\mu$ can be estimated from

$$T^2 = \mathrm{trace}(R_y) + 2\sqrt{\det(R_y)},$$

where $T = \mu L$. For L = 3, T can be estimated by solving the
following equation

$$(T^2 - a)^2 - 4b = 8\sqrt{c}\,T,$$

where a, b, c are determined from the characteristic polynomial
of $R_y$ given by $\lambda^3 - a\lambda^2 + b\lambda - c$.

5.  Simulation  Results 

In this section, frequency estimators based on subspace
approximations are examined on several data sets generated by the
equation

$$y(n) = d_1 e^{j(2\pi f_1 n + \phi_1)} + d_2 e^{j(2\pi f_2 n + \phi_2)} + v(n), \tag{15}$$

where $d_1 = 1.0$, $d_2 = 1.0$, $f_1 = 0.5$, $f_2 = 0.52$ and
$n = 1, 2, \ldots, N = 25$. The $\phi_i$ are independent random
variables uniformly distributed over the interval
$[-\pi, \pi]$. The noise $v(n)$ is assumed to be white and
uncorrelated with the signal. Note that $f_2 - f_1 < 1/N$. The
SNR for either sinusoid is defined as
$10\log_{10}(\sigma_x^2/\sigma_v^2)$, where
$x(n) = d_1 e^{j(2\pi f_1 n + \phi_1)} + d_2 e^{j(2\pi f_2 n + \phi_2)}$
and $\sigma_x^2$, $\sigma_v^2$ are the variances of $x(n)$ and
$v(n)$, respectively. The size of the covariance matrix is chosen
to be L = 10; in the absence of noise it has effective rank two.
We
performed experiments to compare the proposed methods with the
truncated SVD-based MUSIC. The SVD routine in MATLAB is used for
the computation of the signal subspace eigenvectors and
eigenvalues required to implement an SVD-based method for
comparison. We varied the SNR from 10 dB to 20 dB in 5 dB steps
and estimated the frequencies for data length 25. For each
experiment (with data length and SNR fixed), we performed 100
independent trials to estimate the frequencies. We use the
following performance criterion (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{N_e} \sum_{i=1}^{N_e} (\hat{f}_i - f_{\mathrm{true}})^2}$$

to compare the results. Here $N_e$ is the number of independent
realizations, and $\hat{f}_i$ is the estimate provided by the ith
realization. Several experiments were conducted to test the
performance of the algorithms presented in Theorem 3, and the
SVD-based MUSIC. The mean values of the estimated frequencies and
their RMSE of the SVD-based MUSIC are given in Table 1.
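
The RMSE criterion used in the experiments can be computed as follows. The function name is hypothetical and the snippet only restates the formula in code; it does not reproduce the paper's experiments.

```python
import numpy as np

def rmse(estimates, f_true):
    """Root mean squared error of frequency estimates over independent trials."""
    err = np.asarray(estimates, dtype=float) - f_true
    return float(np.sqrt(np.mean(err ** 2)))
```

In a Monte Carlo study, `estimates` would hold the estimates of one frequency collected over the independent realizations at a fixed SNR.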


SNR      Mean f1     Mean f2     RMSE f1    RMSE f2
20 dB    0.500556    0.522322    0.00563    0.012522
15 dB    0.500729    0.521735    0.00652    0.014531
10 dB    0.500961    0.524952    0.00813    0.019204

Table 1: Mean and RMSE of frequencies for data of two complex
sinusoids at frequencies 0.50 and 0.52 in noise with SNR = 20,
15, 10 dB, dimension of data vectors L = 10. Theorem 3 is used.

ABSTRACT

In this paper we propose a method to estimate the frequencies of
sinusoids embedded in non-Gaussian noise. We model the noise
using mixtures of Gaussians and propose two original, efficient
algorithms for the marginal MAP estimation of the sinusoid
parameters. An outline of the proof of convergence of the
algorithms is also given and simulation results are presented.

1  Introduction 

The harmonic retrieval problem is a fundamental problem in signal
processing that has numerous applications in radar, seismology
and nuclear magnetic resonance. Many efforts have been devoted to
the development of methods that address this problem, ranging
from periodogram-related procedures to subspace and parametric
methods relying on maximum likelihood or Bayesian estimation. The
Bayesian estimation of harmonic signals in white Gaussian noise
has been the subject of many recent papers, see [1], [2], [4],
[5], among others. Here we address the important and more
difficult problem of estimating the frequencies of sinusoids
embedded in non-Gaussian noise, and formulate it in a Bayesian
framework. A commonly used tool to model non-Gaussian
distributions consists of using discrete or continuous mixtures
of Gaussian distributions, and this is the approach adopted here.
The motivation for this choice is that by introducing a proper
set of (artificial) missing data, say $\xi$, one can often design
simple and efficient algorithms that allow for the estimation of
important features of the posterior distribution related to the
problem. However, from a statistical point of view the
introduction of missing data can lead to inconsistent estimators,
as the number of parameters to be estimated typically grows with
the number of observations. Joint estimators, i.e. estimators
involving $\xi$, should thus be avoided and marginal estimation
of the parameters should be favoured.

For the case of sinusoids embedded in noise modelled as a mixture
of Gaussians, the analytical expression of the marginal posterior
distribution of interest is of the form

$$p(a, \omega, \delta \mid y) = \int p(a, \omega, \delta, \xi \mid y)\, d\xi,$$

where $a$ and $\omega$ are the amplitudes and angular frequencies
of the sinusoids and $\delta$ are the parameters of the
observation noise. Unfortunately it is not available in closed
form and one has to resort to numerical methods. Monte Carlo
methods, and in particular Markov chain Monte Carlo (MCMC)
methods, have proved to be efficient tools for the estimation of
certain features of complicated posterior distributions, in
particular MMSE (Minimum Mean Square Error) estimates, e.g.
$E((a, \omega, \delta) \mid y)$ in the case treated here.

However, this choice of estimator is not appropriate when the
marginal posterior distribution is multimodal and the MMSE
estimate is located between the modes, possibly in a region of
very low probability. Computing MAP (Maximum a posteriori)
estimates of the frequencies might be preferable in such cases,
but whereas MCMC methods are well suited to the estimation of
marginal posterior means, their use to perform MMAP (Marginal
MAP) estimation can be questionable. Indeed, in this case further
approximations are introduced by histogram or density estimation
methods, which require careful tuning of extra parameters.

The EM (Expectation Maximization) algorithm is designed to
converge towards a stationary point of the marginal posterior
distribution. It is however limited to certain classes of models
for which the expectation and maximization steps can be performed
conveniently. This is why stochastic versions have been proposed,
such as SEM (Stochastic EM) or MCEM (Monte Carlo EM). Convergence
results are sparse and the algorithms do not always fully exploit
the structure of the statistical model. In this paper we propose
several Monte Carlo methods for performing MMAP estimation of the
frequencies of sinusoids embedded in non-Gaussian noise. The
first method relies on the SAME (State Augmentation for Marginal
Estimation) algorithm [10]. This algorithm is conceptually very
simple and straightforward to implement in most cases, requiring
only small modifications to MCMC code written for sampling from
$p(a, \omega, \delta, \xi \mid y)$. In order to reduce the
computational complexity of this algorithm, we present a
stochastic approximation type extension of it. We then present an
original analysis of the convergence of the stochastic
approximation type algorithm, which relies on a perturbation
analysis of the original SAME algorithm. Simulation results are
presented that demonstrate the interest of the approach.

This paper is organized as follows. In Section 2 the signal
models are given. In Section 3, we formalize the Bayesian model
and specify the prior distributions. Section 4 is devoted to
Bayesian computation. We propose non-homogeneous MCMC algorithms
to perform Bayesian inference, for which sufficient conditions
for global convergence can be established. The performance of
these algorithms is illustrated by computer simulations on
synthetic data in Section 6.


2 Problem statement

Let $y = (y_1, y_2, \ldots, y_T)^T$ be an observed vector of T
real data samples. The elements of y are the superposition of k
sinusoids corrupted by noise $n = (n_1, \ldots, n_T)$:

$$y_t = \sum_{j=1}^{k} \left[ a_{c_j} \cos(\omega_j t) + a_{s_j} \sin(\omega_j t) \right] + n_t,$$

where $1 \le k \le \lfloor (T-1)/2 \rfloor$, and $a_{c_j}$,
$a_{s_j}$ and $\omega_j$ are respectively the amplitudes and the
radial frequency of the jth sinusoid. We assume that
$\omega \in \Omega = \{\omega \in (0, \pi)^k : \omega_{j_1} \neq \omega_{j_2} \text{ for } j_1 \neq j_2\}$.

In vector-matrix form, we have

$$y = D(\omega)\, a + n,$$

where $[a]_{2j-1,1} = a_{c_j}$, $[a]_{2j,1} = a_{s_j}$ and
$[\omega]_{j,1} = \omega_j$ for $j = 1, \ldots, k$. The
$T \times 2k$ matrix $D(\omega)$ is defined as
$[D(\omega)]_{t,2j-1} = \cos(\omega_j t)$ and
$[D(\omega)]_{t,2j} = \sin(\omega_j t)$ for $t = 1, \ldots, T$
and $j = 1, \ldots, k$. The noise is assumed white, distributed
according to a mixture of Gaussian distributions, i.e.¹

$$n_t \sim \lambda\, \mathcal{N}(0, \sigma^2) + (1 - \lambda)\, \mathcal{N}(0, \alpha\sigma^2),$$

where $0 < \lambda < 1$ defines the mixture probability,
$\sigma^2$ is a global scale parameter and $0 < \alpha < 1$. It
is convenient to introduce the so-called missing data $r_{1:T}$
such that

$$n_t \mid r_t \sim \mathcal{N}\left(0,\; \sigma^2 I_{\{1\}}(r_t) + \alpha\sigma^2 I_{\{0\}}(r_t)\right),$$

and $\Pr(r_t = 1) = \lambda$ and $\Pr(r_t = 0) = 1 - \lambda$.
This allows us to write the likelihood of the observations

$$p(y \mid a, \omega, \lambda, r_{1:T}, \alpha, \sigma^2) = |2\pi\sigma^2 \Sigma|^{-1/2} \exp\left( -\frac{1}{2\sigma^2} (y - D(\omega)a)^T \Sigma^{-1} (y - D(\omega)a) \right),$$

where
$\Sigma = \mathrm{diag}\left(I_{\{1\}}(r_j) + \alpha I_{\{0\}}(r_j)\right)$,
$j = 1, \ldots, T$. Note that this likelihood is invariant under
permutation of the indices of the frequencies $\omega_j$ if no
ordering constraint is introduced, and that consequently MMSE
estimates can lead to very poor results. The parameters of the
sinusoids, of the noise and the missing data, i.e.
$\theta = (a, \omega, \lambda, r_{1:T}, \alpha, \sigma^2)$, are
unknown, and our aim is to estimate these parameters, $a$ and
$\omega$ being in general the parameters of primary interest.
Note that the strategy developed in this paper can be extended to
the case of continuous Gaussian mixtures, in order to model heavy
tailed distributions, but we do not consider this case here.

¹ This could be extended to the case of discrete mixtures with
more components.

3 Bayesian Models and Estimation Objectives

In this paper we follow a Bayesian approach where the unknown
parameter vector $\theta$ is regarded as being drawn from an
appropriate prior distribution. This prior distribution reflects
our degree of belief in the relevant values of the parameters.
Note that when no prior knowledge is available, uninformative
distributions can be used [3]. This is the approach we follow
here. We first propose a model that sets up a probability
distribution over the space of possible structures of the signal
and we give the estimation aims.

3.1 Prior distribution

We set a prior distribution on the unknown parameter vector
$\theta = (a, \omega, \lambda, r_{1:T}, \alpha, \sigma^2) \in \Theta$
where
$\Theta = \mathbb{R}^{2k} \times \Omega \times (0,1) \times \{0,1\}^T \times (0,1) \times \mathbb{R}^+$.
The following uninformative improper prior distribution² is
selected:

$$p(a, \omega, \sigma^2 \mid r_{1:T}, \alpha) \propto \frac{1}{\sigma^2} \left| \frac{D^T(\omega)\, \Sigma^{-1} D(\omega)}{\sigma^2} \right|^{1/2} I_{\Omega}(\omega).$$

This prior corresponds to Jeffreys' prior for the linear model
[3]. It penalizes close frequencies as pointed out in [5]. The
parameters $\alpha$ and $\lambda$ are assumed distributed
according to $\alpha \sim \mathcal{U}(0,1)$ and
$\lambda \sim \mathcal{U}(0,1)$, which are vague prior
distributions.

3.2 Estimation objectives

Given the observations y, Bayesian inference about $\theta$ is
based on the posterior distribution $p(\theta \mid y)$ obtained
from Bayes' theorem,

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta).$$

Our aim is to estimate this joint distribution from which, by
standard probability marginalization and transformation
techniques, one can "theoretically" obtain all posterior features
of interest including the marginal distributions, posterior modes
or conditional expectations such as the MMSE estimate

$$E[\theta \mid y] = \int_{\Theta} \theta\, p(\theta \mid y)\, d\theta,$$

among others. As discussed in the introduction this problem can
be addressed using MCMC methods, but the use of these techniques
for the computation of the MMAP estimator
$(\hat{a}, \hat{\omega}, \hat{\sigma}^2, \hat{\alpha})_{\mathrm{MMAP}}$
defined as

$$\arg\max_{(a, \omega, \sigma^2, \alpha) \in \mathbb{R}^{2k} \times \Omega \times \mathbb{R}^+ \times (0,1)} p(a, \omega, \sigma^2, \alpha \mid y)$$

can be questionable. In the next section we describe an algorithm
that adapts MCMC techniques to MMAP estimation.
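
The observation model of Section 2 (sinusoids in two-component Gaussian mixture noise) can be simulated with a few lines of NumPy. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the indicator convention follows the text, with r_t = 1 selecting the variance sigma^2 (probability lambda) and r_t = 0 selecting alpha * sigma^2.

```python
import numpy as np

def design_matrix(omega, T):
    """T x 2k matrix with columns cos(w_j t), sin(w_j t), t = 1..T."""
    omega = np.asarray(omega, dtype=float)
    t = np.arange(1, T + 1)[:, None]
    D = np.empty((T, 2 * omega.size))
    D[:, 0::2] = np.cos(t * omega)
    D[:, 1::2] = np.sin(t * omega)
    return D

def mixture_noise(T, lam, sigma2, alpha, rng):
    """Draw the indicators r_t and the noise n_t given r_t."""
    r = (rng.random(T) < lam).astype(int)            # r_t = 1 with probability lam
    var = np.where(r == 1, sigma2, alpha * sigma2)   # variance selected by r_t
    return rng.standard_normal(T) * np.sqrt(var), r

def simulate(a, omega, T, lam, sigma2, alpha, rng):
    """Return y = D(omega) a + n together with the missing data r."""
    n, r = mixture_noise(T, lam, sigma2, alpha, rng)
    return design_matrix(omega, T) @ np.asarray(a, dtype=float) + n, r
```

The amplitude vector `a` is ordered as (a_c1, a_s1, ..., a_ck, a_sk), matching the column ordering of the design matrix.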

4  Bayesian  Marginal  MAP  robust  spectral  estimation 

4.1  The  SAME  algorithm 

One might be interested in the marginal MAP estimation of the
frequencies, i.e. finding the maximum of
$p(a, \omega, \sigma^2, \alpha \mid y)$. In order to achieve this
we introduce two versions of the SAME algorithm [10], the second
one being a stochastic approximation type algorithm. Let us
consider the extended probabilistic model,

$$p^{\otimes\gamma}(a, \omega, \sigma^2, \alpha, \lambda_{1:\gamma}, r_{1:T,1:\gamma} \mid y) \propto \prod_{j=1}^{\gamma} p(y \mid a, \omega, \sigma^2, \alpha, \lambda_j, r_{1:T,j})\, p(a, \omega, \sigma^2, \alpha, \lambda_j, r_{1:T,j}),$$

where $\gamma$ is a positive integer and $r_{1:T,j}$ is a replica
of the missing data. Clearly this probabilistic model admits
$\bar{p}^{\gamma}(a, \omega, \sigma^2, \alpha \mid y)$ as marginal
distribution, where
$\bar{p}^{\gamma}(a, \omega, \sigma^2, \alpha \mid y)$ is the
distribution proportional to
$p^{\gamma}(a, \omega, \sigma^2, \alpha \mid y)$. Given a sequence
$(\gamma_i)_{i \in \mathbb{N}}$ such that
$\lim_{i \to +\infty} \gamma_i = +\infty$, the idea of the SAME
algorithm is to run a non-homogeneous Markov chain that admits
$p^{\otimes\gamma_i}(a, \omega, \sigma^2, \alpha \mid y)$ as
invariant distribution at each iteration i.

² A prior distribution $p(\theta)$ is said to be improper if
$\int_{\Theta} p(\theta)\, d\theta = +\infty$.

The distribution
$p^{\otimes\gamma_i}(a, \omega, \sigma^2, \alpha \mid y)$
concentrates itself on its set of global maxima as
$i \to +\infty$ (this is the idea of simulated annealing) and the
algorithm is thus expected in practice to converge towards a
global maximum. Note that when $\gamma_i = 1$ for all i this
algorithm is a standard MCMC algorithm that asymptotically
produces samples from $p(\theta \mid y)$. In practice one can
make use of the properties of the model and analytically
integrate out $a$, $\sigma^2$ and $\lambda_j$, leading to an
expression of
$p^{\otimes\gamma}(\omega, \alpha, r_{1:T,1:\gamma} \mid y)$ up to
a constant. It can be shown that

p®7  (w)a,r11T,1.iiy)  oc  nu  lDTs7lDl1/2  Is;  r 1/2 

x  |M7r1/,z  [yTP7  {u)y]-^T,2+k)+k 

x  IlJ=i  ni.J •  —  bij)!i 

where  mj  =  Ef=i  I{o>  (rt,j)  and 

m7  = 

m7  =  M7Dt  («)  9?  =  £?=1  S."1. 

p7  (W)  =  vf-71  -  (w)  m7dt  (w) 
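
The concentration effect that motivates SAME, namely that a density raised to a growing power gamma puts its mass on the global mode, can be illustrated on a grid. The example below is a toy sketch: the bimodal density, the grid and the function name are invented for the illustration and are unrelated to the actual posterior of the paper.

```python
import numpy as np

def tempered(p, gamma):
    """Return the normalized grid density proportional to p**gamma."""
    logp = np.log(p)
    q = np.exp(gamma * (logp - logp.max()))  # subtract the max for stability
    return q / q.sum()

x = np.linspace(-6.0, 6.0, 2401)
# Toy bimodal density: global mode near x = -2, secondary mode near x = 2
p = (0.6 * np.exp(-0.5 * ((x + 2.0) / 0.5) ** 2)
     + 0.4 * np.exp(-0.5 * ((x - 2.0) / 0.5) ** 2))
near_mode = np.abs(x + 2.0) < 1.0
# Probability mass around the global mode for increasing gamma
masses = [tempered(p, g)[near_mode].sum() for g in (1, 2, 5, 10)]
```

The mass around the global mode grows with gamma while the location of the maximum is unchanged, which is why a chain targeting the powered density behaves like a simulated annealing scheme for the marginal posterior.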

In  order  to  sample  from  p®7  (w,a,ri:r,i:7|  y),  we  propose  the 
following  algorithm: 


4.2  TheSA2ME 

In  the  current  version  of  the  SAME  algorithm  7 \  replicas  of  the 
variables  ti-.t  are  sampled  at  each  iteration  i,  which  can  rapidly 
become  cumbersome  as  7,  becomes  large.  Let  to  be  an  itera¬ 
tion  chosen  by  the  user.  Then  we  propose  from  iteration  to  not 
to  resample  the  variables  ri;T,i:7i_1  that  are  “frozen”  once  they 
are  simulated  but  simply  sample  the  new  replicas  ri:T,7;_1-t-iT7i- 
The  computational  gain  of  this  SA2ME  (Stochastic  Approximation 
SAME)  is  obvious  and  the  analogy  with  classical  stochastic  ap¬ 
proximation  algorithms  is  clear,  although  we  here  take  advantage 
of  the  statistical  structure  of  the  problem.  However  the  algorithm 
is  no  longer  a  Markov  chain  as  the  update  of  the  parameters  at  it¬ 
eration  i  depends  on  the  past  of  the  chain  up  to  iteration  i  —  1.  In 
fact  this  new  algorithm  can  be  viewed  as  a  perturbation  of  the  orig¬ 
inal  SAME  algorithm,  and  an  analysis  of  these  perturbations  can 
be  carried  out  to  prove  the  validity  of  the  new  scheme,  as  sketched 
in  the  next  section. 

5  Convergence  analysis 

We  first  point  out  a  convergence  result  for  the  SAME  algorithm 
and  then  focus  on  the  SA2ME  algorithm. 


MCMC  algorithm  for  marginal  spectral  analysis 


1.  Initialization  0(o)  =  jw(0\a(0\r^il!7o  }  and  i  =  1. 


2.  Iteration* 

•  For  j  =  1,  ...,7;  .sample 


rt(‘j  from  p®7i  (n,j|y,«{i' 

-!)  a  r^"1) 

’ Tnt,j  9 

forf  =  1, ...,  T. 

Sample  a(<)  ~jp®7i 

a«-D 

,*2(< 

Sample  wj,)  ~  p®7i  | 

y,w^_1),a(: 

°.r« 

j  =  1, ...,  k  with  an  MCMC 

step. 

Sample a(i),(r2(i)  ~p®7i  ( 

\  <72|y,w(l 

r(i) 

Where  rn  means  “n;T  with  n  removed”  and  similarly  for  W7. 

We  comment  the  different  sampling  steps: 

•  Sampling  rt,j  is  straightforward  as  it  simply  involves  sam¬ 
pling  from  a  discrete  distribution. 

•  Sampling  Uj  can  be  done  using  an  adaptation  of  the  tech¬ 
nique  described  in  [1]. 

•  Sampling  a,  a2  is  standard  as  it  requires  the  simulation  from 
an  inverse-Gamma  distribution  and  a  normal  distribution. 

•  Sampling  a  mainly  amounts  to  sampling  from  a  truncated 
inverse-Gamma  distribution  and  can  be  done  efficiently  by 
using  a  rejection  method  based  on  the  work  of  [8]. 


S.l  SAME  algorithm 

First  we  set  81  =  {a,  <r2,a}  and  82  =  {A,ri;r}  and  name 

their  state  spaces  ©1  and  ©2.  The  SAME  algorithm  defines  a 
Markov  chain  on  8\ ,  and  it  can  be  proved  that  this  Markov  chain 
is  uniformly  ergodic  for  a  constant  sequence  7 <.  i.e.  for  any  prob¬ 
ability  distribution  p, 

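$\lim_{n \to +\infty} \| \mu K_i^n(d\theta_1) - p^{\gamma_i}(d\theta_1) \| = 0,$

at a geometric rate independent of the initial condition, where $\|\cdot\|$ is the total variation norm. Here $K_i$ is the transition kernel of the SAME algorithm at iteration $i$, which can be formally written as

$K_i(\theta_1^{(i-1)}; d\theta_1^{(i)}) \propto \int_{\Theta_2^{\gamma_i}} p(d\theta_1^{(i)} \mid \theta_2^{(1:\gamma_i)}) \prod_{j=1}^{\gamma_i} p(d\theta_2^{(j)} \mid \theta_1^{(i-1)}).$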

This convergence result mainly relies on the fact that the parameters $\theta_1$ and $\theta_2$ lie in bounded sets. From this result, and following arguments similar to those used to prove the convergence of simulated annealing, it can be shown that for a logarithmic series of $\gamma_i$ the SAME algorithm for MMAP estimation converges in the following sense:

$\lim_{n \to +\infty} \| \mu K_1 K_2 \cdots K_n(d\theta_1) - p^{\gamma_n}(d\theta_1) \| = 0.$


Furthermore, as the sequence $p^{\gamma_i}(d\theta_1)$ tends to a mixture of delta functions located at the global maxima of $p(\theta_1)$, we conclude that the algorithm will asymptotically provide us with an estimate of

$\theta_{1,\mathrm{MMAP}} = \arg\max_{\theta_1 \in \Theta_1} p(\theta_1).$
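
To make the mechanism concrete, here is a minimal sketch of a SAME-type iteration on a toy bivariate Gaussian model, which is not the spectral analysis model of this paper. The model, the replica schedule `1 + i // 5`, the function name and all numerical values are assumptions chosen so that both conditional distributions are available in closed form. The marginal of `theta1` is N(mode, 1), so the marginal MAP estimate is `mode`.

```python
import math
import random

def same_toy(n_iter=500, rho=0.9, mode=2.0, theta1_init=-5.0, seed=1):
    """Toy SAME run on a bivariate Gaussian p(theta1, theta2) with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    theta1 = theta1_init
    for i in range(1, n_iter + 1):
        gamma_i = 1 + i // 5  # slowly increasing number of replicas (assumed schedule)
        # Step 1: draw gamma_i independent copies of the missing data given theta1.
        copies = [rng.gauss(rho * (theta1 - mode), cond_sd) for _ in range(gamma_i)]
        # Step 2: draw theta1 given all copies. The product of gamma_i Gaussian
        # conditionals is Gaussian with variance (1 - rho^2) / gamma_i.
        mean = mode + rho * sum(copies) / gamma_i
        theta1 = rng.gauss(mean, cond_sd / math.sqrt(gamma_i))
    return theta1

estimate = same_toy()
```

In this toy model the stationary variance of `theta1` at a fixed number of replicas `gamma` is `1 / gamma`, which shows how the chain concentrates around the marginal mode as the number of replicas grows, at the price of `gamma` draws of missing data per iteration.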


This elegant algorithm allows us to sample from the series of distributions of interest, and convergence results can be proved that support the validity of the approach (see Section 5). However, we see that as $\gamma_i$ approaches infinity the computational burden of the algorithm rapidly becomes unrealistic. Thus we propose here a stochastic approximation adaptation of the algorithm presented above, which is computationally much cheaper.


5.2  SA2ME 

The proof of convergence of the algorithm relies on an analysis of the perturbations introduced by the new scheme upon the original SAME algorithm. We sketch here the proof of convergence, outline the main propositions that lead to the convergence result and explain their intuitive meaning. We first introduce some notation that will be useful throughout the proof. The transition probability corresponding to the SA2ME algorithm is

$\bar K_{i+1}(\theta_1^{(i)}, \theta_2^{(1:\gamma_i)}; d\theta_1^{(i+1)}, d\theta_2^{(\gamma_i+1:\gamma_{i+1})}) \propto p(d\theta_1^{(i+1)} \mid \theta_2^{(1:\gamma_{i+1})})\, p(d\theta_2^{(\gamma_i+1:\gamma_{i+1})} \mid \theta_1^{(i)}).$

Here we simply express the fact that the missing data $\theta_2^{(1:\gamma_i)}$ are “frozen” once they are simulated. In order to study the convergence properties of the second algorithm it will be useful to introduce, for some integer $k$, the transition kernel $\bar K_{i,k}$ of the algorithm for which only the missing data up to iteration $k - 1$, $\theta_2^{(1:\gamma_{k-1})}$, are frozen, and the missing data from then on, $\theta_2^{(\gamma_{k-1}+1:\gamma_i)}$, are sampled at each iteration. More precisely, for $i > k$ we define

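In order to study the convergence properties of our algorithm, we will need notation to combine these kernels, namely,

$\mu \bar K_{1:n}(d\theta_1^{(n)}, d\theta_2^{(\gamma_{n-1}+1:\gamma_n)}) = \int_{\Theta_1^{n} \times \Theta_2^{\gamma_{n-1}}} \mu(d\theta_1^{(0)})\, \bar K_1(\theta_1^{(0)}; d\theta_1^{(1)}, d\theta_2^{(1)})\, \bar K_2(\theta_1^{(1)}, \theta_2^{(1)}; d\theta_1^{(2)}, d\theta_2^{(2:\gamma_2)}) \cdots \bar K_j(\theta_1^{(j-1)}, \theta_2^{(1:\gamma_{j-1})}; d\theta_1^{(j)}, d\theta_2^{(\gamma_{j-1}+1:\gamma_j)}) \cdots \bar K_n(\theta_1^{(n-1)}, \theta_2^{(1:\gamma_{n-1})}; d\theta_1^{(n)}, d\theta_2^{(\gamma_{n-1}+1:\gamma_n)}),$

and for $k, j > m$

$\mu \bar K_{1:k-1} \bar K_{k:n,m}(d\theta_1^{(n)}, d\theta_2^{(\gamma_{m-1}+1:\gamma_n)}) = \int_{\Theta_1^{n} \times \Theta_2^{\gamma_{m-1}}} \mu(d\theta_1^{(0)})\, \bar K_1(\theta_1^{(0)}; d\theta_1^{(1)}, d\theta_2^{(1)})\, \bar K_2(\theta_1^{(1)}, \theta_2^{(1)}; d\theta_1^{(2)}, d\theta_2^{(2:\gamma_2)}) \cdots \bar K_{j,m}(\theta_1^{(j-1)}, \theta_2^{(1:\gamma_{m-1})}; d\theta_1^{(j)}, d\theta_2^{(\gamma_{m-1}+1:\gamma_j)}) \cdots \bar K_{n,m}(\theta_1^{(n-1)}, \theta_2^{(1:\gamma_{m-1})}; d\theta_1^{(n)}, d\theta_2^{(\gamma_{m-1}+1:\gamma_n)}).$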

Now that the notation is defined we can express the main result of this section. We want to study the asymptotic properties of the difference of the two stochastic processes; more precisely, we want to prove that under certain conditions, for any probabilities $\nu$ and $\mu$,

$\lim_{n \to +\infty} \|\nu K_{1:n} - \mu \bar K_{1:n}\| = 0.$

A trivial decomposition and the application of the triangle inequality lead to

$\|\nu K_{1:n} - \mu \bar K_{1:n}\| \leq \|\nu K_{1:n} - \mu K_{1:n}\| + \|\mu K_{1:n} - \mu \bar K_{1:n}\|.$

From the result of the previous subsection, the SAME algorithm is ergodic and thus the first term goes to zero as $n \to +\infty$. Consequently we focus on the second term.

Our  results  are  based  upon  a  decomposition  into  an  estimation 
error  and  an  approximation  bias,  which  we  now  state: 


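Proposition 1 For all integers $m_n$ and $n$ such that $m_n < n$, we have the estimate

$\|\mu K_{1:n} - \mu \bar K_{1:n}\| \leq \|\mu K_{1:n} - \mu \bar K_{1:m_n} \bar K_{m_n+1:n,m_n}\| + \sum_{k=m_n+1}^{n} \|\mu \bar K_{1:k-1} \bar K_{k,m_n} - \mu \bar K_{1:k}\|.$

Proof. For $m_n < n$ we have the telescoping sum

$\mu K_{1:n} - \mu \bar K_{1:n} = \mu K_{1:n} - \mu \bar K_{1:m_n} \bar K_{m_n+1:n,m_n} + \sum_{k=m_n+1}^{n} \left( \mu \bar K_{1:k-1} \bar K_{k:n,m_n} - \mu \bar K_{1:k} \bar K_{k+1:n,m_n} \right),$

with the convention $\bar K_{n+1:n,m_n} = \mathrm{Id}$. Then, by first applying the triangle inequality and then the fact that for any probability measures $\mu$ and $\nu$ the statement $\|\mu \bar K_{k,m_n} - \nu \bar K_{k,m_n}\| \leq \|\mu - \nu\|$ holds, we obtain the result. □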
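
The contraction property used at the end of the proof can be checked numerically on a finite state space. The sketch below is only an illustration: the 3-state kernel and the two initial distributions are arbitrary assumptions, and the total variation norm is taken as the sum of absolute differences.

```python
def total_variation(p, q):
    """Total variation norm ||p - q|| = sum_x |p(x) - q(x)| for finite distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def apply_kernel(p, kernel):
    """Return the row vector p K for a row-stochastic matrix K."""
    n = len(kernel[0])
    return [sum(p[i] * kernel[i][j] for i in range(len(p))) for j in range(n)]

# Arbitrary row-stochastic kernel and two arbitrary probability vectors.
kernel = [[0.5, 0.3, 0.2],
          [0.1, 0.8, 0.1],
          [0.3, 0.3, 0.4]]
mu = [0.7, 0.2, 0.1]
nu = [0.1, 0.1, 0.8]

before = total_variation(mu, nu)
after = total_variation(apply_kernel(mu, kernel), apply_kernel(nu, kernel))
```

Applying any Markov kernel can only bring two distributions closer in total variation, which is why each term of the telescoping sum is controlled by the discrepancy created at a single iteration.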

Proposition 2 There exists a sequence $m_n$ such that

$\|\mu K_{1:n} - \mu \bar K_{1:m_n} \bar K_{m_n+1:n,m_n}\| \to 0.$

Intuitively, during the first $m_n$ iterations $\bar K_{1:m_n}$ introduces an approximation error compared to the SAME algorithm, which is then corrected in the following $n - m_n + 1$ iterations with $\bar K_{m_n+1:n,m_n}$. Then, if $m_n$ increases significantly more slowly than $n$, so that $\bar K_{m_n+1:n,m_n}$ can correct and forget in $n - m_n + 1$ iterations the error generated during the first $m_n$ iterations, the result should hold.

Proposition 3 There exists a sequence $m_n$ such that

$\lim_{n \to +\infty} \sum_{k=m_n+1}^{n} \|\mu \bar K_{1:k-1} \bar K_{k,m_n} - \mu \bar K_{1:k}\| = 0.$

This result relies on the fact that for term $k$ in the sum, the two dynamics are the same up to time $k - 1$ and simply differ at iteration $k$ where, on the one hand, the $\theta_2^{(\gamma_{m_n-1}+1:\gamma_k)}$ are “rejuvenated” with $\bar K_{k,m_n}$ and, on the other hand, only $\theta_2^{(\gamma_{k-1}+1:\gamma_k)}$ is sampled with $\bar K_k$. When $\theta_1$ and $\theta_2$ lie in bounded spaces one can bound the error introduced, and
show  that  there  exists  0  <  0  <  1  such  that  for  mn  =  n  —  n0  the 
sum of these errors goes to zero as $n \to +\infty$.

By  combining  the  three  propositions  and  using  the  conver¬ 
gence  result  proved  for  the  SAME  algorithm  we  can  deduce  the 
following  result: 

Theorem 4 There exist sequences $m_n$ and $\gamma_n$ such that for any $\mu$,

$\lim_{n \to +\infty} \|p^{\gamma_n} - \mu \bar K_{1:n}\| = 0,$

which proves the validity of the SA2ME algorithm under suitable conditions. Note that these results rely on a boundedness assumption on the parameters. We are currently extending these results to more general cases for other problems.

6 Simulation results

We applied the two algorithms described above for the following parameters: $T = 64$ and $k = 2$. We define $E_i = a_{c_i}^2 + a_{s_i}^2$, with $E_1 = 20$, $E_2 = 6.32$, $-\arctan(a_{s_1}/a_{c_1}) = 0$, $-\arctan(a_{s_2}/a_{c_2}) = \pi/4$, $\omega_1/2\pi = 0.2$ and $\omega_2/2\pi = 0.3$. The SNR is defined as $10 \log_{10} E_1/(2\sigma^2)$ and is equal to 1 dB. Theoretically, the algorithms require a so-called logarithmic cooling schedule $\gamma_i$ and an infinite number of iterations to converge. This sequence goes to $+\infty$ too slowly to be used in practice. We run the algorithms here for 500 iterations and select a linearly growing cooling schedule $\gamma_i = A + Bi$ where $\gamma_0 = 1$ and $\gamma_{500} = 102$. We used the same series $\gamma_i$ for the second algorithm and set to = 20. Note the slower convergence of the second algorithm compared with the first one, as expected.
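
The two kinds of schedule mentioned above can be compared with a short sketch. The end value 102 is read literally from the text, and the particular logarithmic form `1 + c*log(1 + i)` is an assumption used only to show the difference in growth; the function names are hypothetical.

```python
import math

def linear_schedule(n_iter, gamma_start, gamma_end):
    """gamma_i = A + B*i with gamma_0 = gamma_start and gamma_{n_iter} = gamma_end."""
    a = gamma_start
    b = (gamma_end - gamma_start) / n_iter
    return [a + b * i for i in range(n_iter + 1)]

def logarithmic_schedule(n_iter, c=1.0):
    """A generic logarithmic schedule gamma_i = 1 + c*log(1 + i) (illustrative form)."""
    return [1.0 + c * math.log(1.0 + i) for i in range(n_iter + 1)]

linear = linear_schedule(500, 1.0, 102.0)   # A = 1, B = 0.202
logarithmic = logarithmic_schedule(500)
```

After 500 iterations the logarithmic schedule of this sketch is still below 10, whereas the linear one has reached its end value, which illustrates why the theoretical schedule is replaced in practice. An implementation that uses $\gamma_i$ as a number of replicas would additionally have to round it to an integer.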


Figure  1 :  Convergence  of  the  SAME  towards  the  marginal  MAP 
estimates  of  the  frequencies 


Figure  2:  Convergence  of  the  SA2ME  algorithm  towards  the 
marginal  MAP  estimates  of  the  frequencies 

ABSTRACT

The  paper  presents  new  developments  in  harmonic  analy¬ 
sis  associated  with  the  motion  transformations  embedded 
in  digital  signals.  In  this  context,  harmonic  analysis  pro¬ 
vides  motion  analysis  with  a  complete  theoretical  construc¬ 
tion  of  perfectly  matching  concepts  and  a  related  toolbox 
leading  to  fast  algorithms.  This  theory  can  be  built  from 
only  two  assumptions:  an  associative  structure  for  the  lo¬ 
cal  motion  transformations  expressed  as  Lie  group  and  a 
principle  of  optimality  for  the  global  evolution  expressed 
as  a  variational  extremal.  Motion  analysis  means  not  only 
detection,  estimation,  interpolation,  and  tracking  but  also 
propagators, motion-compensated filtering, signal decomposition, and selective reconstruction. The optimality principle defines the trajectory and provides the appropriate
equations  of  motion,  the  selective  tracking  equations,  the 
selective  constants  of  motion  to  be  tracked,  and  all  the 
symmetries  to  be  imposed  on  the  system.  The  harmonic 
analysis  provides  new  special  functions,  orthogonal  bases, 
PDE’s,  ODE’s  and  integral  transforms.  The  tools  to  be  de¬ 
veloped  rely  on  group  representations,  continuous  and  dis¬ 
crete  wavelets,  the  estimation  theory  (prediction,  smooth¬ 
ing  and  interpolation)  and  filtering  theory  (Kalman  filters, 
motion-based  convolutions,  integral  transforms).  All  the 
algorithms  are  supported  by  fast  and  parallelizable  imple¬ 
mentations  based  on  the  FFT  and  dynamic  programming. 

1.  INTRODUCTION 

In this paper, the harmonic analysis on motion transformations is built on the actual kinematics as they take place in the external space and in the projections on sensor arrays (Figure 1). Eventually, they are embedded in the signals to be analyzed. From that point of view, this approach differs fundamentally from the motion models currently presented in the literature (see [1] and the references therein), which rely on techniques based on stochastic processes, statistics and operations research, namely block-matching, pel-recursive and Bayesian techniques. As a major drawback, these techniques are totally blind to the underlying mathematical structures of the spatio-temporal transformations.

The author wants to thank Prof. B. Blank in the Math. Dept. for helpful discussions and Prof. B. K. Ghosh in the SSM Dept. for his support on numerical computations. This research work was supported by the AFOSR grant No. F49620-99-1-0068.


The main point of the approach proposed in this paper is to bring differential geometry, mechanics on manifolds and harmonic analysis into signal analysis. This theory provides the actual kinematics and relies on only two key assumptions, which can be summarized as follows: a Lie group structure (i.e. an associative law of composition and an identity element for the local transformations) and a principle of optimality (for the global evolution). From those two key points, a complete machinery of theory, analysis tools and fast algorithms can be constructed in such a way that all the concepts match each other perfectly. This paper presents new developments on this topic that cover all the kinematics embedded in any spatio-temporal real or complex signal and apply to video, radar and sonar.

The construction of Lie group representations (i.e. the analyzing functions in the signal space) leads naturally to several important topics. First, it leads to the existence of continuous wavelet transforms with frames, tight frames and new discrete wavelets placed along the trajectories, which perform spatio-temporal and motion-based atomic decompositions, expansions, filtering (prediction, smoothing and interpolation), estimation and motion-selective reconstructions. The second topic deals with the characters of the group representations, which define new special functions and integral transforms (IT) that generalize the Fourier kernel for the new kinematics of interest. The third topic proceeds with the addition of a principle of optimality based on Euler-Lagrange equations, which defines the existence of a trajectory and of a tracking procedure. This gives rise to Partial Differential Equations (PDE) as equations of wavelet and signal motion and to Ordinary Differential Equations (ODE) for tracking. Fourth, the Green functions associated with these PDE’s turn out to be the previous special functions related to the kinematics. At this stage, we obtain a global analysis structure with the construction of signal propagators and motion-compensated filters.


2.  GROUP  REPRESENTATIONS,  WAVELETS 
AND  CONVOLUTIONS 

In their general form, the Lie group representations $\hat T_g$ acting upon functions $\hat\Psi \in L^2(\mathbb{R}^n \times \mathbb{R}, d^n k\, d\omega)$ read

$[\hat T_g \hat\Psi](k, \omega) = a^{n/2}\, e^{i(\omega\tau + k \cdot b)}\, \hat\Psi[g^{-1}(k, \omega)] \quad (1)$

where $g$ is an element of the group $G$, the $L^2$ normalizing factor $a^{n/2}$ originates from a Radon-Nikodym derivative and provides unitary representations, $e^{i(\omega\tau + k \cdot b)}$ stands for the character of the subgroup of spatio-temporal translations, and $g^{-1}(k, \omega)$ is the left-group action of $g \in G$ in the dual space. The dual space, also called the phase space, is the Fourier domain, denoted by a hat, with spatial frequencies $k \in \mathbb{R}^n$ and temporal frequency $\omega \in \mathbb{R}$. The parameters $a \in \mathbb{R}^+ \setminus \{0\}$, $b \in \mathbb{R}^n$, and $\tau \in \mathbb{R}$ are respectively the scale, the spatial translation and the temporal translation.

From the group representations, we define the continuous wavelet transform as the operator $W_\Psi$ mapping the function $S \in H = L^2(\mathbb{R}^n \times \mathbb{R})$ into functions of $g$ defined as


and moving at the constant velocity $v$. The convolution performed along this displacement (i.e. along the trajectory) allows the reconstruction of the still signal $F(x,t)$. This property is in fact reminiscent of the motion-compensated filtering developed by the author in [3], which is going to be generalized in this work by the introduction of IT’s. Finally, let us move to the Fourier domain and retrieve the usual condition of admissibility for the Galilean wavelet as described in [9, 12, 13]. Proceeding with Equation (5) in the Fourier domain, we obtain

$\hat F(k, \omega) = \hat F(k, \omega) \int_{\mathbb{R}} \int_{\mathbb{R}^+ \setminus \{0\}} |\hat\Psi(a k, \omega - k v)|^2\, \frac{da\, dv}{a^2} \quad (7)$

which leads to the usual condition of square-integrability of


$[W_\Psi S](g) = \int_{\mathbb{R}^n \times \mathbb{R}} S(x,t)\, [T_g \Psi](x,t)\, d^n x\, dt = \langle S \mid T_g \Psi \rangle \quad (2)$

This inner product $\langle \cdot \mid \cdot \rangle$ would remain a simple correlator between $[T_g \Psi](x,t)$ and $S(x,t)$ if no further conditions were imposed on the unitary and irreducible group representations. In fact, to be a continuous wavelet transform, the mapping must be invertible, i.e. there must exist an operator $W_\Psi^{-1}$ such that $W_\Psi^{-1} W_\Psi = I_H$, where $I_H$ is the identity operator in the Hilbert space of observation $H$. This means that we want to perfectly reconstruct the signal

$S(x,t) = \int_G [W_\Psi S](g)\, [T_g \Psi](x,t)\, d\mu(g) \quad (3)$

where $d\mu$ is the left-invariant Haar measure calculated on the group $G$. The condition to be fulfilled in order to derive the inverse transform has been known since the 1964 work of Calderón. Several examples considered in this paper are defined in [4, 5, 6, 7]. The simplest case is the affine-Galilean

group where the group element is $g = \{b, \tau, v, a\}$ and $v \in \mathbb{R}^n$ is the velocity vector [6]. The left-group action is given by $(a^{-1}[x - b - v(t - \tau)],\, t - \tau)$ and the representation in Equation (1) reads $[\hat T_g \hat\Psi](k, \omega) = a^{n/2}\, e^{i(\omega\tau + k \cdot b)}\, \hat\Psi(k', \omega')$ with $k' = ak$, $\omega' = \omega + k \cdot v$. Let us examine the condition for an invertible transform in the affine-Galilean case with $n = 1$, i.e. $b, \tau, v \in \mathbb{R}$ and $a \in \mathbb{R}^+ \setminus \{0\}$, as follows


the Galilean wavelet in one-dimensional space and time

$\int_{\mathbb{R}} \int_{\mathbb{R}} \frac{|\hat\Psi(k, \omega)|^2}{|k|^2}\, dk\, d\omega = 1 \quad (8)$


See  references  [4,  5,  6,  7]  for  the  properties  and  applications 
of  the  Galilean  wavelets. 

The construction of orthonormal bases proceeds by discretizing the group parameters into a lattice. The spatio-temporal lattice is easily defined as a generalization of the discretization of the affine group: $a = a_0^m$, $b = n_b b_0 a_0^m$, $v = n_v v_0 a_0^m$, $\tau = n_\tau \tau_0$ with $a_0 > 1$, and $b_0, v_0, \tau_0 > 0$ for convenience. If we now consider the regular left-composition $g^{-1}(x,t) = (a^{-1}[x - b - v(t - \tau)],\, t - \tau)$ in the Galilean case, we can mimic the case of the affine group [6] as follows. Let $a_0 = 2$ and $T_g \Psi(x,t) = 2^{-m/2}\, \Psi(2^{-m}x - n_b b_0 - n_v v_0 (t - n_\tau \tau_0),\, t - n_\tau \tau_0)$, where we retrieve the well-known orthonormal bases $\Psi_{m,p,q}(x,t) = 2^{-m/2}\, \Psi(2^{-m}x - p,\, t - q)$ in $L^2(\mathbb{R} \times \mathbb{R})$ at $p = n_b b_0 + n_v v_0 n_\tau \tau_0$, $q = n_\tau \tau_0$ with $p, q \in \mathbb{Z}$. Technically, we have deployed the usual discrete wavelets defined from the affine group along spatio-temporal translations that correspond to the motion trajectories at constant velocity [3].


3.  SPECIAL  FUNCTIONS  AND  INTEGRAL 
TRANSFORMS 


<F|r9*>  (TgV)(X)  dX,(g)  (4) 


which  becomes  after  some  easy  computations 


[[  if  f(») 

JrJr+  [JrxR  \  P  / 


(^a  *t,  $a) 


(x-y)-  v(t  -  p) 
t~  P 


dy  dp  |  ~~F~(5) 


LL {F’"  <'t'  *■  ?-)}( ’ )  ^ 


where we have let $\tilde\Psi(x,t) = \Psi(-x,-t)$ and $\Psi_a(x,t) = a^{-1}\Psi(x/a, t)$. Let us make an important remark about Equations (5) and (6): the non-conventional spatio-temporal convolution denoted $*_v$ is in fact a convolution twisted along the Galilean transformation, i.e. the translation in space has a component depending on time

In this section, we proceed one step further on the representations and focus on the characters. The integration of the characters leads to special functions. These special functions naturally define the kernel of an integral transform. This procedure can be performed for each group of spatio-temporal transformations. Let us consider an important example known as rotational motion (described in [5]). The set of parameters is given as $G = \{g \mid g = (b, \tau, \theta_1, a)\}$ where $\theta_1 \in \mathbb{R}$ is the angular velocity. The composition law is given as $g \circ g' = \{b + aR(\theta_1\tau)b',\ \tau + \tau',\ \theta_1 + \theta_1',\ aa'\}$; the inverse element reads $g^{-1} = \{-a^{-1}R(\theta_1\tau)^{-1}b,\ -\tau,\ -\theta_1,\ a^{-1}\}$. The group representations $[\hat T(g)\hat\Psi](k, \omega)$ in polar coordinates $b \to (r, \theta_b)$ and $k \to (k, \theta_k)$ with $n = 2$ read

0?  J  in  ein(8k  +  6b)  +  kT  sinM)  $(afc,  0k,  6i  ft) 

0) 
with $\chi = \theta_b - \theta_k + \theta_1\tau$ and $\Omega =$ . The characters of this representation lead to the special functions (Figure 4)

$J_\Omega(k) = \frac{1}{2\pi} \int_0^{2\pi} e^{i[\Omega u + k \sin u]}\, du \quad (10)$

which are usually NOT Bessel functions except when $\theta_1$ takes integer values. The complexification $u \to iy$ gives rise to hyperbolic motion instead of circular rotations, along with new special functions as in (10) with real exponential and sinh functions instead. These special functions can also be easily obtained by considering $\Psi$ as a Dirac measure and integrating this measure along the trajectory. This process corresponds to the “mechanics of moving points” and defines the spectral signatures of objects moving according to such a transformation. The usual way to deduce the ODE which admits this special function as a solution is to calculate the Laplace-Beltrami differential operator on this group. Theorems of additivity for these special functions can be deduced from the composition of the translations. In this case, it reads


/+oo 

•OO 


ri)  J[t-n2](k  r2)  dt  =  «/[n1-n2][fc(ri  +r2)] 

(11) 
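
The integral in Equation (10) is easy to evaluate numerically, which gives a direct check of the remark about Bessel functions. The sketch below assumes the form $\frac{1}{2\pi}\int_0^{2\pi} e^{i(\Omega u + k \sin u)}\,du$; the function names, the trapezoidal rule and the test values are choices made for this illustration only.

```python
import cmath
import math

def motion_special_function(order, k, n_points=20000):
    """Trapezoidal estimate of (1/2pi) * int_0^{2pi} exp(i*(order*u + k*sin(u))) du."""
    h = 2.0 * math.pi / n_points
    total = 0.0 + 0.0j
    for j in range(n_points + 1):
        u = j * h
        weight = 0.5 if j in (0, n_points) else 1.0  # half weight at both endpoints
        total += weight * cmath.exp(1j * (order * u + k * math.sin(u)))
    return total * h / (2.0 * math.pi)

def bessel_j(n, x, terms=40):
    """Power series of the Bessel function J_n(x) for integer n >= 0."""
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + n))
               * (x / 2.0) ** (2 * m + n) for m in range(terms))

integer_case = motion_special_function(2, 1.5)       # integer order
fractional_case = motion_special_function(0.5, 1.0)  # non-integer order
```

For an integer order $n$ the integrand is periodic and the integral reduces to $(-1)^n J_n(k)$, so `integer_case` agrees with `bessel_j(2, 1.5)`. For a non-integer order the integrand is no longer periodic over $[0, 2\pi]$ and the result acquires an imaginary part, so it cannot coincide with a real-valued Bessel function of the first kind.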

Equation (10) leads to “Hankel-like” integral transforms

$[\mathcal{H}_\Omega f](k) = \int_0^{\infty} f(r)\, J_\Omega(k r)\, r^{n-1}\, dr \quad (12)$

The  same  procedure  and  computations  can  be  done  on  all 
the  groups  dealing  with  spatio-temporal  transformations 
defined  in  [4,  5,  6,  7].  Examples  on  the  Galilean  group 
[6,  7]  proceed  with 

fR  e~iuT  dr  =  5{yJ  +  fct))  e«(—0f 

on  the  acceleration  group  [4]  with 

fR  e~iwT 

=  es4  eikb  eik^r2  «*22’ 

where $\gamma_2 \in \mathbb{R}$, on the deformations [8] with

In  *-iUT  e~ike“1>xdr  =  ±  F (-<£) 

where $s_1 \in \mathbb{R}$ and $\Gamma(\cdot)$ is the usual Gamma function.


4.  PRINCIPLE  OF  OPTIMALITY  AND 
TRACKING 


According to the calculus of variations, the motion between times $t_1$ and $t_2$ coincides with the extremal of the functional $J$

$\delta J = 0 \quad \text{with} \quad J = \int_{t_1}^{t_2} L[q(t), \dot q(t), \ddot q(t), \ldots, t]\, dt, \quad (13)$

where $\delta$ stands for the variation. The application of the optimal variational principle in Equation (13) is equivalent to writing the so-called Euler-Lagrange equation [7]. The trajectory is then uniquely defined if the initial state $q(0) = q_0$ of the object is known. At the extremum, denoted by the subscript $*$, the Euler-Lagrange equation reads

$\frac{d}{dt} \frac{\partial L}{\partial \dot q_*} = \frac{\partial L}{\partial q_*}. \quad (14)$
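
A small numerical experiment shows how the variational principle selects a trajectory. The sketch assumes the simplest Lagrangian $L = \dot q^2 / 2$ (a free point), which is an illustrative choice and not one of the Lagrangians used in this paper; the discretization, the function name and the endpoint values are likewise assumptions.

```python
def relax_path(q_start, q_end, n_interior=9, sweeps=2000):
    """Minimise the discretised action sum_i (q[i+1] - q[i])**2 with fixed endpoints.

    Stationarity with respect to each interior point gives
    q[i] = (q[i-1] + q[i+1]) / 2, the discrete form of the
    Euler-Lagrange equation q'' = 0 for L = q'^2 / 2.
    """
    q = [q_start] + [0.0] * n_interior + [q_end]
    for _ in range(sweeps):
        for i in range(1, n_interior + 1):
            q[i] = 0.5 * (q[i - 1] + q[i + 1])  # Gauss-Seidel update
    return q

path = relax_path(1.0, 3.0)
increments = [b - a for a, b in zip(path, path[1:])]
```

All increments converge to the same value, i.e. the extremal is the constant-velocity trajectory joining the two endpoints.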


This Euler-Lagrange equation generalizes quite easily and, moreover, allows us to derive the equation of wavelet motion that optimizes the action $J$. If we consider the Galilean case in one-dimensional space with $q(\tau) = b(\tau)$ and $\dot q(\tau) = \dot b(\tau)$ and the inner product (2) as Lagrangian, then (14) becomes

$\frac{d}{d\tau} \frac{\partial \langle \Psi_g \mid S \rangle}{\partial \dot b} - \frac{\partial \langle \Psi_g \mid S \rangle}{\partial b} = 0 \quad (15)$

It is convenient to expand the total differential. The conditions to introduce the operator in the integral are fulfilled. One solution of this IT is that the kernel be equal to 0. This gives a PDE on $\hat\Psi(ak, \omega - k\dot b)$, i.e. the motion equation for the wavelet. In the Fourier domain the PDE operator

is given by


( bk  +  u))( 


duo 


l<L\-h\Ldud 

i>dk  1  Li>2  dk  dui 


and  the  PDE  by  Aw,  'f(afc,w  —  kb)  =  '5f(ak,u)  —  kb). 
There  are  many  applications  out  of  this  procedure  which 
can  be  similarly  drawn  for  each  spatio-temporal  group.  Two 
are  examined  below  and  a  third  in  Section  5. 

If  we  consider  a  wavelet  tuned  on  parameter  gi  and 
the  Dirac  measure  on  parameter  g2,  the  partial  differential 
operator  ^  becomes  n(t,lll)a, 4l.fc)W) 


(vik+ui)( 


d 


V2  —  Vi  dk 


)+*  V\ 


1  d  t  1  d2 ' 
(v2  —  Ui)2  dk  V2  —  ui  dk2 
(17; 


and  the  PDE  becomes  an  ODE  i.e.  the  tracking  equation 
H(WJ -,k,u)  ^((ak}-k(v 2  -  Vi))  =  $(ofc,-/i:(t)2  -  m)). 

Let us consider a Galilean Morlet wavelet [6, 7] applied to a Dirac measure in pure translational motion at constant velocity. The signal taken as a Dirac measure on a translational trajectory is given by $S(x,t) = \delta[x - vt]$. After integrating the inner product, the Lagrangian $\Phi[b, \tau, v; k_0, \omega_0] = \langle \Psi_g \mid S \rangle$ reads $\Phi[b, \tau, v; k_0, \omega_0] =$


y^27T  eliko(l>-bT)]  g[- J{(6T-i))2+T2}]e[-tw0r] 

g[J{(fcof-*:o&+wo)2-((^-b)(i'T-6)-r)2}] 

e(i{(*:ot>-fcot+wo)((ti-6)(fcr-6)-r)}] 


(18) 


$k_0$ and $\omega_0$ are the coordinates of the wavelet shift in the Fourier domain. The contribution of all the partial derivatives involved in the Euler-Lagrange equation leads to an ODE in the form of a product of $F$, which is a complex
function  of  the  constant  of  motion  br  —  b  and  2b  =  —hr, 
with  the  Lagrangian  $[6,  r,  v\  fco ,  cc^o] 

F[br  —  b,  br  —  2b]  4>[6,  r,  v,  ko,  wo]  =  0  (19) 


such  that  F(0, 0)  =  0.  The  ODE  vanishes  when  v  =  b,i>T  — 
b  =  0,  br  —  2b  =  0  and  w 0  =  0,  fco  /  0.  Therefore,  we  have 
verified  that  the  tracking  addresses  the  correct  constant  of 
motion  b  =  br  and  b  =  br  +  | br 2  meaning  that  the  system 
can  track  objects  at  constant  velocity  and  constant  second- 
order acceleration. The tracking requires some symmetry in the wavelet, i.e. the still wavelet must be located in the plane $\omega = 0$ with $k_0 \neq 0$. These practical results have algorithmic importance as pictured in [7].

5. MOTION-BASED FILTERING

This section extends the concept of velocity filtering, originally defined by Fleet and Jepson in [1] and studied by Dubois in [2], to all the categories of motion within the approach pursued in the previous section. To reach that goal, we introduce integral transforms whose kernels are motion-specific Green functions. In the following, it is demonstrated that the motion-specific Green functions can be equivalently derived from the characters of the group representations in Section 3 or from the fundamental solution of the PDE of the wave equation of Section 4. This leads to convolutional integral transforms twisted along the motion transformations as presented in Section 2. The interesting point of this approach comes from the equations of the wavelet motion themselves (16) expressed in the Fourier domain. As a result of the existence of the term ^ in $A$, the PDE can be rewritten in the Fourier domain in the form

$A\, \hat\Psi(g^{-1}X) = \hat\Psi(g^{-1}X)$ where $X = (k, \omega)$ (20)

with an eigenvalue at 1. The Green function $G$ for the operator $A$ is the distribution which satisfies $A\, G(g^{-1}X) = \delta(g^{-1}X)$. The Green function is the Dirac $\delta(g^{-1}X)$ itself. The Green function is known as the fundamental solution of the PDE in Equation (20). If the operator $A$ is injective, then the inverse $A^{-1}$ exists and provides a convolutional-type integral transform whose kernel is the Green function, i.e.


If $g = e$, the identity element, we retrieve the usual Fourier transform with kernel $K(k, \omega) = \delta(\omega)e^{iks}$. This procedure defines for each kind of motion the kernel $K(k, \omega; m; x, t)$ that particularizes the usual Fourier transform for the motion group of interest; $m$ denotes the current motion parameter. If the Dirac measure is transformed into a continuous wavelet with compact support, then the calculation of $\hat\Psi(k, \omega; m)$ animated by motion $m$ from its still counterpart $\Psi(x,t)$ becomes an integral transform with kernel $K$. Let us, for example, consider the kernel of accelerated wavelets as propagator presented in Section 3 and integrate with a still Morlet wavelet [6, 7]; this yields the propagated wavelets for second-order accelerations

'Pfo(k,u))  =  (27 r)  e‘T 

(23) 

Moreover, the function $\Psi$ can now be scaled to extend the results from the “point mechanics” towards the “object-based mechanics” as follows

$\hat\Psi(k, \omega; m, a, a_0) = \int_{\mathbb{R} \times \mathbb{R}^n} K(k, \omega; m; x, t)\, \Psi(x, t; a, a_0)\, d^n x\, dt \quad (24)$

We have so far reached the ability to generate, cancel or modify analyzing wavelets as well as moving patterns.

6.  CONCLUSIONS  AND  APPLICATIONS 


mm) 


G(x,£)  f(x)dx 


These kernels are meaningful and are reminiscent of the propagators associated with Green functions of the Schrödinger equations. The meaning of Equation (21) and of the wavelet-based reproducing kernels [7] leads to the following duality of the motion analysis.

(1.) If the still version of a signal (wavelet, filter or stochastic process) $f(x)$ is known, then the reproducing kernel integral transform provides all the moving versions in $(x,t)$ or in $(k,\omega)$. These integral transforms generate the whole family of analyzing signals, wavelets or processes in the observing space $L^2(\mathbb{R}^n \times \mathbb{R}, d^n x\, dt)$. This allows spatio-temporal filtering, interpolating, and predicting along a trajectory.

(2.) If the animated version of a signal is known, then Equation (21) is a filter that compensates the signal for a given motion and gives rise to the still signal. This is motion-compensated filtering. The advantage of such an approach is that the classical affine wavelet analysis and processing may then be applied to the compensated signal (for coding purposes as in [3]). This section brings a more general point of view on the motion analysis presented in [3], where motion-compensated filtering was performed by building the trajectories within the signal and applying discrete wavelets along the assumed trajectories.

Let us then revisit Section 3 and compute the Fourier transform of a Dirac measure on a trajectory

This paper has shed light on a novel motion analysis based on a group-theoretic approach. Let us consider the projection of moving patterns on sensor arrays, which creates the most important part of all the acceleration components embedded in signals. The traffic sequence (Figure 2) is an example. The projection, which takes place within the cone of sensor visibility (Figure 1), is a homothety (i.e. a re-scaling). The projection may be modelled as an orthogonal projection composed with a scaling. Let us define the $z$-axis orthogonal to the sensor plane and the $x$-$y$ axes in the sensor plane. The motion captured in the sensor plane is obtained after a projection on planes $\Pi_0, \Pi_1, \Pi_2$ parallel to the sensor at times $\tau = 0, 1, 2$ and a homothety that rescales the projection down to the plane of the sensor (Figure 1). Let us denote by $W$ the width of the rigid object and by $S_0$ the size of the object captured by the camera. The scale $a_0 = W/S_0$ is observed from plane $\Pi_0$ at time $\tau = 0$. At time $\tau = n$,


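the size perceived from plane $\Pi_n$ by the camera is given by

$a_n = \frac{W}{S_n} = \frac{W}{S_0\left(1 - \frac{v_z}{d}\tau\right)} = a_0\left[1 + \frac{v_z}{d}\tau + \left(\frac{v_z}{d}\right)^2\tau^2 + \ldots\right] = a_0\left[1 + a_1\tau + a_2\tau^2 + \ldots + a_n\tau^n + \ldots\right].$

The series is convergent if $\left|\frac{v_z}{d}\tau\right| < 1$, i.e. with the physical observation. The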


components  of  translation,  velocity  and  accelerations  along 
x  and  y  axis  are  rescaled  with  the  ratio  -r-  =  —  =  . 

bo  v0  d 
Figure 3: Estimation of the parameters $v_z/d_0$ and $a_0$ by
computing the square modulus (energy) of the wavelet transform
as in [8]: $|\langle T(g)\Psi \mid s\rangle|^2 = F(a_0, v_z/d_0)$ is estimated in
the scene displayed in Figure 2. Two local maxima are detected
and displayed at ($v_{z1}/d_1 = 0.5\ \mathrm{s}^{-1}$, $a_{01} = 2.6$) and
($v_{z2}/d_2 = 0.38\ \mathrm{s}^{-1}$, $a_{02} = 1.8$), standing for the foreground
and background car respectively. If we assume $d_1 = 40$ m for the
foreground car and a rate of 25 images per second, then we
can estimate the relative approaching velocity component at
$v_{z1} = 72$ km/h (45 miles/h). For the background car, if we
assume $d_2 = 50$ m, then $v_{z2} = 68.4$ km/h (42.7 miles/h).
Let us remark that the camera is traveling towards the cars;
therefore, both velocities correspond to relative values.
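
The velocities quoted in the caption follow from multiplying the estimated ratio $v_z/d$ by the assumed distance $d$. A minimal sketch of that arithmetic, assuming the ratios are already expressed in s^-1 (the function name `approach_speed_kmh` is hypothetical and only used for illustration):

```python
def approach_speed_kmh(ratio_per_s, distance_m):
    """Relative approach speed in km/h from v_z/d (in 1/s) and an assumed distance d (in m)."""
    speed_m_per_s = ratio_per_s * distance_m
    return speed_m_per_s * 3.6  # convert m/s to km/h

# Values taken from the caption: (v_z/d, assumed d) for each car.
foreground = approach_speed_kmh(0.5, 40.0)
background = approach_speed_kmh(0.38, 50.0)
print(round(foreground, 1), round(background, 1))
```

Both results depend linearly on the assumed distances, so an error in $d$ translates directly into the same relative error in the velocity.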


ANGULAR  VELOCITY  CHARACTERISTIC  FUNCTION;  absolute  value 


Figure  4:  Spatio-temporal  special  function  associated  with 
the  rotational  motion.  The  sketch  is  performed  on  sections  at 
constant  ui,  the  angular  velocity  =  1.5  radian/image. 
ABSTRACT

In the classical methods for blind channel identification
(Subspace method, TXK, XBM) [1, 2, 3], the additive noise is
assumed to be spatially white or known to within a multiplicative
scalar. When the noise is non-white (colored or correlated) but has
a known covariance matrix, we can still handle the problem through
prewhitening. However, there are no techniques presently available
to deal with completely unknown noise fields.

It is well known that when the noise covariance matrix is unknown,
the estimated channel parameters may be grossly inaccurate. In this
paper, we assume that the noise is spatially correlated, and we apply
this assumption to blind channel identification. We estimate the
noise covariance matrix without any assumption except on its
structure, which is assumed to be band-Toeplitz. The performance of
the developed method is evaluated and compared with that of the
modified subspace approach (MSS) [4].

1.  INTRODUCTION 

One common problem in signal transmission through any channel is
additive noise. In general, additive noise is generated internally by
components such as resistors and solid-state devices used to
implement the communication system. This is sometimes called thermal
noise or Johnson noise. Other sources of noise and interference may
arise externally to the system, such as interference from other
users. When such noise and interference occupy the same frequency
band as the desired signal, their effect can be minimized by proper
design of the transmitted signal and its demodulator at the receiver.
The effects of noise may also be minimized by increasing the power in
the transmitted signal. However, equipment and other practical
constraints limit the power level in the transmitted signal [5].

This  work  is  supported  by  Alexander  von  Humboldt- 
Stiftung,  Bundesrepublik  Deutschland. 


The classical model used in communication systems supposes, on the
one hand, that the power of the noise is identical on each sensor
and, on the other hand, that there is no space/time correlation of
the noise. However, this situation is seldom met, which leads to a
clear degradation of the performance of the subspace methods. Here,
we recall some well-known methods which treat the noise problem in
array processing for direction-of-arrival estimation. In recent
years, there has been a growing interest in techniques that aim to
lower the signal-to-noise ratio resolution threshold or to cope with
spatially colored noise [6, 7, 8, 9, 10]. The ambient noise is
unknown in practice; therefore its modeling or estimation is
necessary. Very few methods have been developed for this problem and
there is no definitive solution. There are some practical methods:
in [11], two methods are obtained by optimization of a criterion and
by using AR or ARMA modeling of the noise. In [7], the spatial
correlation matrix of the noise is modeled by known Bessel functions.
In [6], the ambient noise covariance matrix is modeled by a sum of
Hermitian matrices known up to a multiplicative scalar. In [8], this
estimate is obtained by measuring the array covariance matrix when no
signals are present. This procedure assumes that the noise does not
change with time, which is not fulfilled in several application
domains. Another possibility [8] arises when the correlation
structure is known to be invariant under a translation or rotation.
The so-called covariance differencing technique can then be applied
to reduce the noise influence. This method requires two identical
translated and/or rotated measurements of the array covariance matrix
and assumes the invariance of the noise covariance matrix, while the
source signals change between the two measurements. The noise
covariance matrix is then eliminated by a simple subtraction.
Furthermore, this method cannot be applied when the source covariance
matrix satisfies the same invariance property or when only one
measurement is
available. In [7], a particular model of the noise covariance matrix
structure, which takes into account the characteristics of the noise
relative to its origins, is given. Recently, a maximum a posteriori
(MAP) approach has been developed in [10]; this method can only be
applied in the case of a linear array. In [9], the method called
“Instrumental Variable” (IV) is used to reduce the noise without
estimating it; this estimator assumes that the noise is temporally
independent. One technique based on the MDL criterion has been
developed in [12] for detection and localization of the signals in
the presence of unknown noise; this estimator is asymptotically
biased [12]. However, the study of the noise for blind channel
identification is very limited. In [4], a modified subspace method
(MSS) for blind identification in the presence of unknown correlated
noise has been presented; it uses covariance matrices at a time lag
for which the noise is absent. The object of this correspondence is
to improve blind channel identification in the presence of correlated
noise by whitening the received data. The noise is assumed spatially
correlated. The structure of the paper is as follows. In Section 2,
we present the studied problem. In Section 3, we describe the noise
covariance matrix model used in this study and its estimation by the
proposed algorithm, and we apply the noise estimate to blind channel
identification using the subspace method. In Section 4, we present
some simulation results and performance comparisons.

2.  PROBLEM  FORMULATION 

Consider  L  FIR  channels  driven  by  a  common  source. 
The  output  vector  of  the  ith  channel  can  be  written  as: 

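$$\mathbf{r}_i(k) = H^{(i)}\mathbf{s}(k) + \mathbf{n}_i(k), \qquad (1)$$

where $\mathbf{r}_i(k)$ is the output sequence of the $i$th channel,
$\mathbf{s}(k)$ is the input sequence and $\mathbf{n}_i(k)$ is the noise
sequence on the $i$th channel:

$$\mathbf{r}_i(k) = [r_i(k)\ \ r_i(k+1)\ \ \ldots\ \ r_i(k+N-1)],$$

$$\mathbf{s}(k) = [s(k-M)\ \ s(k-M+1)\ \ \ldots\ \ s(k+N-1)],$$

$$\mathbf{n}_i(k) = [n_i(k)\ \ n_i(k+1)\ \ \ldots\ \ n_i(k+N-1)],$$

$$H^{(i)} = \begin{pmatrix} h_0^{(i)} & h_1^{(i)} & \cdots & h_M^{(i)} & \cdots & 0 \\ 0 & h_0^{(i)} & h_1^{(i)} & \cdots & \cdots & 0 \\ \vdots & & \ddots & & \ddots & \vdots \\ 0 & \cdots & 0 & h_0^{(i)} & \cdots & h_M^{(i)} \end{pmatrix},$$

where $h^{(i)}$ is the impulse response of the $i$th channel,
$M$ is the maximum order of the $L$ channels and $N$ is
the width of the temporal window. $H^{(i)}$ is of dimension
$(N \times (N+M))$.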


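Then we have,

$$\mathbf{r}(k) = \mathcal{H}\mathbf{s}(k) + \mathbf{n}(k), \qquad (2)$$

that is,

$$\begin{pmatrix}\mathbf{r}_1(k)\\ \vdots\\ \mathbf{r}_L(k)\end{pmatrix} = \begin{pmatrix}H^{(1)}\\ \vdots\\ H^{(L)}\end{pmatrix}\mathbf{s}(k) + \begin{pmatrix}\mathbf{n}_1(k)\\ \vdots\\ \mathbf{n}_L(k)\end{pmatrix}.$$

The matrix $\mathcal{H}$ is known as the $(LN \times (N+M))$ filtering
matrix, which has full rank $(N+M)$ under the following
assumptions: the $L$ channels do not share a common zero and
$N > (M+1)$.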
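
To make the structure of the filtering matrix concrete, the following sketch builds the per-channel Toeplitz blocks, stacks them, and checks the rank numerically. It is only an illustration under stated assumptions: the channel taps are random placeholders (not the channels of [1]), and the helper names `filtering_matrix` and `stacked_filtering_matrix` are hypothetical.

```python
import numpy as np

def filtering_matrix(h, n_rows):
    """Return the n_rows x (n_rows + len(h) - 1) Toeplitz matrix whose
    row j contains the taps h shifted right by j positions."""
    h = np.asarray(h, dtype=complex)
    T = np.zeros((n_rows, n_rows + len(h) - 1), dtype=complex)
    for j in range(n_rows):
        T[j, j:j + len(h)] = h
    return T

def stacked_filtering_matrix(channels, N):
    """Stack the L per-channel blocks into the LN x (N+M) filtering matrix."""
    return np.vstack([filtering_matrix(h, N) for h in channels])

# Assumed sizes (same orders of magnitude as in the simulations section).
rng = np.random.default_rng(0)
L, N, M = 4, 10, 4
# Random complex taps: an assumption made only for this illustration.
channels = rng.standard_normal((L, M + 1)) + 1j * rng.standard_normal((L, M + 1))

H = stacked_filtering_matrix(channels, N)
rank_H = np.linalg.matrix_rank(H)
print(H.shape, rank_H)
```

With randomly drawn taps the channels share a common zero with probability zero, so the stacked matrix is expected to have full column rank $N+M$.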

The  blind  identification  problem  is  to  find  H  from  the 
sequence, 


$$\{\mathbf{r}(k)\ \text{for}\ k = 1, 2, \ldots, K\}.$$

The subspace method [1] exploits the covariance matrix of all channel
outputs, $\Gamma = E[\mathbf{r}\mathbf{r}^{+}]$, estimated by the sample
covariance matrix

$$\hat{\Gamma} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{r}(k)\mathbf{r}^{+}(k),$$

where $K$ is the number of samples and $+$ denotes the conjugate
transpose. Assume that the signals and the additive noise are
independent, stationary and ergodic zero-mean complex-valued random
processes. As $K$ becomes large, this matrix has the asymptotic
structure $\Gamma = \mathcal{H}\Gamma_s\mathcal{H}^{+} + \Gamma_n$,
with $\Gamma_n = E[\mathbf{n}\mathbf{n}^{+}]$ the noise covariance matrix and
$\Gamma_s = E[\mathbf{s}\mathbf{s}^{+}]$ the signal covariance matrix.

The goal of blind channel identification and equalization
is to identify $\mathcal{H}$ (channel identification) and to
estimate $\mathbf{s}(k)$ from $\mathbf{r}(k)$ (channel equalization).

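The subspace blind channel identification procedure [1] consists of
estimating the $(L(M+1) \times 1)$ vector $h$ of channel coefficients
from the observation vector. This approach is based on the
eigendecomposition of the data covariance matrix,

$$\Gamma = [U_s\ \ U_n]\,\Lambda\,[U_s\ \ U_n]^{+},$$

where $\Lambda$ is the diagonal matrix of eigenvalues, and $U_s$ and
$U_n$ contain the eigenvectors spanning the signal and noise
subspaces, respectively. The subspace method yields an estimate
$\hat{\mathcal{H}}$ of $\mathcal{H}$ by solving the equation
$U_n^{+}\hat{\mathcal{H}} = 0$ in a least-squares sense (where
$\hat{\mathcal{H}}$ is subject to the same structure as $\mathcal{H}$).
This estimate is uniquely (up to a constant scalar) equal to
$\mathcal{H}$. From [1], we have:

$$U_n^{+}\mathcal{H} = h^{+}\mathcal{U}_n = 0, \qquad (3)$$

where $\mathcal{U}_n$ is the $(L(M+1) \times (N+M))$ matrix obtained
by stacking the $L$ filtering matrices $\mathcal{U}_n^{(i)}$,
$\mathcal{U}_n = [\mathcal{U}_n^{(0)T}, \ldots, \mathcal{U}_n^{(L-1)T}]^{T}$,
each $\mathcal{U}_n^{(i)}$ being the $((M+1) \times (N+M))$ filtering
matrix built from the $i$th length-$N$ block of a noise eigenvector,
and $h = [h^{(0)}, \ldots, h^{(L-1)}]$, with
$h^{(i)} = [h_0^{(i)}, \ldots, h_M^{(i)}]$.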

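The optimization system derived in [1] is:

$$\hat{h} = \arg\min_{\|h\|=1} h^{+}U_{33}\,h,$$

where

$$U_{33} = \sum_{i=0}^{LN-M-N-1}\mathcal{U}_i\,\mathcal{U}_i^{+}$$

is the filtering noise projection matrix, $\mathcal{U}_i$ being the
stacked filtering matrix associated with the $i$th noise eigenvector.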
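
The minimization above can be sketched numerically as follows. This is an illustrative sketch, not the authors' implementation: it assumes an exact covariance matrix with white noise and a unit-power white source, random placeholder channels, and hypothetical helper names. Each noise eigenvector is split into $L$ blocks of length $N$, each block is turned into an $(M+1) \times (N+M)$ filtering matrix, and the unit-norm minimizer is taken as the eigenvector of the accumulated quadratic form with the smallest eigenvalue.

```python
import numpy as np

def filtering_matrix(v, n_rows):
    """Toeplitz matrix whose row j contains v shifted right by j positions."""
    v = np.asarray(v, dtype=complex)
    T = np.zeros((n_rows, n_rows + len(v) - 1), dtype=complex)
    for j in range(n_rows):
        T[j, j:j + len(v)] = v
    return T

def subspace_channel_estimate(cov, L, N, M):
    """Unit-norm minimizer of h^+ Q h; recovers the channel up to a complex scalar."""
    _, U = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    noise_vecs = U[:, :L * N - (N + M)]        # eigenvectors of the smallest eigenvalues
    Q = np.zeros((L * (M + 1), L * (M + 1)), dtype=complex)
    for g in noise_vecs.T:
        # Stack the L filtering matrices built from the length-N blocks of g.
        G = np.vstack([filtering_matrix(g[l * N:(l + 1) * N], M + 1) for l in range(L)])
        Q += G @ G.conj().T
    _, V = np.linalg.eigh(Q)
    return V[:, 0]                             # eigenvector of the smallest eigenvalue

rng = np.random.default_rng(1)
L, N, M = 3, 5, 2                              # small sizes chosen for the illustration
channels = rng.standard_normal((L, M + 1)) + 1j * rng.standard_normal((L, M + 1))
H = np.vstack([filtering_matrix(h, N) for h in channels])
cov = H @ H.conj().T + 0.1 * np.eye(L * N)     # exact covariance, white noise (assumption)

h_true = channels.reshape(-1)
h_hat = subspace_channel_estimate(cov, L, N, M)
# Normalized correlation: close to 1 when h_hat is a scalar multiple of h_true.
alignment = abs(np.vdot(h_true, h_hat)) / np.linalg.norm(h_true)
print(round(float(alignment), 6))
```

The estimate is compared through a normalized correlation because the method only identifies the channel up to a constant scalar.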

The noise is assumed Gaussian, complex and spatially correlated. Its
real and imaginary parts are supposed independent and Gaussian, with
$E[\mathbf{n}_i] = 0$, $E[\mathbf{n}_i\mathbf{n}_i^{T}] = 0$ and
$E[\mathbf{n}_i\mathbf{n}_i^{+}] = \Gamma_n$. $\Gamma_n$ is the noise
covariance matrix; the superscripts “*” and “+” denote conjugate and
conjugate transpose, respectively. We consider a band noise
covariance matrix, defined by:

$$\Gamma_n(i,m) = \begin{cases} 0, & |i-m| > K \\ \rho_{|i-m|}, & |i-m| \le K \ \text{and}\ i \ne m \\ \sigma^2, & i = m \end{cases}$$

where $\rho_i = \bar{\rho}_i + j\tilde{\rho}_i$, $i = 1, \ldots, K$,
are complex variables, $j^2 = -1$, $\sigma^2$ is the noise variance at
each receiver, and $K$ is the spatial noise correlation length.


(°\ 

P12  •  •  • 

PlK 
There are two ways to recover a noise-free observation covariance
matrix. The first is subtraction of the noise covariance matrix,
$\mathcal{H}\Gamma_s\mathcal{H}^{+} = \Gamma - \Gamma_n$, which yields
a “clean” observation covariance matrix; however, the result may no
longer be positive semidefinite if $\Gamma_n$ is badly estimated.

The second is whitening, $\Gamma_n^{-1/2}\,\Gamma\,\Gamma_n^{-1/2}$,
which recovers the classical model of communication systems. This
processing is more robust but requires a higher computational load.

From the data matrix $\Gamma = \mathcal{H}\Gamma_s\mathcal{H}^{+} + \Gamma_n$,
the goal of the first part of this paper is to estimate the noise
covariance matrix $\Gamma_n$; in the second part, we blindly estimate
$\mathcal{H}$ from the “clean” matrix
$[\mathcal{H}\Gamma_s\mathcal{H}^{+}]$ using the subspace method [1].
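
The two options can be compared on a small synthetic example. The sketch below is only illustrative: it assumes the noise covariance matrix is known exactly, uses a random matrix as a stand-in for the structured filtering matrix, and the helper names `band_toeplitz` and `inverse_sqrt` are hypothetical. After whitening, the noise contribution becomes the identity, so the smallest eigenvalues of the whitened covariance are all equal to one.

```python
import numpy as np

def band_toeplitz(sigma2, rhos, size):
    """Hermitian band-Toeplitz matrix: sigma2 on the diagonal,
    rhos[k-1] on the k-th superdiagonal and its conjugate below."""
    G = sigma2 * np.eye(size, dtype=complex)
    for k, rho in enumerate(rhos, start=1):
        G += rho * np.eye(size, k=k) + np.conj(rho) * np.eye(size, k=-k)
    return G

def inverse_sqrt(G):
    """Inverse Hermitian square root of a positive-definite matrix."""
    w, V = np.linalg.eigh(G)
    return (V / np.sqrt(w)) @ V.conj().T

rng = np.random.default_rng(2)
size, rank = 12, 5
# Random stand-in for the filtering matrix (assumption for illustration only).
H = rng.standard_normal((size, rank)) + 1j * rng.standard_normal((size, rank))
signal = H @ H.conj().T
gamma_n = band_toeplitz(1.0, [0.3 + 0.1j, 0.1], size)   # diagonally dominant, hence positive definite
gamma = signal + gamma_n

cleaned = gamma - gamma_n                 # option 1: subtraction
Wn = inverse_sqrt(gamma_n)
whitened = Wn @ gamma @ Wn                # option 2: whitening
noise_floor = np.linalg.eigvalsh(whitened)[:size - rank]
print(np.round(noise_floor, 6))
```

Note that whitening also transforms the signal part (the effective channel matrix becomes $\Gamma_n^{-1/2}\mathcal{H}$), whereas subtraction leaves it unchanged.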

3.  BLIND  NOISE  ESTIMATION  (BNE) 

In many applications such as communication systems, it is reasonable
to assume that the noise correlation decreases along the receivers.
This is a widely used model for colored noise: the correlation
coefficient $\rho$ decreases as the distance between two receivers
increases.

In this study, we consider a band-Toeplitz noise covariance matrix
whose diagonals have decreasing values, the so-called decreasing
band-Toeplitz structure. This is the only assumption used to estimate
the noise covariance matrix.

The BNE algorithm for the noise covariance matrix estimation is
summarized in the following steps:

Step 1: - Estimation and eigendecomposition of the received covariance
matrix $\hat{\Gamma} = \frac{1}{T}\sum_{t=1}^{T}\mathbf{r}(t)\mathbf{r}^{+}(t)$,
where $T$ is the number of independent realizations;
$\hat{\Gamma} = U\Lambda U^{+}$, where
$\Lambda = \mathrm{diag}[\lambda_1, \ldots, \lambda_{LN}]$ and
$U = [u_1, u_2, \ldots, u_{LN}]$; $\lambda_i$ and $u_i$ are the
eigenvalues and the eigenvectors of the observation covariance
matrix, respectively.

- Initialization of the noise covariance matrix: $\hat{\Gamma}_n = 0$.

Step 2: - Calculation of the matrix $W_{N+M} = U_s\Lambda_s^{1/2}$,
where $U_s = [u_1, u_2, \ldots, u_{N+M}]$ is the matrix of the $(N+M)$
eigenvectors corresponding to the $(N+M)$ eigenvalues collected in
$\Lambda_s = \mathrm{diag}[\lambda_1, \ldots, \lambda_{N+M}]$.

- Calculation of the matrix $\Delta = W_{N+M}W_{N+M}^{+}$.

Step 3: Calculation of $\hat{\Gamma}_n^{(1)} = \mathrm{band}_K[\hat{\Gamma} - \Delta]$,
where $\hat{\Gamma}_n^{(1)}$ is the band noise covariance matrix at the
first iteration and $\mathrm{band}_K[\cdot]$ designates the band matrix
of bandwidth $(K+1)$.

Step 4: Eigendecomposition of the matrix
$[\hat{\Gamma} - \hat{\Gamma}_n^{(1)}] = V\Lambda V^{+}$. The new
matrices $\Delta$ and $\hat{\Gamma}_n^{(2)}$ are again estimated as in
step 2 and step 3. These iterations are repeated until
$\hat{\Gamma}_n$ no longer improves.

Stop test: The algorithm is stopped when the distance between
$\hat{\Gamma}_n^{(i)}$ and $\hat{\Gamma}_n^{(i+1)}$ becomes less than
some value $\epsilon$. We define the distance between
$\hat{\Gamma}_n^{(i)}$ and $\hat{\Gamma}_n^{(i+1)}$ as
$\|\hat{\Gamma}_n^{(i+1)} - \hat{\Gamma}_n^{(i)}\|_F$, the Frobenius
norm of the matrix $[\hat{\Gamma}_n^{(i+1)} - \hat{\Gamma}_n^{(i)}]$.

The estimated noise covariance matrix $\hat{\Gamma}_n$ is obtained
when the algorithm is stopped.
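
The steps above can be sketched as a short iterative routine. This is one possible reading of the algorithm, given as a hedged illustration rather than the authors' code: it assumes that the signal subspace is taken from the largest eigenvalues, that $W_{N+M} = U_s\Lambda_s^{1/2}$, and that the band operator simply zeroes the entries outside the band. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def band_part(A, K):
    """Keep the entries with |i - m| <= K and set the others to zero."""
    i, m = np.indices(A.shape)
    return np.where(np.abs(i - m) <= K, A, 0)

def blind_noise_estimate(cov, signal_dim, K, eps=1e-8, max_iter=200):
    """Iterate steps 2 to 4 until the Frobenius-norm change falls below eps."""
    gamma_n = np.zeros_like(cov)                          # Step 1: initialization
    for _ in range(max_iter):
        lam, U = np.linalg.eigh(cov - gamma_n)            # Steps 1 and 4: eigendecomposition
        top = np.argsort(lam)[::-1][:signal_dim]          # assumed: largest eigenvalues
        W = U[:, top] * np.sqrt(np.clip(lam[top], 0.0, None))   # Step 2
        delta = W @ W.conj().T
        updated = band_part(cov - delta, K)               # Step 3
        change = np.linalg.norm(updated - gamma_n, 'fro') # stop test
        gamma_n = updated
        if change < eps:
            break
    return gamma_n

# Synthetic example: random low-rank signal part plus a band-Toeplitz noise part.
rng = np.random.default_rng(4)
size, signal_dim, K = 12, 5, 2
H = rng.standard_normal((size, signal_dim)) + 1j * rng.standard_normal((size, signal_dim))
gamma_n_true = (np.eye(size)
                + 0.3 * (np.eye(size, k=1) + np.eye(size, k=-1))
                + 0.1 * (np.eye(size, k=2) + np.eye(size, k=-2))).astype(complex)
gamma = H @ H.conj().T + gamma_n_true

gamma_n_hat = blind_noise_estimate(gamma, signal_dim, K)
error = np.linalg.norm(gamma_n_hat - gamma_n_true, 'fro')
print(gamma_n_hat.shape, float(error))
```

The sketch only guarantees that the output is Hermitian and banded; how close it gets to the true noise covariance depends on the data and is not asserted here.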

The matrix $\hat{\Gamma}_n$ is used to “denoise” the received data.

In fact, the noise-free received covariance matrix is
$\tilde{\Gamma} = \hat{\Gamma} - \hat{\Gamma}_n$ or
$\tilde{\Gamma} = \hat{\Gamma}_n^{-1/2}\,\hat{\Gamma}\,\hat{\Gamma}_n^{-1/2}$.
This “clean” matrix is used to estimate the channel matrix. In order
to evaluate its performance, we apply the subspace method [1].
Indeed, Moulines et al. [1] showed that if the subchannels do not
share common zeros, $h$ is uniquely determined by the noise subspace
$U_n$. The subspace estimator is given by

$$\hat{h} = \arg\min_{\|h\|=1} h^{+}U_{33}\,h,$$

where $U_{33}$ is the filtering noise projection matrix estimated
from the “clean” data covariance matrix. This estimator does not
require the knowledge of the source covariance as long as
$\Gamma_s > 0$. We also compare our result to the modified subspace
(MSS) method [4].

4.  PERFORMANCE  EVALUATION 

To demonstrate the efficiency of the proposed algorithm, some
computer simulations have been conducted. In the following
simulations, we take the parameters described in [1]: the number of
virtual channels is $L = 4$; the width of the temporal window is
$N = 10$; the degree of the ISI is $M = 4$; the channel coefficients
are given by [1]:


h^(0)            h^(1)            h^(2)            h^(3)
-0.049+0.359j    0.443-0.0364j    -0.211-0.322j    0.417-0.030j
0.482+0.569j     1                -0.199-0.918j    1
-0.556+0.587j    0.921-0.194j     1                0.873-0.145j
1                0.189-0.208j     -0.284-0.524j    0.285+0.309j
-0.171+0.061j    -0.087-0.054j    0.136-0.190j     -0.049+0.161j

Table 1: Four virtual complex channels.

For all these simulations, the number of data samples used to
estimate each $h$ ranges from 100 to 1000 in steps of 100.

The root mean-square error (RMSE), defined below, is employed as a
performance measure of the input estimates:

$$\mathrm{RMSE} = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\|\hat{\mathcal{H}}_i - \mathcal{H}\|^2},$$

where $K$ is the number of trials (100 in our case) and
$\hat{\mathcal{H}}_i$ is the estimate of the inputs from the $i$th
trial.

The  signal  to  noise  ratio  (SNR)  is  defined  as: 

SNR  =  101og10  We  define  the  Frobenius 

norm of estimation error (EE) of the noise covariance matrix as:
$\mathrm{EE} = \|\Gamma - (\mathcal{H}\Gamma_s\mathcal{H}^{+} + \hat{\Gamma}_n)\|_F$.

We compare the presented algorithm with existing methods such as the
modified subspace approach (MSS) [4]. This comparison is based on the
root mean-square error of the channel matrix estimates. We recall
this approach in the following. Let
$\Gamma(\tau) = \mathcal{H}J(\tau)\mathcal{H}^{+} + \Gamma_n(\tau)$,
where $J(\tau)$ is the $(N+M) \times (N+M)$ shift matrix. In [4], one
assumes that $\Gamma_n(\tau) = 0$ as long as $\tau \ge N$. Therefore,
we have the relation $\Gamma(\tau) = \mathcal{H}J(\tau)\mathcal{H}^{+}$
for $\tau \ge N$. At the time lag $\tau = N$,
$\Gamma(N) = \mathcal{H}\,(J(N) + J(N)^{+})\,\mathcal{H}^{+}$; the
matrix $\Gamma(N)$ is used to estimate the channel parameters.

Figures 1a and 1b present the root mean-square error (RMSE) of the
parameter estimates for a band-Toeplitz noise covariance matrix and
the Frobenius norm of the estimation error (EE) of the noise
covariance matrix versus the number of samples.


Figure 1: (a) Root mean-square error (RMSE) of the parameter
estimates (band-Toeplitz noise covariance matrix). (b) Frobenius
norm of the estimation error (EE) of the noise covariance matrix
(band-Toeplitz noise covariance matrix) versus number of samples.


In the case of a band noise covariance matrix with a correlation
length $K = 4$, we obtain Figures 2a and 2b, for SNR from 0 dB to
16 dB.


Figure 2: (a) Root mean-square error of the parameter estimates
(band-Toeplitz noise covariance matrix ($K = 4$)) versus SNR.
(b) Frobenius norm of the estimation error (EE) of the noise
covariance matrix as a function of the number of iterations.


We study the influence of the correlation length on the error of the
noise covariance matrix estimation (Figure 3a) and on the channel
parameters (Figure 3b). The correlation length varies between
$K = 1$ and $K = 4$, with SNR = 3 dB.

The  normalized  error  (NE)  is  defined  by,  NE  = 

We consider a band noise covariance matrix, and we estimate the
normalized error and the Frobenius norm for different scenarios of
the channel matrix (Figures 4a and 4b).

These simulations show that the processing which consists of first
estimating the noise covariance matrix and then prewhitening the
observations has many advantages and is more efficient than the
modified subspace (MSS) approach [4]. The use of the denoised subspace
Figure 3: (a) Root mean-square error of the parameters esti¬
ABSTRACT

A  new  technique  is  proposed  for  robust  multiuser  detec¬ 
tion  in  the  presence  of  non-Gaussian  ambient  noise.  This 
method  is  based  on  minimizing  a  certain  cost  function  (e.g., 
the  Huber  penalty  function)  over  a  discrete  set  of  candi¬ 
date  user  bit  vectors.  The  set  of  candidate  points  are  cho¬ 
sen  based  on  the  so-called  “slowest-descent  search” ,  starting 
from  the  estimate  closest  to  the  unconstrained  minimizer  of 
the  cost  function,  and  along  mutually  orthogonal  directions 
where  this  cost  function  grows  the  slowest.  The  extension  of 
the  proposed  technique  to  multi-user  detection  in  unknown 
multi-path  fading  channels  is  also  proposed.  Simulation 
results  show  that  this  new  technique  offers  substantial  per¬ 
formance  improvement  over  the  recently  proposed  robust 
multiuser  detectors,  with  little  attendant  increase  in  com¬ 
putational  complexity. 

1.  INTRODUCTION 

Recently, a robust multiuser detection technique was developed in
[4] for demodulating multiuser signals in the presence of both
multiple-access interference and impulsive ambient channel noise.
This technique is based on the M-estimation method for robust
regression, and is essentially a robustized version of the linear
decorrelating multiuser detector. Although this robust multiuser
detector offers significant performance gain over the linear
decorrelator in impulsive noise, there is still a large gap between
its performance and that of the maximum likelihood (ML) multiuser
detector. However, the computational complexity of ML detection is
quite high; moreover, ML detection requires knowledge of the exact
probability distribution of the noise, which may not be available to
the receiver. Hence, it is of interest to develop robust,
low-complexity, and near-optimal multiuser detection techniques for
non-Gaussian noise channels. Furthermore, it is important to be able
to extend such a method to more general asynchronous unknown
multi-path fading channels. These issues are the subject of this
paper.


P.  Spasojevic  was  supported  in  part  by  the  WINLAB /Lucent 
Technologies  Wireless  Post-Doctoral  Fellowship.  X.  Wang  was 
supported  in  part  by  the  NSF  grant  CAREER  CCR-9875314. 


Xiaodong  Wang 

Department  of  Electrical  Engineering, 
Texas  A&M  University, 

College  Station,  TX  77843. 


2.  SYNCHRONOUS  SYSTEM  MODEL 

First consider the following discrete-time synchronous CDMA signal
model. At any time instant, the received signal is the superposition
of $K$ user signals, plus the ambient noise, given by

$$r = \sum_{k=1}^{K}\alpha_k b_k s_k + n = SAb + n, \qquad (1)$$

where $s_k = [s_{1,k} \cdots s_{N,k}]^T$ is the normalized signature
sequence of the $k$-th user; $N$ is the processing gain;
$b_k \in \{+1, -1\}$ and $\alpha_k$ are respectively the data bit and
the complex amplitude of the $k$-th user; $S = [s_1 \cdots s_K]$;
$A = \mathrm{diag}(\alpha_1, \cdots, \alpha_K)$;
$b = [b_1 \cdots b_K]^T$; and $n = [n_1 \cdots n_N]^T$ is a vector of
independent and identically distributed (i.i.d.) ambient noise
samples with independent real and imaginary components. Denote


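$$y = \begin{bmatrix}\Re\{r\}\\ \Im\{r\}\end{bmatrix},\quad \Psi = \begin{bmatrix}S\,\Re\{A\}\\ S\,\Im\{A\}\end{bmatrix},\quad v = \begin{bmatrix}\Re\{n\}\\ \Im\{n\}\end{bmatrix},$$

where $v$ is a real noise vector consisting of $2N$ i.i.d. samples.
Then (1) can be written as

$$y = \Psi b + v. \qquad (2)$$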

It is assumed that each element $v_j$ of $v$ follows a two-term
Gaussian mixture distribution, i.e.,

$$v_j \sim (1-\epsilon)\,\mathcal{N}(0, \nu^2) + \epsilon\,\mathcal{N}(0, \kappa\nu^2), \qquad (3)$$

with $0 < \epsilon < 1$ and $\kappa > 1$. Here the term
$\mathcal{N}(0, \nu^2)$ represents the nominal ambient noise, and the
term $\mathcal{N}(0, \kappa\nu^2)$ represents an impulsive component.
The probability that impulses occur is $\epsilon$. Note that the
overall variance of the noise sample $v_j$ is

$$\frac{\sigma^2}{2} = (1-\epsilon)\nu^2 + \epsilon\kappa\nu^2. \qquad (4)$$

We have $\mathrm{Cov}\{v\} = \frac{\sigma^2}{2}I_{2N}$ and
$\mathrm{Cov}\{n\} = \sigma^2 I_N$. The model (3) serves as an
approximation to the more fundamental Middleton Class A noise model
[2, 5], and has been used extensively to model physical noise arising
in radio and acoustic channels. Recently, it has been shown that
another class of non-Gaussian distributions, the $\alpha$-stable
distributions, can be well approximated by a finite mixture of
Gaussians [1]. In what follows, we consider the problem of detecting
the transmitted symbols $b$ of all users based on the signal model (2).
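
As a quick numerical illustration of the mixture model (3) and the variance expression (4), the sketch below draws samples from the two-term mixture and compares the empirical variance with the formula. The parameter values and the function names are assumptions chosen only for this example.

```python
import numpy as np

def mixture_variance(eps, kappa, nu2):
    """Overall variance of one real noise sample, as in (4)."""
    return (1.0 - eps) * nu2 + eps * kappa * nu2

def sample_mixture_noise(rng, size, eps, kappa, nu2):
    """Draw samples: with probability eps the impulsive term N(0, kappa*nu2)
    is used, otherwise the nominal term N(0, nu2)."""
    impulsive = rng.random(size) < eps
    scale = np.where(impulsive, np.sqrt(kappa * nu2), np.sqrt(nu2))
    return scale * rng.standard_normal(size)

rng = np.random.default_rng(3)
eps, kappa, nu2 = 0.1, 100.0, 1.0          # assumed example values
v = sample_mixture_noise(rng, 1_000_000, eps, kappa, nu2)

theoretical = mixture_variance(eps, kappa, nu2)
empirical = v.var()
print(theoretical, float(empirical))
```

Even with a small impulse probability, the impulsive term dominates the overall variance when $\kappa$ is large, which is why detectors tuned to Gaussian noise degrade in this setting.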

3.  EXHAUSTIVE-SEARCH  DETECTION  AND 
DECORRELATIVE  DETECTION 

In this section, we give a unified description of a number of
approaches to the problem of multiuser detection in non-Gaussian
noise. There are primarily two categories of such detectors for
estimating $b$ from $y$ in (2), all based on minimizing the sum of a
certain function $\rho$ of the chip residuals

$$C(b; y) = \sum_{j=1}^{2N}\rho\left(y_j - \xi_j^T b\right), \qquad (5)$$

where $\xi_j^T$ is the $j$-th row of the matrix $\Psi$.

• Exhaustive-search detector:

$$b^e = \arg\min_{b \in \{+1,-1\}^K} C(b; y). \qquad (6)$$

• Decorrelative detector:

$$\beta = \arg\min_{b \in \mathbb{R}^K} C(b; y), \qquad (7)$$

$$b^* = \mathrm{sign}(\beta). \qquad (8)$$

It is seen that the exhaustive-search detection is based on the
discrete minimization of the cost function $C(b; y)$ over $2^K$
candidate points, whereas the decorrelative detection is based on the
continuous minimization of the same cost function. In general, the
optimization problem (7) can be solved iteratively according to the
following steps [4]

$$z^l = \psi\left(y - \Psi\beta^l\right), \qquad (9)$$

$$\beta^{l+1} = \beta^l + \frac{1}{\mu}\left(\Psi^T\Psi\right)^{-1}\Psi^T z^l, \quad l = 0, 1, \ldots, \qquad (10)$$

where $\psi = \rho'$ is applied elementwise and $1/\mu$ is a step size.

We consider the following three choices of the penalty function
$\rho(\cdot)$ in (5), corresponding to different forms of detectors:

• Log-likelihood penalty function:

$$\rho_{ML}(x) = -\log f(x), \quad \psi_{ML}(x) = -\frac{f'(x)}{f(x)}, \qquad (11)$$

where $f(\cdot)$ denotes the probability density function (pdf) of
the noise sample. In this case, the exhaustive-search detector (6)
corresponds to the ML detector, and the decorrelative detector (8)
corresponds to the ML decorrelator [4].

• Least-square penalty function:

$$\rho_{LS}(x) = \frac{1}{2}x^2, \quad \psi_{LS}(x) = x. \qquad (12)$$

In this case, the exhaustive-search detector (6) corresponds to the
ML detector based on a Gaussian noise assumption, and the
decorrelative detector (8) corresponds to the linear decorrelator.

• Huber penalty function:

$$\rho_H(x) = \begin{cases}\dfrac{x^2}{2\sigma^2}, & \text{if } |x| \le c\sigma^2,\\[4pt] c|x| - \dfrac{c^2\sigma^2}{2}, & \text{if } |x| > c\sigma^2,\end{cases} \qquad (13)$$

$$\psi_H(x) = \begin{cases}\dfrac{x}{\sigma^2}, & \text{if } |x| \le c\sigma^2,\\[4pt] c\,\mathrm{sign}(x), & \text{if } |x| > c\sigma^2,\end{cases} \qquad (14)$$

where $\sigma^2$ is the noise variance given by (4), and $c$ is a
constant. In this case, the exhaustive-search detector (6)
corresponds to the discrete minimizer of the Huber cost function, and
the decorrelative detector (8) corresponds to the robust decorrelator
proposed in [4].
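
The decorrelative detector with the Huber penalty can be sketched in a few lines. This is a hedged illustration, not the implementation of [4]: the piecewise form of $\psi_H$, the update with step size $1/\mu = \sigma^2$, the least-squares starting point, the values of `sigma2` and `c`, and the function names are all assumptions made for this example.

```python
import numpy as np

def huber_psi(x, sigma2, c):
    """Derivative of the Huber penalty: linear for |x| <= c*sigma2,
    clipped to +/- c beyond that threshold."""
    return np.where(np.abs(x) <= c * sigma2, x / sigma2, c * np.sign(x))

def robust_decorrelator(Psi, y, sigma2, c, n_iter=200):
    """Modified-residual iteration for the continuous minimizer beta,
    followed by the sign decision."""
    pinv = np.linalg.pinv(Psi)            # equals (Psi^T Psi)^{-1} Psi^T for full column rank
    beta = pinv @ y                       # least-squares starting point (assumption)
    for _ in range(n_iter):
        z = huber_psi(y - Psi @ beta, sigma2, c)
        beta = beta + sigma2 * (pinv @ z) # step size 1/mu with mu = 1/sigma2 (assumption)
    return beta, np.sign(beta)

# Noise-free toy example with K = 4 users and 2N = 32 real observations.
rng = np.random.default_rng(5)
Psi = rng.standard_normal((32, 4))
b_true = np.array([1.0, -1.0, 1.0, 1.0])
y = Psi @ b_true
beta, b_hat = robust_decorrelator(Psi, y, sigma2=1.0, c=1.5)
print(b_hat)
```

When all residuals stay inside the linear region of $\psi_H$, one iteration reduces to a least-squares correction, so the robust decorrelator coincides with the linear decorrelator; the clipping only matters for large (impulsive) residuals.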

4.  SLOWEST-DESCENT-SEARCH  DETECTION 

Clearly the optimal performance is achieved by the exhaustive-search
detector with the log-likelihood penalty function, i.e., the ML
detector. As will be seen in Section 5, the performance of the
exhaustive-search detector with the Huber penalty function is close
to that of the ML detector, while this detector does not require
knowledge of the exact noise pdf. However, the computational
complexity of the exhaustive-search detector (6) is on the order of
$O(2^K)$. We next propose a local search approach to approximating
the solution to (6). The basic idea is to minimize the cost function
$C(b; y)$ over a subset $\Omega$ of the discrete parameter set
$\{-1, +1\}^K$ that is close to the continuous stationary point
$\beta$ given by (7). More precisely, we approximate the solution to
(6) by

$$b^s = \arg\min_{b \in \Omega} C(b; y). \qquad (15)$$

In the slowest descent method [3], the candidate set $\Omega$
consists of the discrete parameters chosen such that they are in the
neighborhood of $Q$ ($Q < K$) lines in $\mathbb{R}^K$, which are
defined by the stationary point $\beta$ and the $Q$ eigenvectors of
the Hessian matrix $\nabla_C^2(\beta)$ of $C(b; y)$ at $\beta$
corresponding to the $Q$ smallest eigenvalues. The basic idea of this
method is explained next.

Slowest-Descent Search: The basic idea of the slowest-descent search
method is to choose the candidate points in $\Omega$ such that they
are closest to a line $(\beta + \mu g)$ in $\mathbb{R}^K$,
originating from $\beta$ and along a direction $g$, where the cost
function $C(b; y)$ increases at the slowest rate. Given any line in
$\mathbb{R}^K$, there are at most $K$ points where the line
intersects the coordinate hyper-planes (e.g., $\beta^1$ and $\beta^2$
in Figure 1 for $K = 2$). The set of intersection points
corresponding to a line defined by $\beta$ and $g$ can be expressed as

$$\left\{\beta^i = \beta - \mu_i g :\ \mu_i = \beta_i/g_i\right\}_{i=1}^{K}, \qquad (16)$$

where $\beta_i$ and $g_i$ denote the $i$-th elements of the
respective vectors $\beta$ and $g$. Each intersection point $\beta^i$
has only its $i$-th component equal to zero, i.e., $\beta_i^i = 0$.

Any point on the line except for an intersection point has a unique
closest candidate point in $\{+1, -1\}^K$. An intersection point is
of equal distance from its two neighboring candidate points, e.g.,
$\beta^1$ is equidistant to $b^1$ and $b^2$ in Figure 1(a). Two
neighboring intersection points share



For  the  three  types  of  the  penalty  functions,  the  Hessian 
matrix  at  the  stationary  points  are  given  respectively  by 


Figure 1 (panels (a) and (b)): One-to-one mapping from
$\{\beta, \beta^1, \cdots, \beta^K\}$ to
$\Omega = \{b^*, b^1, \cdots, b^K\}$ for $K = 2$. Each intersection
point $\beta^i$ is of equal distance from its two neighboring
candidate points. $b^i$ is chosen to be the one of these two
candidate points that is on the opposite side of the $i$-th
coordinate hyperplane with respect to $b^*$.


a unique closest candidate point, e.g., $\beta^1$ and $\beta^2$ share
the nearest candidate point $b^2$ in Figure 1(a). Note that $b^*$ in
(8) is the candidate point closest to $\beta$. By carefully selecting
one of the two candidate points closest to each intersection point to
avoid choosing the same point twice, one can specify $K$ distinct
candidate points in $\{+1, -1\}^K$ that are closest to the line
$(\beta + \mu g)$. To that end, consider the following set

$$\left\{b^i \in \{-1,+1\}^K :\ b_k^i = \begin{cases}\mathrm{sign}\left(\beta_k^i\right), & k \ne i\\ -b_k^*, & k = i\end{cases}\right\}_{i=1}^{K}. \qquad (17)$$

It is seen that (17) assigns to each intersection point $\beta^i$ the
closest candidate point $b^i$ that is on the opposite side of the
$i$-th coordinate hyper-plane from $b^*$ [cf. Figure 1 (a) (b)].

In general, the slowest-descent search method chooses the candidate
set $\Omega$ in (15) as follows:

$$\Omega = \{b^*\} \cup \bigcup_{q=1}^{Q}\left\{b^{q,\mu} \in \{-1,+1\}^K :\ b_k^{q,\mu} = \begin{cases}\mathrm{sign}\left(\beta_k - \mu g_k^q\right), & \text{if } \beta_k - \mu g_k^q \ne 0\\ -b_k^*, & \text{if } \beta_k - \mu g_k^q = 0\end{cases},\ \ \mu \in \left\{\beta_1/g_1^q, \ldots, \beta_K/g_K^q\right\}\right\}, \qquad (18)$$

where $g^q$ is the $q$-th smallest eigenvector of $\nabla_C^2(\beta)$.
Hence, $\{b^{q,\mu}\}_{\mu}$ contains the $K$ closest neighbors of
$\beta$ in $\{-1, +1\}^K$ along the direction of $g^q$. Note that
$\{g^q\}_{q=1}^{Q}$ represent the $Q$ mutually orthogonal directions
where the cost function $C(b; y)$ grows the slowest from the minimum
point $\beta$. (In the case of the log-likelihood penalty function,
this corresponds to the situation where the likelihood function drops
the slowest from its peak, hence the name “slowest descent”.)
Intuitively, the solution to (6) is most likely found in this
neighborhood.


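$$\rho_{ML}:\quad \nabla_C^2(\beta) = \Psi^T\,\mathrm{diag}\left\{\rho''_{ML}\left(y_j - \xi_j^T\beta\right)\right\}\Psi, \qquad (19)$$

$$\rho_{LS}:\quad \nabla_C^2(\beta) = \Psi^T\Psi, \qquad (20)$$

$$\rho_H:\quad \nabla_C^2(\beta) = \Psi^T\,\mathrm{diag}\left\{I\left(\left|y_j - \xi_j^T\beta\right| \le c\sigma^2\right)\right\}\Psi, \qquad (21)$$

where in (19) $\rho''_{ML}(x) = \psi_{ML}(x)^2 - f''(x)/f(x)$ and in
(21) the indicator function $I(y \le a) = 1$ if $y \le a$ and 0
otherwise (the Hessian being taken up to a positive scale factor,
which does not affect its eigenvectors); hence in this case those
rows of $\Psi$ with large residual signals as a possible result of
impulsive noise are nullified, whereas other rows of $\Psi$ are not
affected.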

Finally we summarize the slowest-descent search algorithm
for multiuser detection in non-Gaussian noise. Given
a penalty function $\rho(\cdot)$, this algorithm solves the discrete
optimization problem (15) according to the following steps:

1. Compute the continuous stationary point $\beta$ in (7)
using the iteration (9)-(10);

2. Compute the Hessian matrix $\nabla^2_{\mathcal{C}}(\beta)$ given by (19) or
(20) or (21), and its $Q$ smallest eigenvectors $g^1, \ldots, g^Q$;

3. Solve the discrete optimization problem defined by
(15) and (18) by an exhaustive search (over $(KQ+1)$
points).
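The three steps can be illustrated with a minimal Python sketch for the least-squares penalty only. It rests on assumptions that are not taken from the paper: the function name `slowest_descent_ls`, the generic real-valued model `y = Psi b + noise`, the closed-form least-squares solution used in place of the iteration (9)-(10), and the choice of the step values $\mu = \beta_i / g^q_i$ (the points where the line $\beta - \mu g^q$ crosses a coordinate hyperplane).

```python
import numpy as np

def slowest_descent_ls(Psi, y, Q=1, tol=1e-9):
    """Approximate argmin over b in {-1,+1}^K of ||y - Psi b||^2 (LS penalty)."""
    K = Psi.shape[1]
    # Step 1: continuous stationary point (closed form for least squares).
    beta = np.linalg.lstsq(Psi, y, rcond=None)[0]
    b_star = np.where(beta >= 0, 1.0, -1.0)
    # Step 2: Hessian (up to a constant factor) and its Q smallest eigenvectors.
    _, eigvecs = np.linalg.eigh(Psi.T @ Psi)  # eigenvalues in ascending order
    candidates = [b_star]
    # Step 3: build candidates along each slow direction, then search them all.
    for q in range(Q):
        g = eigvecs[:, q]
        for i in range(K):
            if abs(g[i]) < tol:
                continue  # the line never crosses the i-th coordinate hyperplane
            point = beta - (beta[i] / g[i]) * g  # assumed intersection point
            # Coordinates that are (numerically) zero take the sign opposite to b_star.
            b = np.where(np.abs(point) > tol, np.sign(point), -b_star)
            candidates.append(b)
    cost = lambda b: float(np.sum((y - Psi @ b) ** 2))
    return min(candidates, key=cost)
```

At most $KQ + 1$ candidates are evaluated, so the search cost grows linearly in $K$ rather than as $2^K$.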

5.  EXTENSION  TO  AN  UNKNOWN 
MULTIPATH  CHANNEL 

In  this  section,  we  extend  the  slowest  descent  multiuser 
detection  techniques  developed  above  to  the  asynchronous 
CDMA  system  with  multipath  distortion.  Following  [4],  [7], 
and  references  therein,  r[i],  the  vector  consisting  of  a  num¬ 
ber  of  stacked  one-symbol  length  vectors  that  affect  the 
current  symbol  interval  i  can  be  expressed  as  follows: 

$$r[i] = H\theta[i] + n[i]. \qquad (22)$$

Here, $\theta[i]$ and $n[i]$ are stacked symbol vectors, and $H$ is the
unknown channel matrix.

We can rewrite (22) as

$$r[i] = H\theta[i] + n[i] = U_s \zeta[i] + n[i].$$

Here  orthonormal  column  vectors  of  Us  span  the  column 
space  of  H  and  can  be  obtained  using  an  eigen-decomposition 
of  the  received  signal  autocorrelation  matrix  (see  [6]).  The 
estimation  of  the  channel  matrix  H  is  based  on  the  users’ 
signature  sequences  and  the  noise  subspace  estimated  from 
the  auto-correlation  eigen-decomposition  (see  [7]). 

We next obtain the robust estimate of $\zeta[i]$ based on the
complex version of the decorrelative iterations (9)-(10) for
an (e.g., Huber) objective function

$$z^l = \psi(r - U_s \zeta^l), \qquad (23)$$

$$\zeta^{l+1} = \zeta^l + U_s^H z^l, \quad l = 0, 1, 2, \ldots \qquad (24)$$

where $(\cdot)^H$ is the Hermitian operator. $\theta[i]$ can be estimated
as follows

$$\hat\theta[i] = (H^H H)^{-1} H^H U_s \zeta[i].$$
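The iteration (23)-(24) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `huber_psi` and `robust_subspace_estimate` are hypothetical, the Huber nonlinearity is applied separately to the real and imaginary parts (one possible "complex version"), the clipping level `clip` is a free parameter, and `Us` is assumed to have orthonormal columns so that a unit step size is used.

```python
import numpy as np

def huber_psi(x, clip):
    """Huber-type clipping applied to real and imaginary parts (an assumption)."""
    x = np.asarray(x, dtype=complex)
    return np.clip(x.real, -clip, clip) + 1j * np.clip(x.imag, -clip, clip)

def robust_subspace_estimate(r, Us, clip=1.5, n_iter=50):
    """Iterate z = psi(r - Us zeta), zeta <- zeta + Us^H z, as in (23)-(24)."""
    zeta = np.zeros(Us.shape[1], dtype=complex)
    for _ in range(n_iter):
        z = huber_psi(r - Us @ zeta, clip)   # clipped residual, eq. (23)
        zeta = zeta + Us.conj().T @ z        # subspace update, eq. (24)
    return zeta
```

When no residual exceeds the clipping level, the first pass already returns the least-squares projection $U_s^H r$; clipping only alters the update for samples hit by large (impulsive) residuals.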

Note that

$$H\theta[i] = H_0 A b[i] + H_\# \theta_\#[i], \qquad (25)$$

where the term $H_0 A b[i]$ contains the signal carrying the
current bits $b[i]$, and the term $H_\# \theta_\#[i]$ contains the signal
carrying the previous and future bits $\{b[l]\}_{l \neq i}$, i.e., intersymbol
interference. $A$ holds unknown phases which are
estimated separately from the channel as demonstrated below.
We subtract the estimated intersymbol interference
from $r[i]$ to obtain

$$\tilde r[i] = r[i] - H_\# \hat\theta_\#[i] \qquad (26)$$

$$\phantom{\tilde r[i]} = H_0 A b[i] + n[i]. \qquad (27)$$

We  can  now  set 

H0$l{A}  ' 

H0%{A}  J  ’ 

and use the methods described in previous sections to derive
the decorrelative and the slowest-descent estimates of $b[i]$ based
on $\tilde r[i]$.


the  slowest  descent  detector  with  2  search  directions,  and 
the  exhaustive  detector.  Searching  further  slowest  descent 
directions  does  not  improve  the  performance  in  this  case. 
We  observe  that  for  all  three  criteria  the  performance  of  the 
slowest  descent  detector  is  close  to  the  performance  of  its 
respective  exhaustive  maximization  version.  All  detectors 
are significantly better than the LS-based detectors.

For the multi-path channel case the following is assumed:
processing gain $N = 15$, number of users $K = 6$;
each user's channel has 3 paths and a delay spread of up to
one  symbol  interval.  The  complex  gains,  the  delays  of  each 
user’s  channel,  and  user  signature  sequences  are  generated 
randomly.  The  chip  pulse  is  a  raised  cosine  pulse  with  roll¬ 
off  factor  0.5.  The  path  gains  are  normalized  so  that  each 
user’s  signal  arrives  at  the  receiver  with  unit  energy.  The 
over-sampling  factor  is  2  and  the  number  of  stacked  vectors 
in  (22)  (the  smoothing  factor)  is  2. 

Figure  3  demonstrates  the  performance  of  the  Huber- 
based  slowest-descent  method  with  one  and  two  search  di¬ 
rections,  the  decorrelative  Huber  detector,  and  the  blind 
decorrelator from [6]. Most of the performance gain offered
by the slowest-descent method is obtained by searching
along only one direction. Over 1 dB of gain is obtained
relative to the decorrelative estimate. The blind approach
[6] performs poorly for this system.


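Estimation of A

We next consider the estimation of the complex amplitudes
$A$. Following (25), we have [recall that $A = \mathrm{diag}(\alpha_1, \ldots, \alpha_K)$]

$$\theta_k = \alpha_k b_k + n_k, \quad k = 1, \ldots, K. \qquad (28)$$

Since $b_k \in \{-1,+1\}$, it follows from (28) that the $\theta_k$ form two
clusters centered at $\alpha_k$ and $-\alpha_k$, respectively. Let $\alpha_k =
\rho_k e^{j\phi_k}$; a simple estimator of $\alpha_k$ is given by $\hat\alpha_k = \hat\rho_k e^{j\hat\phi_k}$
with

$$\hat\rho_k = E\{|\theta_k|\},$$

$$\hat\phi_k = \begin{cases} E\{\angle[\theta_k\,\mathrm{sign}(\Re\{\theta_k\})]\}, & \text{if } E\{|\Re\{\theta_k\}|\} > E\{|\Im\{\theta_k\}|\} \\ E\{\angle[\theta_k\,\mathrm{sign}(\Im\{\theta_k\})]\}, & \text{if } E\{|\Re\{\theta_k\}|\} < E\{|\Im\{\theta_k\}|\} \end{cases}$$

where the operator $E\{\cdot\}$ denotes sample average. Note that
the above estimate of the phase $\phi_k$ has an ambiguity of
$\pi$, which necessitates differential encoding and decoding of
data.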
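The cluster-based estimator above can be sketched in a few lines of Python. The function name `estimate_complex_amplitude` and the handling of the tie case (equal averages, which the text leaves unspecified and which is folded into the second branch here) are assumptions made for illustration.

```python
import numpy as np

def estimate_complex_amplitude(theta):
    """Estimate alpha = rho * exp(j*phi) from samples theta = alpha*b + noise, b = +/-1."""
    theta = np.asarray(theta, dtype=complex)
    rho_hat = np.mean(np.abs(theta))  # sample average of |theta_k|
    # Fold both clusters onto one half-plane, using the dominant component.
    if np.mean(np.abs(theta.real)) > np.mean(np.abs(theta.imag)):
        folded = theta * np.sign(theta.real)
    else:  # also covers the unspecified tie case (assumption)
        folded = theta * np.sign(theta.imag)
    phi_hat = np.mean(np.angle(folded))  # phase, known only up to a shift of pi
    return rho_hat * np.exp(1j * phi_hat), rho_hat, phi_hat
```

Folding on the dominant component keeps all folded samples in one half-plane, so the averaged angles do not wrap around; the remaining ambiguity of $\pi$ is the one noted in the text.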

6.  SIMULATION  RESULTS 


Figure 2: Symbol error performance of a synchronous DS-CDMA
system with $N = 15$, $K = 8$, $\epsilon = 0.01$, $\kappa = 100$.


7.  CONCLUSION 


For simulations, we assume a synchronous CDMA system
with a processing gain $N = 15$, number of users $K = 6$,
no phase offset and equal amplitudes of user signals, i.e.,
$\alpha_k = 1$, $k = 1, \ldots, K$. The signature sequence $s_1$ of User 1 is
generated randomly and kept fixed throughout the simulations.
The signature sequences of Users 2 through $K$ are generated by
circularly shifting the sequence of User 1.

For  each  of  the  three  penalty  functions  Figure  2  presents 
the  symbol  error  performance  of  the  decorrelative  detector, 


We  have  developed  a  new  robust  multiuser  detection 
technique  based  on  the  method  of  slowest-descent  search. 
By  searching  only  over  one  or  two  directions,  this  method 
offers  significant  performance  improvement  over  the  recently 
proposed  robust  decorrelating  detector  in  impulsive  noise. 
The proposed approach has been extended to multi-path
fading channels where the complex channels and signal phases of
all users have to be estimated blindly.
Figure 3: Symbol error performance of an asynchronous DS-CDMA
system with $N = 15$, $K = 8$, $\epsilon = 0.01$, $\kappa = 100$, in
an unknown multi-path channel with 3 randomly generated
path coefficients per user.
ABSTRACT

This  paper  presents  a  new  recursive  algorithm 
for  maximum  likelihood  estimation  of  the  delay- 
Doppler  characteristics  of  fast-fading  mobile  com¬ 
munication  channels.  The  channel  is  modelled 
as  an  FIR  filter  with  rapidly  varying  complex 
coefficients.  The  parameters  of  interest  are  the 
mean  channel  taps  and  the  tap  covariance.  The 
structure  of  the  channel  tap  covariance  matrix  is 
exploited  to  provide  convergence  to  constrained 
channel  estimates. 

1.  INTRODUCTION 

Maximum  likelihood  constrained  covariance  estimation 
for  directly  observable  processes  in  additive  noise  has 
received  considerable  attention  [1,  2,  3,  4]  since  many 
algorithms  in  spectral  analysis  rely  on  knowledge  of 
the covariance matrix. Applications include harmonic
retrieval, beamforming and direction of arrival estimation.
In many such cases, the system of interest is
shift-invariant and the true covariance matrix is known
to be Hermitian Toeplitz as well as positive semidefinite.
This structure may be used in obtaining realistic
covariance matrix estimates, and in addition may be
exploited to provide fast convergence to constrained
estimates and aid subsequent processing (e.g. inverses,
eigendecomposition etc.).

In  this  paper,  we  consider  the  extension  of  constrained 
covariance  estimation  to  the  case  where  the  process  of 
interest  is  observed  through  convolution  with  a  known 
signal  in  addition  to  the  additive  noise.  This  problem 
arises  in  delay-Doppler  radar  imaging  [5]  and  delay- 
Doppler  imaging  of  fast-fading  mobile  communication 
channels [6]. In these situations, the underlying reflectance
process has a time-varying impulse response,
and therefore is two-dimensional (in time, $k$, and
delay, $\ell$). The delay-Doppler image of a reflectance
process is also known as the scattering function, and is
related to the covariance matrix by a Fourier transform
(in the time axis indexed by $k$) [7].

This  paper  presents  a  new  algorithm  for  maximum  like¬ 
lihood  estimation  of  the  covariance  matrix  (and  there¬ 
fore  the  delay-Doppler  characteristics)  of  fast-fading 
mobile communication channels. Importantly, our algorithm
explicitly makes use of the structural constraints.
Key features of the algorithm include joint estimation
of the channel mean and covariance, and applicability
to a general class of wide-sense stationary (WSS)
channels.


2.  CHANNEL  MODEL 
Channel  Response 

Consider a discrete equivalent baseband model in which
the complex-valued time-varying channel, or reflectance
process, $f_{k,\ell}$, represents the effect at time $k$ of reflections
with a path delay $\ell$. Ignoring the average delay
in the analysis, the observed signal is

$$z_k = \sum_{\ell=0}^{L-1} f_{k,\ell}\, x_{k-\ell} + w_k \qquad (1)$$

where $L$ is the length of the finite impulse response
(FIR) channel, or the extent of the radar target, $x_k$ is
the known transmitted signal, and $w_k$ is the additive
noise introduced at the receiver.

Writing  the  observations  for  k  =  0, . . . ,  N  -  1  in  vector 
notation, 

z  =  XF  +  w  (2) 
where the matrix of channel inputs is

$$X = \begin{bmatrix} x_0 & \cdots & 0 & & x_{-L+1} & \cdots & 0 \\ \vdots & \ddots & \vdots & \cdots & \vdots & \ddots & \vdots \\ 0 & \cdots & x_{N-1} & & 0 & \cdots & x_{N-L} \end{bmatrix}$$

and $F = [f_{0,0}, \ldots, f_{N-1,0}, \ldots, f_{N-1,L-1}]^T$ $^1$. The
time-varying channel (or reflectance) process, $F$, is seen
to be two-dimensional in that its elements are characterized
both by the time index $k$ and the delay index $\ell$.

trix, $R \in \mathcal{T}_{N,L}$, may be written as
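To make the stacking in (2) concrete, the following Python sketch builds $X$ from $L$ diagonal blocks and stacks the channel samples delay by delay. The helper names `build_input_matrix` and `stack_channel`, and the convention that the input array holds $x_{-(L-1)}, \ldots, x_{N-1}$, are assumptions chosen for this illustration.

```python
import numpy as np

def build_input_matrix(x_ext, N, L):
    """N x NL matrix [X_0 ... X_{L-1}], X_l = diag(x_{-l}, ..., x_{N-1-l}).

    x_ext holds x_{-(L-1)}, ..., x_{N-1}, so x_k sits at index k + L - 1.
    """
    blocks = [np.diag(x_ext[L - 1 - ell: L - 1 - ell + N]) for ell in range(L)]
    return np.hstack(blocks)

def stack_channel(f):
    """Stack f[k, l] (shape N x L) as [f_{0,0..N-1,0}, ..., f_{0,L-1..N-1,L-1}]."""
    return f.T.reshape(-1)
```

With these conventions `build_input_matrix(x_ext, N, L) @ stack_channel(f)` reproduces the noise-free part of the sum in (1) for every $k$.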


When a line-of-sight path or specular (stable) reflections
exist between the transmitter and receiver, the
channel is no longer zero-mean. Thus $F = \tilde F + \bar F$, where
$\tilde F$ is the zero-mean time-varying component and $\bar F$ is
the mean component, constant over the observation interval,
$N$. Here $\bar F$ is an $NL \times 1$ vector, but contains only $L$
independent parameters. For convenience, we also define
the $L \times 1$ vector $G = (I \otimes e^T)\bar F$, and the corresponding
$N \times L$ matrix of channel inputs $Y = X(I \otimes \mathbf{1})$, where
$e^T = [1, 0, \ldots, 0]$ is a $1 \times N$ unit vector, $\mathbf{1} = [1, \ldots, 1]^T$
is an $N \times 1$ vector of ones, $\otimes$ is the Kronecker product
operator, and $I$ is the $L \times L$ identity matrix.
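The two Kronecker operators can be checked numerically with a short Python sketch; the function name `mean_channel_operators` is hypothetical and the delay-major stacking of the mean vector is assumed.

```python
import numpy as np

def mean_channel_operators(N, L):
    """Return (I kron e^T), of size L x NL, and (I kron 1), of size NL x L."""
    e_T = np.zeros((1, N))
    e_T[0, 0] = 1.0
    ones = np.ones((N, 1))
    select = np.kron(np.eye(L), e_T)   # G = select @ F_bar picks one value per tap
    expand = np.kron(np.eye(L), ones)  # F_bar = expand @ G repeats each tap N times
    return select, expand
```

Because the mean vector equals `expand @ G`, the mean response satisfies `X @ F_bar == (X @ expand) @ G`, which is the identity $X\bar F = YG$ used later in the text.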


$$R = \sum_{m=1}^{M} r_m Q_m \qquad (4)$$

where $r_m$ are the values of the real and imaginary components
of elements of $R$. There are $M = 2NL^2 - L^2$
independent parameters, $r_m$. The channel covariance
matrix is (by definition) positive semidefinite. This
manifests itself as a highly nonlinear constraint on the
parameters, $r_m$.
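The parameter count can be verified by a short calculation, assuming that $R$ is Hermitian and consists of $L \times L$ Toeplitz blocks of size $N \times N$ (the block-level symmetry $R_{\ell_2,\ell_1} = R_{\ell_1,\ell_2}^H$ is what halves the number of free off-diagonal blocks):

```latex
\begin{align*}
\text{diagonal blocks:}\quad & L \ \text{Hermitian Toeplitz blocks} \times (2N-1)\ \text{real parameters} = L(2N-1),\\
\text{off-diagonal blocks:}\quad & \tfrac{L(L-1)}{2}\ \text{free Toeplitz blocks} \times 2(2N-1)\ \text{real parameters} = L(L-1)(2N-1),\\
M &= (2N-1)\,\bigl[L + L(L-1)\bigr] = (2N-1)L^2 = 2NL^2 - L^2 .
\end{align*}
```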

Assuming additive white Gaussian noise (AWGN) at
the receiver, the channel covariance, $R$, is related to the
$N \times N$ observation covariance matrix, $R_z = E[\tilde z \tilde z^H]$,
by

$$R_z = X R X^H + \sigma_w^2 I \qquad (5)$$

where $\sigma_w^2$ is the variance of the observation noise, and
the observation is $z = \tilde z + \bar z$, where $\bar z = X\bar F = YG$ is
the mean response.

3.  MAXIMUM  LIKELIHOOD  CHANNEL 
ESTIMATION 


Channel  Covariance 

The dimensionality of the channel impulse response is
reflected in the structure of the covariance matrix

$$R = E[F F^H] = \begin{bmatrix} R_{0,0} & \cdots & R_{0,L-1} \\ \vdots & \ddots & \vdots \\ R_{L-1,0} & \cdots & R_{L-1,L-1} \end{bmatrix} \qquad (3)$$

which consists of $L \times L$ blocks of $N \times N$ matrices, $R_{\ell_1,\ell_2}$,
which represent the covariance between taps (or reflectors)
at delays $\ell_1$ and $\ell_2$. For the radar target, where
scatterers are assumed to behave independently (i.e.
uncorrelated scatterers (US)) [5], the off-diagonal matrices
will all be zero. However, for the communication
channel model, the inclusion of the transmitter and receiver
pulse shapes in the equivalent channel response,
$f_{k,\ell}$, means that this is not the case.


To adequately identify the channel, we require estimates
for the vector of channel tap means, $G$, and the
matrix of channel tap covariances, $R$. It is important
that the estimates maximize the likelihood over the set
of admissible structured matrices $R \in \mathcal{T}_{N,L}$.

It is easily shown that maximizing the likelihood function
for the channel model of Section 2 is the same as
maximizing the following expression

$$\Phi(G, R) = -\ln\det R_z - \mathrm{tr}\{R_z^{-1} S\} \qquad (6)$$

where the sample covariance matrix, $S = (z - YG)(z - YG)^H$,
is a function of the mean channel $G$, and $R_z$ is
a function of the channel covariance $R$ as given above
in (5). Here, $\mathrm{tr}\{\cdot\}$ denotes the trace operator.

Note that the likelihood, and hence $\Phi(G, R)$, is only
defined when $R_z$ is strictly positive definite.
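As an illustration, the objective (6) can be evaluated numerically as sketched below. The function name `likelihood_objective` and its argument list are assumptions; a Cholesky factorization is used both to obtain the log-determinant and to enforce the requirement that $R_z$ be strictly positive definite.

```python
import numpy as np

def likelihood_objective(z, Y, G, X, R, sigma2):
    """Evaluate Phi(G, R) = -ln det Rz - tr{Rz^{-1} S} for a single observation z."""
    N = X.shape[0]
    Rz = X @ R @ X.conj().T + sigma2 * np.eye(N)      # eq. (5)
    chol = np.linalg.cholesky(Rz)                     # raises LinAlgError if Rz is not p.d.
    logdet = 2.0 * np.sum(np.log(np.real(np.diag(chol))))
    v = z - Y @ G                                     # residual after removing the mean
    # tr{Rz^{-1} v v^H} equals the quadratic form v^H Rz^{-1} v.
    quad = np.real(v.conj() @ np.linalg.solve(Rz, v))
    return -logdet - quad
```

Writing the trace as a quadratic form avoids forming $S$ explicitly, which is convenient when $N$ is large.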

Lemma  1  When  G  is  given  by 


When the statistics of the fading or reflectance process
are wide-sense stationary (WSS) (in the dimension
indexed by $k$), the covariance matrices $R_{\ell_1,\ell_2}$ are
Toeplitz. The overall matrix is then Hermitian symmetric
and block-Toeplitz. The set of Hermitian block-Toeplitz
matrices is denoted here by $\mathcal{T}_{N,L}$.

The  Hermitian  block- Toeplitz  channel  covariance  ma- 

$^1$ The transpose operator is denoted $(\cdot)^T$, and $(\cdot)^H$ denotes a
Hermitian transpose.


$$G = (Y^H R_z^{-1} Y)^{-1} Y^H R_z^{-1} z \qquad (7)$$

the first differential of the likelihood objective (6) is

$$d\Phi = \mathrm{tr}\{X^H R_z^{-1}(S - R_z) R_z^{-1} X\, dR\} \qquad (8)$$
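Equation (7) is a weighted least-squares estimate of the mean taps. A minimal Python sketch follows; the name `mean_tap_estimate` is hypothetical and $R_z$ is assumed Hermitian positive definite.

```python
import numpy as np

def mean_tap_estimate(z, Y, Rz):
    """G = (Y^H Rz^{-1} Y)^{-1} Y^H Rz^{-1} z, computed with linear solves."""
    W = np.linalg.solve(Rz, Y)  # Rz^{-1} Y
    # For Hermitian Rz, W^H = Y^H Rz^{-1}.
    return np.linalg.solve(Y.conj().T @ W, W.conj().T @ z)
```

Using `solve` rather than explicit inverses is a numerical design choice made here, not something specified in the text.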


Proof The first differential of the objective function
(6) is [8]

$$d\Phi = -d(\ln\det R_z) - \mathrm{tr}\{d(R_z^{-1})\, S\} - \mathrm{tr}\{R_z^{-1}\, dS\}$$

Now, the first term is given by

$$d(\ln\det R_z) = \mathrm{tr}\{R_z^{-1}\, dR_z\}$$

and the differential of an inverse is given by [8, pg 151]

$$d(R_z^{-1}) = -R_z^{-1}\, dR_z\, R_z^{-1}$$

In order to prove that the recursion (10) increases the
likelihood (i.e. when $dR_i = \alpha_i(\hat R_i - R_{i-1})$), we now
proceed to show that the second term in (14) is positive,
and the first term is zero.

The second term in (14) may be written


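Thus

$$d\Phi = -\mathrm{tr}\{R_z^{-1} dR_z\} + \mathrm{tr}\{R_z^{-1} dR_z R_z^{-1} S\} + 2\,\mathrm{tr}\{R_z^{-1} Y dG (z - YG)^H\}$$

$$\phantom{d\Phi} = \mathrm{tr}\{R_z^{-1}(S - R_z) R_z^{-1} dR_z\} + 2\,\mathrm{tr}\{(z - YG)^H R_z^{-1} Y dG\} \qquad (9)$$

Substituting (7) into (9) gives (8). ■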

Unfortunately,  due  to  non-linearities  in  (8)  and  the 
need  for  a  positive  definite  solution,  it  is  infeasible  to 
obtain  an  analytic  maximum  likelihood  solution  for  the 
covariance  matrix  using  (8)  by  setting  d$  =  0.  We  now 
present  our  main  result  in  the  following  theorem  which 
leads  us  to  our  recursive  algorithm  in  Section  4  for  find¬ 
ing  an  admissible  maximum  likelihood  solution. 

Theorem 1 The sequence of covariance matrices,
$\{R_i\}$, and channel tap means, $\{G_i\}$, generated by the
following iterative equations (10)-(12), monotonically
increases in likelihood

$$R_i = R_{i-1} + \alpha_i(\hat R_i - R_{i-1}) \qquad (10)$$

$$R_{z,i} = X R_i X^H + \sigma_w^2 I \qquad (11)$$

$$G_i = (Y^H (R_{z,i})^{-1} Y)^{-1} Y^H (R_{z,i})^{-1} z \qquad (12)$$

where $\alpha_i > 0$ is an arbitrarily small stepsize, and where
$\hat R_i = \sum_{n=1}^{M} \hat r_{n,i} Q_n$, for $\hat r_{n,i}$ satisfying the following set of
equations, for $m = 1, \ldots, M$

$$\sum_{n=1}^{M} \mathrm{tr}\{X^H R_{z,i-1}^{-1} X Q_n X^H R_{z,i-1}^{-1} X Q_m\}\, \hat r_{n,i} = \mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - \sigma_w^2 I) R_{z,i-1}^{-1} X Q_m\} \qquad (13)$$

where $R_{z,i-1} = X R_{i-1} X^H + \sigma_w^2 I$ and $S_{i-1} = (z - Y G_{i-1})(z - Y G_{i-1})^H$.
The initial $R_0$ must be positive
definite Hermitian block-Toeplitz (e.g. $R_0 = I$).


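Proof From (8), consider the differential of the likelihood
objective function at iteration $i$

$$d\Phi_i = \mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - R_{z,i-1}) R_{z,i-1}^{-1} X\, dR_i\}$$

$$\phantom{d\Phi_i} = \mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - R_{z,i-1} - (1/\alpha_i) X dR_i X^H) R_{z,i-1}^{-1} X\, dR_i\} + (1/\alpha_i)\,\mathrm{tr}\{X^H R_{z,i-1}^{-1} X dR_i X^H R_{z,i-1}^{-1} X dR_i\} \qquad (14)$$

$$(1/\alpha_i)\,\mathrm{tr}\{X^H R_{z,i-1}^{-1} X dR_i X^H R_{z,i-1}^{-1} X dR_i\}$$

$$= (1/\alpha_i)\,\mathrm{tr}\{X^H A A^H X dR_i X^H A A^H X dR_i\}; \quad R_{z,i-1}^{-1} = A A^H \text{ since p.d.}$$

$$= (1/\alpha_i)\,\mathrm{tr}\{A^H X dR_i X^H A A^H X dR_i X^H A\}$$

$$= (1/\alpha_i)\,\mathrm{tr}\{B B^H\}; \quad B = A^H X dR_i X^H A \text{ and } dR_i = dR_i^H$$

$$> 0$$

Now, before considering the first term in (14), consider
(13), which can be written

$$\mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - \sigma_w^2 I) R_{z,i-1}^{-1} X Q_m\} - \mathrm{tr}\{X^H R_{z,i-1}^{-1} X \hat R_i X^H R_{z,i-1}^{-1} X Q_m\} = 0$$

$$\mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - \sigma_w^2 I - X R_{i-1} X^H - (1/\alpha_i) X dR_i X^H) R_{z,i-1}^{-1} X Q_m\} = 0; \quad \text{since } \hat R_i = R_{i-1} + (1/\alpha_i)\, dR_i$$

$$\mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - R_{z,i-1} - (1/\alpha_i) X dR_i X^H) R_{z,i-1}^{-1} X Q_m\} = 0$$

$$\sum_{m=1}^{M} \mathrm{tr}\{X^H R_{z,i-1}^{-1}(S_{i-1} - R_{z,i-1} - (1/\alpha_i) X dR_i X^H) R_{z,i-1}^{-1} X Q_m\}\, dr_m = 0$$

$$\mathrm{tr}\Bigl\{X^H R_{z,i-1}^{-1}(S_{i-1} - R_{z,i-1} - (1/\alpha_i) X dR_i X^H) R_{z,i-1}^{-1} X \sum_{m=1}^{M} Q_m\, dr_m\Bigr\} = 0$$

Since $dR_i = \sum_{m=1}^{M} Q_m\, dr_m$, the first term in (14) is zero.

■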

Remark  1  Theorem  1  utilizes  the  inverse  iteration 
argument  of  [1  ].  However,  this  new  result  is  applica¬ 
ble  when  the  process  of  interest  is  not  necessarily  zero- 
mean  and  is  observed  via  convolution  in  additive  noise. 
We  have  included  estimation  of  both  the  real  and  imag¬ 
inary  parts  of  each  of  these  channel  parameters  which 
arise  from  the  baseband  model.  Since  the  result  is  not 
restricted  to  zero-mean  and  uncorrelated  scatterer  mod¬ 
els,  it  leads  to  a  more  generally  applicable  algorithm 
than  the  circulant  extension  algorithm  of  [5]. 
4. RECURSIVE ALGORITHM


Theorem 1 provides us with a recursive algorithm for
maximum likelihood channel mean and structured covariance
estimates. However, in order to perform an
iteration (10)-(12), we must first solve (13). At each
iteration $i$, this can be done by forming a vector $x =
[\hat r_{1,i}, \ldots, \hat r_{M,i}]^T$ and an $M \times 1$ vector $b$ with elements
given by the RHS of (13) for $m = 1, \ldots, M$. Now the
set of equations in (13) for $m = 1, \ldots, M$ can be written
in the form $Ax = b$, which we can now solve for
$x$. It can easily be shown that at each iteration $A$ is
positive definite, and therefore efficient algorithms can
be employed in the solution.
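One pass of the recursion can be sketched in Python as below. This is a sketch under stated assumptions rather than the authors' code: the function name `ml_iteration` is hypothetical, the Hermitian basis matrices $Q_m$ are supplied by the caller as `Q_list`, a single observation vector is used, and the linear system is solved by least squares as a conservative choice for cases where the assembled matrix is numerically ill-conditioned.

```python
import numpy as np

def ml_iteration(R_prev, G_prev, X, Y, z, Q_list, sigma2, alpha):
    """One pass of (10)-(13): assemble A x = b, solve, then update R and G."""
    N = X.shape[0]
    I = np.eye(N)
    XH = X.conj().T
    Rz_inv = np.linalg.inv(X @ R_prev @ XH + sigma2 * I)
    v = z - Y @ G_prev
    S = np.outer(v, v.conj())                      # S_{i-1}
    W = XH @ Rz_inv @ X                            # X^H Rz^{-1} X
    T = XH @ Rz_inv @ (S - sigma2 * I) @ Rz_inv @ X
    # A[m, n] and b[m] are the left- and right-hand sides of (13).
    A = np.array([[np.trace(W @ Qn @ W @ Qm).real for Qn in Q_list] for Qm in Q_list])
    b = np.array([np.trace(T @ Qm).real for Qm in Q_list])
    r_hat = np.linalg.lstsq(A, b, rcond=None)[0]
    R_hat = sum(r * Q for r, Q in zip(r_hat, Q_list))
    R_new = R_prev + alpha * (R_hat - R_prev)      # eq. (10)
    Rz_new = X @ R_new @ XH + sigma2 * I           # eq. (11)
    Wy = np.linalg.solve(Rz_new, Y)
    G_new = np.linalg.solve(Y.conj().T @ Wy, Wy.conj().T @ z)  # eq. (12)
    return R_new, G_new
```

The step size `alpha` is left to the caller; the monotonicity result of Theorem 1 is stated for arbitrarily small steps, so larger values carry no such guarantee.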

This new recursive estimation algorithm is in fact a linearized
gradient algorithm, as can be seen from the linear
equation (13). The formulation can easily be extended
for multiple observations.

Remark 2 For the directly observable case presented
in [1], it was sufficient to confine the estimates of the
structured covariance matrix at each iteration to the
positive definite region (by appropriate choice of the
stepsize) to obtain an admissible maximum likelihood
solution. Note, however, that for the case presented here,
$R_z$ (5) may be positive definite even when the estimate
of $R$ is not. Since the maximum of the objective
(6) may occur in this region, gradient algorithms may
not be guaranteed to find an admissible ($R \in \mathcal{T}_{N,L}$)
maximum of the objective function.

Example  A 

Our new recursive algorithm was first tested on a zero-mean
US channel. The channel was simulated with
$L = 2$ independent equal power fading taps with a
Jakes' Doppler spectrum. The signal to noise ratio
(SNR) was nominally chosen to be 10 dB. The dimension
of the covariance matrix was $NL = 50$ and 75
samples of the channel output were used in the estimation
(representing a multiple observation factor of
3). The stepsize at each iteration, $\alpha_i$, was chosen to
confine the corresponding estimate $R_i$ to the positive
definite region.

Figure  1  shows  the  progression  of  the  objective  max¬ 
imization  with  respect  to  computational  effort.  Also 
shown  is  the  progression  of  the  algorithm  of  [5],  using 
a  factor  of  2  for  the  circulant  extension.  The  scaling 
of  the  curves  relative  to  the  computational  effort  was 
based  on  counts  of  floating  point  operations  in  Matlab 
for  unoptimized  code  in  both  cases,  and  therefore  the 
figure  is  only  indicative  of  a  performance  comparison. 

Importantly, Figure 1 shows that the restriction of the
estimates $R_i$ to the positive definite region at each iteration
may result in trapping the algorithm at the positive
definite boundary when a solution with greater
likelihood exists in the admissible region.

Figure 1: Maximization of the likelihood objective relative
to computational effort, estimates $R_i$ restricted
to the positive definite region

Remark  3  The  algorithm  in  [5]  for  US  channels  ex¬ 
ploits  the  circulant  extension  property  of  Toeplitz  ma¬ 
trices  [9],  and  has  been  shown  to  be  an  instance  of  the 
expectation-maximization  (EM)  algorithm.  With  sen¬ 
sible  initialization,  this  algorithm  maintains  a  positive 
definite  estimate  of  R.  However,  due  to  the  augmen¬ 
tation  of  the  Toeplitz  matrix  to  a  circulant  matrix,  the 
estimation  problem  is  modified,  and  conditions  for  con¬ 
vergence  to  an  admissible  maximum  of  (6)  have  not  yet 
been  established. 

Modification  of  the  gradient 

To pursue the admissible maximum likelihood solution,
modification of the gradient is required to allow
movement tangential to the positive definite boundary
whilst maintaining the positive definite constraint on
the estimates $R_i$. Due to the complexity of the relationship
between the positive definite constraint and
the parameters $r_m$, no obvious modification strategy is
apparent.

A simple modification we have found is to replace the
set of linear equations (13) for calculating $\hat R_i$ with

$$\sum_{n=1}^{M} \mathrm{tr}\{R_{i-1}^{-1} Q_n R_{i-1}^{-1} Q_m\}\, \hat r_{n,i} = \mathrm{tr}\{X^H R_{z,i-1}^{-1} S_{i-1} R_{z,i-1}^{-1} X Q_m\} \qquad (15)$$

It  is  an  unproven  conjecture  of  this  paper  that  this 
modified  algorithm  converges  to  an  admissible  maxi¬ 
mum  likelihood  solution  for  the  structured  covariance 
matrix. 
Figure 2: Maximization of the likelihood objective relative
to computational effort, modified gradient


Figure 3: Delay-Doppler profile of the simulated channel

Example  B 

The  experiment  of  Example  A  was  repeated  using  the 
modified  gradient  described  above.  Note  the  smooth 
trajectory  of  the  modified  algorithm,  suggesting  that 
the  algorithm  is  no  longer  trapped  prematurely.  Also 
shown  is  the  likelihood  (V)  obtained  for  the  structured 
covariance  matrix  estimate  without  the  positive  defi¬ 
nite  constraint. 

Figure  3  shows  the  delay-Doppler  profile  of  the  simu¬ 
lated  channel.  Figures  4  and  5  show  the  corresponding 
estimates  of  the  delay  Doppler  spectrum.  Improvement 
can  be  obtained  using  more  data  (with  correspondingly 
more  computational  effort)  and/or  higher  SNR.  Fur¬ 
ther  trials  show  that  the  modified  gradient  algorithm 
is  robust  in  estimating  the  channel  mean,  with  good 
mean  estimates  and  negligible  impact  on  the  covari¬ 
ance  estimate  and  delay-Doppler  profile. 

Figure 4: Estimated delay-Doppler profile, circulant
extension algorithm


Figure 5: Estimated delay-Doppler profile, modified
gradient algorithm

REFERENCES

[1] J. P. Burg, D. G. Luenberger, and D. L. Wenger, "Estimation
of structured covariance matrices," Proceedings
of the IEEE, vol. 70, pp. 963-974, Sept. 1982.

[2] M. I. Miller and D. L. Snyder, "The role of likelihood
and entropy in incomplete-data problems: Applications
to estimating point-process intensities and Toeplitz constrained
covariances," Proceedings of the IEEE, vol. 75,
pp. 892-907, July 1988.

[3] A. Dembo, C. L. Mallows, and L. A. Shepp, "Embedding
nonnegative definite Toeplitz matrices in nonnegative
definite circulant matrices, with application to covariance
estimation," IEEE Trans. on Information Theory,
vol. 35, pp. 1206-1212, Nov. 1989.

[4] L. M. Davis, R. J. Evans, and E. Polak, "Maximum likelihood
estimation of positive definite Hermitian Toeplitz
matrices using Outer Approximations," in Proc. of
IEEE Workshop on Statistical Signal and Array Processing
(SSAP'98), (Portland, OR, USA), pp. 49-52,
Sept. 1998.

[5] D. L. Snyder, J. A. O'Sullivan, and M. I. Miller, "The
use of maximum likelihood estimation for forming images
of diffuse radar targets from delay-Doppler data,"
IEEE Trans. on Information Theory, vol. 35, pp. 536-548,
Nov. 1989.

[6] L. M. Davis, I. B. Collings, and R. J. Evans, "Estimation
of LEO satellite channels," in Int. Conf. on
Information, Communications and Signal Processing
(ICICS'97), vol. 1, (Singapore), pp. 15-19, Sept. 1997.

[7] H. L. Van Trees, Detection, Estimation and Modulation
Theory, vol. III. Wiley, 1971.

[8] J. R. Magnus and H. Neudecker, Matrix Differential
Calculus with Applications in Statistics and Econometrics.
Wiley, 1988.

[9] R. M. Gray, "Toeplitz and circulant matrices: II," tech.
rep., Center for Systems Research, Stanford University,
Apr. 1977.




ENHANCED  SPACE-TIME  CAPTURE  PROCESSING  FOR  RANDOM  ACCESS  CHANNELS 


Alexandr  M.  Kuzminskiy,  Kostas  Samaras,  Carlo  Luschi  and  Paul  Strauch 


Bell  Laboratories,  Lucent  Technologies 
Unit  1,  Pagoda  Park,  Westmead  Drive 
Swindon,  Wiltshire  SN5  7YT,  UK 
ak9@lucent.com 


ABSTRACT 

The  problem  of  maximizing  the  throughput  in  a  Random 
Access  Channel  (RACH)  in  a  TDMA-based  system  is  ad¬ 
dressed.  A  general  analysis  of  a  Slotted  ALOHA  system  is 
presented  which  shows  that  a  possibility  to  recover  more 
than  one  user  in  a  RACH  collision  can  significantly  im¬ 
prove  system  performance.  Three  capture  algorithms  based 
on  semi-blind  space-time  filtering  are  proposed.  Their  effi¬ 
ciency  compared  to  the  conventional  (power)  training-based 
capture  algorithm,  is  demonstrated  by  means  of  simulations 
in  a  GSM(EDGE)  system.  The  best  results  are  obtained  for 
a  multistage  version  of  the  training-like  algorithm  based  on 
the  Least  Squares  (LS)  estimation  of  space-time  filter  coef¬ 
ficients. 

1.  INTRODUCTION 

Cellular  mobile  communication  systems  such  as  the  GSM 
make  use  of  RACH  in  order  to  enable  the  initial  access  of 
the  mobile  stations  to  the  network.  Packet  radio  networks 
(like  GPRS  and  EGPRS)  also  make  use  of  similar  channels 
called  Packet  Random  Access  Channels  (PRACH)  not  only 
for  the  initial  access  but  also  during  the  call  since  channels 
are  allocated  to  users  on  a  demand  basis,  rather  than  per¬ 
manently  (as  in  circuit  switched  GSM).  The  random  access 
mechanism  used  in  these  systems  is  based  on  the  Slotted 
ALOHA principle [1]. The throughput in a slotted ALOHA
random access channel in a TDMA-based system can be
improved by using capture effects. Most capture models
in TDMA-based systems rely on power capture [2], and
no more than one of the colliding packets can be recovered.
Specifically, when more than one packet arrives at the receiver
simultaneously, only one of them can be captured,
provided that its power exceeds a specified threshold.
Capturing more than one packet in a collision of many
leads to performance enhancement. We start from the general
analysis of a Slotted ALOHA system with capture. We
show that the throughput can be increased significantly if a
nonzero probability of capture of more than one packet in
a collision is assumed. Then we propose three capture algorithms
based on semi-blind space-time filtering. The first
one is based on a multistage procedure where each stage exploits
the conventional LS estimator with the ability to capture
at most one of the colliding packets. The second algorithm
is based on a training-like (TL) approach [5,6] that allows
us to introduce a nonzero probability of recovering more than
one user in a collision of many using a one-stage procedure.
The third one is a combination of the multistage and
training-like algorithms. Simulations in a GSM(EDGE) context
are presented, which demonstrate the superior performance
of the multiple capture algorithms compared to the
LS estimator.

2.  CAPTURE  EFFECTS  IN  A  SLOTTED  ALOHA 
SYSTEM 

In order to demonstrate the performance enhancement due
to space-time capture processing we consider a simple
S-ALOHA system with a finite population of users, $N$. A generalization
of the model described in [2,3] is adopted where
the input load of the system is described by the probability
of packet arrival denoted as $p_0$. Each of the users (terminals)
generates single packet messages with probability $p_0$.
A discrete time system is considered and transmissions of
packets occur only at the boundaries between two time slots.
If the transmission of a packet is not successful the terminal
is backlogged and makes an attempt to retransmit the packet
in the next time slot with retransmission probability $p_r$. The
capture ability of the channel is described by the capture
matrix $C = [c(i,j)]$, where $c(i,j)$ denotes the probability
that there are $i$ successfully received packets given that there
are $j$ packet transmission attempts in the same time slot
($0 \le i, j \le N$). It is assumed that all transmitting terminals
are aware of the outcome of their transmissions before the
end of the time slot through an ideal feedback (downlink)
channel. The state of the system can be described by the
number $n$ of backlogged terminals ($0 \le n \le N$).

The steady state behavior of this discrete time Markovian system is determined by the $(N+1) \times (N+1)$ matrix of transition probabilities $\Pi = [\pi_{n,m}]$, where $\pi_{n,m}$ is the probability that the state of the system (population of backlogged terminals) is $m$ during time slot $t+1$, given that during time slot $t$ the state was $n$. The adopted model allows us to express these transition probabilities as follows:


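$$\pi_{n,m} = \sum_{i=0}^{N-n} \sum_{j=0}^{n} \sum_{k=\max\{n-m+i-j,\,0\}}^{\min\{n-m+i,\,i\}} \binom{N-n}{i} p_0^{i} (1-p_0)^{N-n-i} \binom{n}{j} p_r^{j} (1-p_r)^{n-j}\, c(k,i)\, c(n-m+i-k,\,j). \qquad (1)$$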


The expression for the transition probabilities in [3] is a special case of (1) when the capture matrix of the system becomes:

$$c(i,j) = \begin{cases} 1 - q_j, & i = 0 \\ q_j, & i = 1 \\ 0, & i > 1 \end{cases} \qquad (2)$$

where $q_j$ is the probability that one out of $j$ transmitted packets is successfully received. A semi-analytical approach has been followed for the calculation of the transition probabilities. The elements of the capture probability matrix $C$, for the purposes of this paper, have been calculated through simulation. In particular, the elements $c(i,j)$ with $1 \le i \le M_1$, $1 \le j \le M_2$ (typical values $M_1 = 3$, $M_2 = 5$) are calculated via simulation, and $c(0,j) = 1 - \sum_{i=1}^{M_1} c(i,j)$ for $1 \le j \le M_2$. Furthermore, $c(0,0) = 1$ and $c(i,j) = 0$ for all other $(i,j)$.

The steady state distribution $P = \{P_k\}_{k=0}^{N}$ of the number of backlogged users is given as the solution to the following problem [4]:

$$P\,\Pi = P \qquad (3)$$

under the constraint:

$$\sum_{k=0}^{N} P_k = 1. \qquad (4)$$


As  a  performance  metric  the  average  number  of  success¬ 
fully  transmitted  packets  per  time  slot  has  been  chosen, 
which  is  referred  to  as  the  average  throughput  S.  The  aver¬ 
age  throughput  can  be  calculated  as  follows: 

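$$S = \sum_{n=0}^{N} S(n)\,P_n, \qquad (5)$$

where $S(n)$ denotes the number of successful packet transmissions when the system is in state $n$ and can be calculated by:

$$S(n) = \sum_{m=0}^{N} \sum_{i=0}^{N-n} \sum_{j=0}^{n} (n-m+i)\,\pi_{n,m}^{(i,j)}, \qquad (6)$$

where $\pi_{n,m}^{(i,j)}$ is the $(i,j)$ term of the sum that defines the transition probability $\pi_{n,m}$.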
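The steady-state problem (3)-(4) and the throughput average (5) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the transition matrix is assumed to be given (any row-stochastic matrix will do), and the two-state matrix in the demo is invented purely to exercise the solver.

```python
import numpy as np

def steady_state(Pi):
    """Solve P @ Pi = P subject to sum(P) = 1, i.e. Eqs. (3)-(4)."""
    Pi = np.asarray(Pi, dtype=float)
    n = Pi.shape[0]
    # Stack the balance equations (Pi^T - I) P = 0 with the normalization row.
    A = np.vstack([Pi.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    P, *_ = np.linalg.lstsq(A, b, rcond=None)
    return P

def average_throughput(P, S):
    """Eq. (5): weighted sum of the per-state throughputs S(n)."""
    return float(np.dot(S, P))

# Hypothetical two-state chain (not taken from the paper).
Pi_demo = np.array([[0.9, 0.1],
                    [0.5, 0.5]])
P_demo = steady_state(Pi_demo)
S_demo = average_throughput(P_demo, np.array([1.0, 0.0]))
```

Solving the stacked system by least squares avoids having to drop one of the redundant balance equations by hand.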


The possibility of improving system performance by means of capture effects is illustrated in Figure 1, where the average system throughput as a function of the retransmission probability is plotted for no capture and for ideal capture ($S = N p_0$), with $N = 10$ and $p_0 = 0.2$. One can see the significant gap between these two boundary cases, which can be filled by curves corresponding to algorithms with multiple capture ability.


Figure  1:  Slotted  ALOHA  throughput  performance  for  the 
boundary  cases 


3.  MULTIPLE  CAPTURE  PROBLEM 
FORMULATION 

The  model  of  a  RACH  collision  is  shown  in  Figure  2.  The 
main  assumptions  are: 

1)  all  colliding  signals  and  Co-Channel  Interference 
(CCI)  have  the  known  structure  of  a  timeslot  (GSM,  for  ex¬ 
ample)  and  they  are  received  synchronously, 

2) all signals are from the same finite alphabet (FA) $\{a_h, h = 1, \ldots, J\}$ and all of them have the same training sequence, which is different from the CCI training sequence,

3) channel coding is used (successful capture can be detected by means of parity check),

4) multiple antennas are used at the receiver (space-time interference rejection filtering can be applied),

5)  propagation  channels  for  all  colliding  signals  are  sta¬ 
tionary  over  the  whole  time  slot  (coefficients  of  a  space-time 
filter  can  be  adjusted  by  means  of  off-line  algorithms). 

The main difficulty in recovering more than one user is that the training data for all access packets in one cell is the same. This means that training-based algorithms cannot be directly applied for multiple capture reception. Blind techniques could be applicable, but the short-burst nature of Slotted ALOHA systems makes them unrealistic because of finite data effects [7]. The possibility of addressing this problem by means of semi-blind space-time filtering algorithms is studied in this paper.

Note: The important feature of the considered problem is that some probability of access failure can be acceptable for RACH systems. Thus, solutions without proven ability to recover all colliding signals in every time slot may be useful.

Figure 2: Model of a RACH collision ("training-like symbols" may be located at any place in a payload)

4.  MULTIPLE  CAPTURE  ALGORITHMS 

4.1.  Multistage  algorithm 

A  multistage  processing  based  on  cancellation  of  the  recov¬ 
ered  signals  from  the  received  signal  at  successive  stages 
has  been  considered  for  different  applications,  for  example 
in  [8,9].  A  possible  way  to  implement  this  technique  in  the 
considered  problem  is  presented  in  Figure  3  (two  stages  are 
shown  for  simplicity).  The  conventional  LS  algorithm  is 
used  at  each  stage  in  the  Space-time  Filter.  The  possible 
number  of  stages  can  be  found  from  the  applicability  condi¬ 
tion  (misadjustment)  for  the  Noise  Canceller  [10]: 

(Number  of  stages  -  1)  *  Length  of  channels  < 
Number  of  information  symbols  in  a  timeslot 

The  advantage  of  this  algorithm  is  that  more  than  one  user 
may  be  captured  if  the  first  stage  is  successful.  The  dis¬ 
advantage  is  that  no  signals  can  be  recovered  if  there  is  no 
capture  at  the  first  stage.  We  refer  to  this  straightforward 
algorithm  as  the  MLS  (multistage  LS)  and  consider  it  as  a 
reference  point  for  the  enhanced  algorithms  introduced  in 
the  next  two  subsections. 

Figure 3: Structure of the MLS algorithm

4.2. One stage training-like algorithm

According to a general TL approach [5], our proposal is to use a few information symbols in the payload as an extension of the training sequence. These symbols may be different for different users. Thus, the enlarged training sequences may be linearly independent and the LS estimator based on these TL sequences can be applied. In Figure 2 these information symbols are indicated as the TL symbols. The coefficients of the space-time filters and the signal estimates corresponding to the TL sequences can be found for the FA signals using the following training-like LS (TLLS) algorithm:

- form the $J^{N_{TL}}$ TL sequences

$$s_m^{TL} = \{ s(n_1) s(n_2) \ldots s(n_{N_T})\; s_m(m_1) s_m(m_2) \ldots s_m(m_{N_{TL}}) \}, \qquad (7)$$

where $s(n_i)$, $i = 1, \ldots, N_T$ are the training symbols, $\{s_m(m_1) s_m(m_2) \ldots s_m(m_{N_{TL}})\}$ are all $J^{N_{TL}}$ possible sequences of the FA signal of length $N_{TL}$; $n_i$ and $m_j$ are the positions of the known and TL symbols ($n_i$, $i = 1 \ldots N_T$ are known, $m_j$, $j = 1 \ldots N_{TL}$ must be selected);

- calculate the LS estimates of the weight vectors using the TL sequences

$$W_m = (R + \delta I)^{-1} P_m, \quad m = 1 \ldots J^{N_{TL}}, \qquad (8)$$

where

$$R = \sum_{i=1}^{N_T} X(n_i) X^*(n_i) + \sum_{j=1}^{N_{TL}} X(m_j) X^*(m_j); \qquad (9)$$

$$P_m = \sum_{i=1}^{N_T} s^*(n_i) X(n_i) + \sum_{j=1}^{N_{TL}} s_m^*(m_j) X(m_j); \qquad (10)$$

where $X$ is the vector of input signals and $\delta$ is the regularization coefficient [11] for the conventional LS estimator, which is usually chosen to be close to the variance of the noise;

- select the $M_1$ weight vectors which minimize the distance from the FA, $Q_m$:

$$W_j = W_{m_j}, \quad m_j = \arg\min_m Q_m, \quad j = 1, \ldots, M_1, \qquad (11)$$

$$Q_m = \sum_{n=1}^{N_s} \min_h \left( | a_h - W_m^* X(n) | \right), \qquad (12)$$

where $N_s$ is the number of symbols in a time slot;

- calculate signal candidates

$$s_j(n) = W_j^* X(n), \qquad (13)$$

- apply parity check to each signal candidate and accept different signal candidates with the positive parity check as the captured packets.

The drawback of this solution is that the number of TL sequences grows exponentially with the number of TL symbols. Thus, only a small number of TL symbols can be used. Certainly, in this situation we cannot guarantee the possibility to recover all signals in a collision in each timeslot. Nevertheless, according to the Note in Section 3 this is not necessary in the considered problem. We have introduced a multiple capture ability in a one stage procedure and, in Section 5, we will demonstrate the performance improvement for only two TL symbols in the GSM(EDGE) environment.


4.3. Multistage training-like algorithm

Capture ability can be additionally improved by means of multistage processing similar to that presented in Section 4.1 when the TLLS algorithm is used instead of the LS estimator. We refer to this algorithm as the MTLLS (multistage TLLS).

5. SIMULATION RESULTS

Two receive antennas in a typical GSM ($J = 2$) urban scenario TU50 are assumed, where SNR = 35 dB and SIR = 6 dB. In all cases a space-time filter with five coefficients in each channel is used. For each time slot, the transmitted bits are obtained by channel encoding of one data block. The channel coding scheme includes a (34,28) systematic cyclic redundancy check (CRC) code (which accepts 28 bits at the input and provides 6 parity check bits at the output), and a (3,1,5) convolutional code (rate 1/3, constraint length 5).

The possibility of capturing more than one user in a collision with the TLLS algorithm is illustrated in Figure 4, where the typical curves for the selection criterion (distance from the FA) are shown for $N_{TL} = 4$ (16 TL sequences for the binary FA) in the case of two colliding users ($M_2 = 2$). All situations are presented in Figure 4: no capture, one of two users is captured, and two of two users are captured. Our goal is to estimate the probabilities of these events for different $M_2$ and then to calculate the system performance according to the semi-analytical procedure presented in Section 2. The capture simulation results (estimated probabilities $p_i$, $i = 1, 2, 3$, to recover one, two or three colliding packets) are given in Table 1 for the conventional LS algorithm (at most one signal can be captured), for the TLLS with $N_{TL} = 2$, and for the MTLLS with the same $N_{TL}$.


Figure 4: Illustration of the selection step in the TLLS for $N_{TL} = 4$
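The TLLS steps of Section 4.2 (enumerate the TL sequences, solve the regularized LS problem for each, rank by distance from the finite alphabet) can be sketched as below. This is an illustrative toy under strong assumptions that are not from the paper: a noise-free collision of two BPSK users, two antennas, a single-tap (flat) channel per user, and hand-picked TL symbols. All names (`tlls`, `h1`, `h2`, and so on) are hypothetical, and the parity check step is omitted.

```python
import itertools
import numpy as np

def tlls(X, train_pos, train_sym, tl_pos, alphabet, n_keep, reg=1e-6):
    """Rank all TL sequences by the finite-alphabet distance Q_m.

    X is an (L, Ns) array whose columns are the input vectors X(n).
    Returns the n_keep best (Q, tl_sequence, estimated_signal) triples.
    """
    known = list(train_pos) + list(tl_pos)
    Xk = X[:, known]
    R = Xk @ Xk.conj().T                          # Eq. (9)
    A = R + reg * np.eye(X.shape[0])
    fa = np.asarray(alphabet)
    results = []
    for tl in itertools.product(alphabet, repeat=len(tl_pos)):
        target = np.concatenate([train_sym, tl])  # one TL sequence, Eq. (7)
        P = Xk @ target.conj()                    # Eq. (10)
        W = np.linalg.solve(A, P)                 # Eq. (8)
        y = W.conj() @ X                          # W^H X(n) for every n, Eq. (13)
        Q = np.abs(y[None, :] - fa[:, None]).min(axis=0).sum()  # Eq. (12)
        results.append((Q, tl, y))
    results.sort(key=lambda item: item[0])
    return results[:n_keep]

# Toy collision: same training symbols, different TL symbols (assumed values).
rng = np.random.default_rng(0)
Ns, train_pos, tl_pos = 20, range(6), (6, 7)
train = rng.choice([-1.0, 1.0], size=6)
s1 = rng.choice([-1.0, 1.0], size=Ns)
s2 = rng.choice([-1.0, 1.0], size=Ns)
s1[:6] = train
s2[:6] = train
s1[[6, 7]] = [1.0, 1.0]
s2[[6, 7]] = [1.0, -1.0]
h1 = np.array([1.0, 0.5j])
h2 = np.array([0.3, -1.0])
X = np.outer(h1, s1) + np.outer(h2, s2)
best = tlls(X, train_pos, train, tl_pos, (-1.0, 1.0), n_keep=2)
```

In this toy case the two retained candidates correspond to the two users because their TL symbols differ, which is what makes the enlarged training sequences linearly independent.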

Table 1. Estimated probabilities to capture one/two/three packets in a collision of one/.../five packets

M2   p_i   LS     TLLS   MLS     MTLLS
1    p1    1      1      1       1
     p2    0      0      0       0
     p3    0      0      0       0
2    p1    0.87   0.47   0.01    0
     p2    0      0.51   0.86    0.96
     p3    0      0      0       0
3    p1    0.68   0.56   0.21    0.20
     p2    0      0.30   0.14    0.09
     p3    0      0.03   0.033   0.59
4    p1    0.54   0.56   0.32    0.32
     p2    0      0.17   0.13    0.23
     p3    0      0.01   0.1     0.19
5    p1    0.41   0.47   0.30    0.35
     p2    0      0.1    0.09    0.15
     p3    0      0      0.02    0.05

The corresponding curves for the average system throughput as a function of the retransmission probability are shown in Figure 5 for the conditions indicated in Section 2. One can see the significant performance improvement for the enhanced algorithms, especially for the MTLLS, compared to the conventional LS estimator even for only two TL symbols.


Figure  5:  Slotted  ALOHA  throughput  performance  for  dif¬ 
ferent  algorithms 


6.  CONCLUSION 

It  has  been  shown  analytically  that  a  possibility  to  recover 
more  than  one  user  in  a  RACH  collision  can  significantly 
improve  system  performance.  A  semi-analytical  approach 
has  been  proposed  to  evaluate  the  average  throughput  over  a 
Slotted  ALOHA  system  with  multiple  capture.  Three  semi¬ 
blind  space-time  filtering  algorithms  with  multiple  capture 
ability  have  been  presented.  Their  efficiency  compared  to 
the  training-based  algorithm  with  a  power  capture  has  been 
demonstrated  in  a  GSM(EDGE)  environment. 
ABSTRACT

In  digital  communication  systems,  most  commonly  used  sig¬ 
naling  constellations  are  symmetric.  Without  a  pilot  tone 
or  known  training  sequence,  an  arbitrary  phase  rotation 
cannot  be  identified  from  a  symmetric  constellation.  The 
standard  approach  to  overcome  the  phase  ambiguity  is  to 
use  differential  encoding.  In  this  paper,  we  introduce  the 
notion  of  using  an  asymmetric  constellation  instead  of  a 
symmetric  constellation  with  differential  encoding.  The  ab¬ 
solute  phase  of  an  asymmetric  constellation  can  be  deter¬ 
mined  using  blind  statistics  of  processed  channel  outputs. 
Through simulation and analysis, we study the trade-offs between asymmetry and other features of a constellation, such as data rate, power, and symbol separation.


1.  INTRODUCTION 

A  symmetric  constellation  has  the  property  that  blind  pro¬ 
cessing  is  unable  to  identify  an  arbitrary  rotation  of  sym¬ 
bols.  Synchronization  with  the  phase  of  the  transmitted 
carrier  may  be  done  by  using  pilot  tones  or  known  training 
sequences. In a blind system, without a pilot tone or training sequence, the receiver must rely on statistics of the channel outputs to recover the phase of the received signal. All of the
commonly-used  symbol  constellations  -  PAM,  PSK,  QAM, 
and  others  -  are  symmetric  when  the  symbols  are  equiprob- 
able.  Blind  statistics  of  these  constellations  cannot  produce 
an  absolute  phase  estimate.  To  overcome  the  phase  ambigu¬ 
ity,  a  mapping  between  the  data  and  the  symbols  has  to  be 
invariant  to  an  unknown  reference  phase.  A  simple  method 
is  to  use  differential  encoding.  Since  each  symbol  is  used 
to  determine  two  symbol  transitions,  a  symbol  decision  er¬ 
ror  will  usually  result  in  two  transition  errors.  The  penalty 
incurred  by  differential  encoding  is  well  characterized  as  a 
2-3  dB  loss  in  SNR  [2],  [3]. 

In  this  paper,  we  introduce  the  notion  of  using  an  asym¬ 
metric  constellation.  The  absolute  phase  of  an  asymmetric 
constellation  can  be  estimated  using  blind  statistics  of  pro¬ 
cessed  channel  outputs.  We  discuss  symmetry,  asymme¬ 
try,  and  how  to  design  asymmetric  constellations  and  abso¬ 
lute  phase  estimators.  Through  simulation  and  analysis,  we 
study  the  performance  of  various  absolute  phase  estimators 
and  the  trade-offs  between  asymmetry  and  other  features 
of  a  constellation. 


2.  SYMMETRY  BREAKING 

$M$-ary PAM, QAM, and PSK are the most often encountered symmetric constellations. A symmetric constellation may be rendered asymmetric by changing the symbol values and/or the symbol probabilities.

Consider an $M$-ary constellation with $M = 2^m$ equiprobable i.i.d. symbols. The data rate (in bits/symbol) or entropy of the constellation is

$$H(S) = -\sum_{i=0}^{M-1} p_i \log_2 p_i = m \qquad (1)$$

where  pi  is  the  probability  of  symbol  i.  If  the  number  of 
symbols  and  the  symbol  locations  are  to  remain  unchanged, 
an  asymmetric  constellation  can  be  obtained  by  adjusting 
the  symbol  probabilities.  Because  the  symbols  in  the  asym¬ 
metric  constellation  are  no  longer  equiprobable,  the  data 
rate  of  the  constellation  is  strictly  lower  than  that  of  the 
corresponding  symmetric  constellation.  This  is  a  trade-off 
of  data-rate  for  asymmetry. 


Figure 1: Symmetric and Asymmetric 8-PSK (Manipulation of the Symbol Probabilities)

Fig. 1(a) illustrates an 8-PSK constellation with equiprobable symbols $s_i = \sqrt{A}\, e^{j 2\pi i / 8}$, $i = 0, \ldots, 7$, constant transmitted power $A$, and a data rate of 3 bits per symbol. On the right is an asymmetric 8-PSK obtained by manipulating the symbol probabilities. The value of $p_0$ has been reduced by a small $\delta$, $0 < \delta < 1/8$. To maintain $\sum_{i=0}^{7} p_i = 1$ and a zero DC value, the probabilities of some other symbols have also been changed. Since the symbols in the second constellation are no longer equiprobable, the data rate of the constellation is strictly less than 3 bits per symbol. Figure 2 shows the exact reduction in entropy as a function of $\delta$ for $0 < \delta < 0.12$.
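The entropy reduction can be explored numerically with a short sketch. The probability assignment below is an assumption: $p_0$ is reduced by $\delta$ and $p_1 = p_7$ are raised by $\delta/\sqrt{2}$ as stated in Section 3.1, while the adjustment of $p_2 = p_6$ is one choice (not given explicitly in the text) that restores unit sum and zero mean. The function names are hypothetical. For $\delta = 0.06$ this assignment gives roughly 2.954 bits, in line with the $H(S) = 2.9542$ quoted in Section 3.1.

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits/symbol, Eq. (1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def asymmetric_8psk_probs(delta):
    """One zero-mean, unit-sum probability assignment (assumed, see lead-in)."""
    p = np.full(8, 1 / 8)
    p[0] -= delta                              # p_0 reduced by delta
    p[[1, 7]] += delta / np.sqrt(2)            # p_1 = p_7 raised
    p[[2, 6]] -= delta * (np.sqrt(2) - 1) / 2  # assumed compensation
    return p

p_sym = np.full(8, 1 / 8)
p_asym = asymmetric_8psk_probs(0.06)
mean_asym = np.sum(p_asym * np.exp(2j * np.pi * np.arange(8) / 8))
```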


(2) 


Figure 2: $\delta$ vs. $H(S)$ for the Asymmetric 8-PSK


Figure  3:  Symmetric  and  Asymmetric  8-PSK  (Symbol  Re¬ 
location) 


In Fig. 3, an alternative way of introducing asymmetry is shown: some of the original 8-PSK constellation points are relocated without changing the equal probability assigned to the 8 points. With symbol probability and power unchanged, the symbol $s_1$ is rotated counterclockwise by $\delta$ radians. In order to maintain the zero-mean condition, symbols $s_2$, $s_6$, and $s_7$ must also be relocated. The symbols $s_2$, $s_6$, and $s_7$ are rotated by $-\epsilon$, $+\epsilon$, and $-\delta$ radians, respectively, where $\epsilon = \sin^{-1}\{\cos(\pi/4) \cdot (1 - \cos(\delta) + \sin(\delta))\}$. However, introducing asymmetry by moving some symbols closer together will cause more erroneous symbol decisions. With perfect phase estimation, the union bound of the error rate is

For small $\delta$, the error rate performance can be very close to that of the coherent symmetric 8-PSK constellation.
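The symbol relocation rule can be checked with a few lines of code. The sketch below builds the relocated 8-PSK constellation from $\delta$ using the expression for $\epsilon$ given in the text and confirms that the mean stays at zero. The function name and the choice $\delta = 0.1$ are illustrative assumptions.

```python
import numpy as np

def relocated_8psk(delta, A=1.0):
    """Relocate s_1, s_2, s_6, s_7 of unit-spaced 8-PSK as described in the text."""
    eps = np.arcsin(np.cos(np.pi / 4) * (1 - np.cos(delta) + np.sin(delta)))
    angles = 2 * np.pi * np.arange(8) / 8
    angles[1] += delta   # s_1 rotated counterclockwise by delta
    angles[2] -= eps     # s_2 rotated by -eps
    angles[6] += eps     # s_6 rotated by +eps
    angles[7] -= delta   # s_7 rotated by -delta
    return np.sqrt(A) * np.exp(1j * angles), eps

symbols, eps = relocated_8psk(0.1)
```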

3.  ABSOLUTE  PHASE  ESTIMATION 
3.1.  Maximum  Likelihood  Approach 

Consider the transmission of nonequiprobable M-PSK signals over an AWGN channel. The M-PSK symbol has the complex form $s_i = \sqrt{A}\, e^{j 2\pi i / M}$, $i = 0, 1, \ldots, M-1$, where $A$ denotes the constant signal power. The transmitted symbol $x[n] = s_i$ with probability $p_i$. The corresponding received sequence is then

$$r[n] = x[n] e^{j\phi} + w[n], \quad n = 0, 1, \ldots, N-1 \qquad (3)$$

where $w[n]$ is a sample of zero-mean complex white Gaussian noise and $\phi \in (0, 2\pi)$ is an arbitrary phase introduced by the channel. For the assumed AWGN model, the pdf of $r[n]$ can be modeled as a mixture of $M$ distributions

$$p(r[n]; \phi) = \sum_{i=0}^{M-1} p_i \cdot f_i(r[n]; \phi) \qquad (4)$$

where

$$f_i(r[n]; \phi) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{|r[n] - s_i e^{j\phi}|^2}{2\sigma^2} \right). \qquad (5)$$

Then the pdf of the sequence $\mathbf{r}$ is

$$p(\mathbf{r}; \phi) = \prod_{n=0}^{N-1} \left[ \sum_{i=0}^{M-1} p_i \cdot f_i(r[n]; \phi) \right]. \qquad (6)$$

The MLE of $\phi$ is the value that maximizes the likelihood function in Eq. (6). In general, the derivative of $\ln p(\mathbf{r}; \phi)$ with respect to $\phi$ does not reduce to a simple form. The MLE of $\phi$ can be obtained numerically by using iterative maximization procedures. The difficulty with the use of these numerical methods is that in general the point found may not be the global maximum but possibly only a local maximum or even a local minimum.

A simpler alternative likelihood method of finding the absolute phase is based on the use of the phase statistics. We may express the unknown phase as $\phi = \phi_0 + k \cdot (2\pi/M)$, where $k$ is an integer, $0 \le k \le M-1$, and $0 \le \phi_0 < 2\pi/M$. Let $\theta[n]$ be the phase angle of the received sequence $r[n]$. The $\phi_0$ is obtained first by

$$\hat{\phi}_0 = \frac{\Phi}{M} \qquad (7)$$

where

$$\Phi = \mathrm{angle}\left( \sum_{n=0}^{N-1} e^{jM\theta[n]} \right) \qquad (8)$$

is the mean phase angle of the received sequence after each phase angle has been multiplied by $M$. Then, the maximum likelihood method can be applied to find the correct value of the integer $k$. Using the estimate $\hat{\phi}_0$, the complex plane is divided into slices $q_i$, $i = 0, 1, \ldots, M-1$ bounded by phase angles $\{\pi/M + \hat{\phi}_0 + i \cdot (2\pi/M),\ i = 0, 1, \ldots, M-1\}$. Although the nonequiprobable symbols do cause the optimum symbol-by-symbol decision boundaries to change at the receiver, these angular decision boundaries are close to being optimum for small $\delta$. Let $l_i$ be the number of points in the received data sequence that fall in each region $q_i$. Then, we are able to obtain the integer $k$ that maximizes the likelihood function defined as

$$\hat{k} = \arg\max_k\, p(\mathbf{n}; k) \qquad (9)$$

where $p(\mathbf{n}; k)$ can be modeled as a multinomial distribution with $\mathbf{n} = [n_0\ n_1 \ldots n_{M-1}]$ and $n_i = l_{(i+k) \bmod M}$:

$$p(\mathbf{n}; k) = \frac{N!}{n_0!\, n_1! \cdots n_{M-1}!}\, p_0^{n_0} p_1^{n_1} \cdots p_{M-1}^{n_{M-1}} = N! \prod_{i=0}^{M-1} \frac{p_i^{n_i}}{n_i!}. \qquad (10)$$

Since the multinomial coefficient is the same for every cyclic shift $k$, this is equivalent to finding the integer $k$ that maximizes the log-likelihood function, which up to an additive constant is

$$\ln \prod_{i=0}^{M-1} p_i^{n_i} = \sum_{i=0}^{M-1} n_i \ln p_i. \qquad (11)$$

Therefore, by using simple bin statistics, the absolute phase estimate is

$$\hat{\phi} = \hat{\phi}_0 + \hat{k} \cdot \left(\frac{2\pi}{M}\right). \qquad (12)$$
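The two-step estimator of Eqs. (7)-(12) can be sketched as follows: an $M$-th power step for the fractional phase, then a cyclic-shift search over bin counts for the integer $k$. The simulation settings are assumptions for illustration only: the probability assignment for $p_2 = p_6$, the noise level (standard deviation 0.1 per real component), the sample size, the seed, and all names are hypothetical rather than taken from the paper.

```python
import numpy as np

def estimate_phase(r, p):
    """Blind absolute phase estimate for nonequiprobable M-PSK."""
    M = len(p)
    theta = np.angle(r)
    # Eqs. (7)-(8): remove the modulation by multiplying the phases by M.
    Phi = np.angle(np.sum(np.exp(1j * M * theta)))
    phi0 = (Phi % (2 * np.pi)) / M
    # Bin counts l_i: slice q_i is centred on phi0 + i * 2*pi/M.
    idx = np.rint((theta - phi0) / (2 * np.pi / M)).astype(int) % M
    l = np.bincount(idx, minlength=M)
    # Eq. (11): log-likelihood of each shift k with n_i = l_{(i+k) mod M}.
    logp = np.log(p)
    loglik = [np.dot(np.roll(l, -k), logp) for k in range(M)]
    k_hat = int(np.argmax(loglik))
    return phi0 + k_hat * 2 * np.pi / M        # Eq. (12)

rng = np.random.default_rng(1)
M, N, delta, phi_true = 8, 5000, 0.06, 2.0
p = np.full(M, 1 / 8)
p[0] -= delta
p[[1, 7]] += delta / np.sqrt(2)
p[[2, 6]] -= delta * (np.sqrt(2) - 1) / 2      # assumed compensation
sym = rng.choice(M, size=N, p=p)
noise = 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
r = np.exp(1j * (2 * np.pi * sym / M + phi_true)) + noise
phi_hat = estimate_phase(r, p)
phase_error = np.angle(np.exp(1j * (phi_hat - phi_true)))
```

Comparing the estimate with the true phase through `np.angle` of the difference keeps the error wrapped to $(-\pi, \pi]$.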

To make the correct decision in estimating $k$, the estimates $\hat{p}_i$ should be close to their true values $p_i$. Finding the sample size $N$ to generate reliable estimates of the $p_i$ requires the joint probabilities that the $\hat{p}_i$ lie within some $\epsilon$ intervals centered on the correct values. Using Chebyshev's Inequality, we can roughly determine the number of required samples $N$ such that the estimate $\hat{p}_i$ is within $\epsilon$ of its correct value $p_i$ with probability $1 - \tau$:

$$P\{|\hat{p}_i - p_i| < \epsilon\} \ge 1 - \frac{\sigma_i^2}{\epsilon^2} = 1 - \tau \qquad (13)$$

where $\hat{p}_i = n_i/N$ with variance $\sigma_i^2 = \{p_i(1 - p_i)\}/N$. Setting $\epsilon = \delta/P$, the required $N$ is given by

$$N = \frac{p_i (1 - p_i) P^2}{\tau \delta^2}. \qquad (14)$$
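The Chebyshev sizing rule of Eqs. (13)-(14) amounts to a one-line computation, sketched below. The function names and the numerical inputs in the example are hypothetical and chosen only to show the call.

```python
import math

def chebyshev_samples(p, eps, tau):
    """Smallest N for which sigma_i^2 / eps^2 <= tau, with sigma_i^2 = p(1-p)/N."""
    return p * (1 - p) / (tau * eps ** 2)

def samples_eq14(p, delta, P, tau):
    """Eq. (14): the same bound with eps = delta / P."""
    return chebyshev_samples(p, delta / P, tau)

# Hypothetical inputs: p = 0.5, eps = 0.1, tau = 0.1.
n_demo = chebyshev_samples(0.5, 0.1, 0.1)
```

Because Chebyshev's inequality is distribution-free, the resulting $N$ is a conservative (large) estimate.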


For the asymmetric setting shown in Fig. 1(b), the most likely incorrect $k$ are the correct value of $k$ offset by $\pm 2$ (mod 8). The two largest probability symbols $p_1$ and $p_7$ in Fig. 1 appear to be the most critical values to consider. From Eq. (14), setting $p_i = p_1 = 1/8 + \alpha = 1/8 + \delta/\sqrt{2}$, we obtain

$$N = \frac{(7 + 8\sqrt{2}\,\delta - 32\delta^2) P^2}{64\,\tau\,\delta^2}. \qquad (15)$$


In Fig. 4, we plot $(N\tau/P^2)$ as a function of $\delta$. As an example, setting $\tau = 0.1$ and $P = 2\sqrt{2}$, which corresponds to setting $\epsilon$ to half of the difference between the largest and the second largest values of symbol probabilities, at $\delta = 0.06$, $(N\tau/P^2) \approx 32$, which gives $N \approx 2{,}600$ samples. Figure 5 shows simulation results for the MSE of phase estimation as a function of $N$ for $\sigma^2 = 0.01$ and $A = 1$. At the same level of MSE performance, the sample size needed for $\delta = 0.03$ is approximately 4 times larger than the sample size needed for $\delta = 0.06$. While the MSE performance includes contributions from both estimation of $\phi_0$ and estimation of $k$, the relative dependence of $N$ on $\delta$ (i.e. the factor by which $N$ increases for decreasing $\delta$) is captured well by the approximation of Fig. 4.

Figure 4: $(N\tau/P^2)$ as a function of $\delta$

Figure 5: Comparison of MSE of phase estimation with different $\delta$.

Figure 6 illustrates the error probability performance of the various approaches. The bottom dashed line shows the error probability performance of the coherent symmetric 8-PSK. The top curve shows the error probability of symmetric 8-DPSK. The stars show the simulated error rates of asymmetric 8-PSK with $\delta = 0.06$ ($H(S) = 2.9542$) and $\sqrt{A} = 1$. The symbols are rotated by an unknown constant phase $\phi \in (0, 2\pi)$ radians and further distorted by AWGN. Statistics of 1,000 samples are used to estimate the absolute phase angle. As shown in Fig. 6, the performance of asymmetric 8-PSK is close to 3 dB better than that for symmetric 8-DPSK at large SNR. Note that the data rate of the asymmetric constellation is less than that of symmetric 8-DPSK by approximately 1.52%. Thus, in order to make a meaningful comparison of these two modulation methods, we should allow the 8-PSK symmetric constellation to use some form of encoding with rate 0.985. However, to obtain a coding gain of 3 dB, the rate will have to be significantly lower. Thus we conclude that when we have a large enough sample size for the phase estimate, the performance of asymmetric 8-PSK can be close to that for the coherent symmetric 8-PSK constellation.

Figure 6: Asymmetric 8-PSK, Symmetric 8-PSK, and Symmetric 8-DPSK Error Rate Comparison

3.2. Nonparametric Methods

Without  any  prior  knowledge  on  probability  distribution 
and  exact  locations  of  symbol  values,  nonparametric  or 
distribution-free  methods  can  be  used  to  estimate  the  ab¬ 
solute  phase  rotation. 


Figure  7:  An  absolute  phase  estimation  scheme  for  Asym¬ 
metric  8-PSK  obtained  by  changing  symbol  locations 

For an asymmetric constellation obtained by changing the symbol locations as in Fig. 3, a simple and effective scheme for phase estimation is based on noting that at the correct zero angle, roughly half of the samples will fall in two angular regions bounded by $\pi/4$ and $\pi/2$ and by $-\pi/4$ and $-\pi/2$, shown as two shaded regions in Fig. 7. The absolute phase can be estimated by searching for the angle that gives the maximum number of points in these two angular bins. This scheme works well in the presence of some noise; however, at high SNR, this scheme is only able to obtain the estimate within $\epsilon$ of the correct phase angle. We can further search for the angle within this range that gives the minimum mean square error from the center angle between these search sectors.
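The mask search described above can be sketched as a grid search over candidate rotations, counting how many de-rotated samples fall in the two sectors. Only the coarse search is shown; the refinement step is omitted. The grid step, noise level (standard deviation 0.1 per real component), $\delta = 0.1$, sample size, seed, and function names are all assumptions made for this illustration.

```python
import numpy as np

def mask_count(r, psi):
    """Number of samples in the two sectors after de-rotating by psi."""
    a = np.abs(np.angle(r * np.exp(-1j * psi)))
    return np.count_nonzero((a >= np.pi / 4) & (a <= np.pi / 2))

def mask_phase_estimate(r, step=0.01):
    """Coarse search for the rotation that maximizes the mask count."""
    grid = np.arange(0.0, 2 * np.pi, step)
    counts = [mask_count(r, psi) for psi in grid]
    return float(grid[int(np.argmax(counts))])

# Relocated 8-PSK (symbol relocation case) with equiprobable symbols.
rng = np.random.default_rng(2)
delta, N, phi_true = 0.1, 4000, 1.0
eps = np.arcsin(np.cos(np.pi / 4) * (1 - np.cos(delta) + np.sin(delta)))
angles = 2 * np.pi * np.arange(8) / 8
angles[1] += delta
angles[2] -= eps
angles[6] += eps
angles[7] -= delta
sym = rng.integers(0, 8, size=N)
noise = 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
r = np.exp(1j * (angles[sym] + phi_true)) + noise
phi_hat = mask_phase_estimate(r)
phase_error = np.angle(np.exp(1j * (phi_hat - phi_true)))
```

The count is nearly flat close to its maximum, so the coarse estimate is only accurate to a few hundredths of a radian here, which is consistent with the need for the refinement step mentioned in the text.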


Figure 8: Comparison of the noise performances of asymmetric 8-PSK constellations obtained by changing symbol locations, with different $\delta$ (symbol relocation case).


Figure 9: Comparison of MSE of phase estimation with different $\delta$ (symbol relocation).

Figure 8 shows the error probability performance for symmetric 8-DPSK and asymmetric 8-PSK obtained by changing symbol locations, with different values of $\delta$. The top and bottom dashed lines show the error rate performance of symmetric 8-DPSK and symmetric 8-PSK, respectively. The solid lines show the union bound of the error performance and x marks show the simulated results. Results are based on 1,000 equiprobable i.i.d. symbols rotated by an unknown phase $\phi$ and further corrupted by AWGN.

Figure 9 illustrates the MSE of the phase estimate with different values of $\delta$, assuming that the equiprobable i.i.d. symbols are rotated by an unknown phase $\phi$ and further distorted by AWGN with variance 0.01. Fig. 8 illustrates that with large $\delta$, the estimate converges to the absolute phase faster than with low $\delta$. However, with larger $\delta$, some symbol points are relocated closer to adjacent symbol points, which will cause more erroneous symbol decisions.

The shape of the mask that is used in estimating the absolute phase is not unique. The mask shown in Fig. 7 is just an example. We can use different masks bounded by different angular boundaries, such as a half-plane shape bounded by the angles $-\pi/2$ and $\pi/2$. The properties of a good mask shape are straightforward. It should give a maximum number of points at the correct angle, and the number of points in the mask should fall when it rotates away from the correct angle. Sensitivity analysis can be used to evaluate the performance of the mask.

4.  DISCUSSION 

A symmetric constellation may be rendered asymmetric by changing the symbol values and/or the symbol probabilities. Of these two methods of introducing asymmetry to an existing symmetric 8-PSK, manipulating the symbol probabilities will certainly cause some reduction in the number of data bits transmitted per symbol and some additional complexity in the encoding/decoding process needed to obtain the asymmetric probability arrangement. For the second asymmetric arrangement, the symbol probabilities remain unchanged, so the data rate is the same as that of a symmetric constellation, without additional complexity in the coding process.
5. CONCLUSION

An asymmetric constellation is introduced as an alternative to a regular symmetric constellation with differential encoding. Without the use of a pilot tone or known training sequence, the absolute phase of received symbols can be estimated blindly from an asymmetric constellation using simple statistics of the received symbols. By introducing asymmetry to an existing symmetric constellation, the absolute phase recovery function is obtained at the cost of a very small reduction in entropy and/or minimum distance. Both the asymmetry of a constellation and the phase recovery function may be considered as design choices, much as symbol separation, the number of bits transmitted per symbol, or power, providing new tools for constellation design.
ABSTRACT

In short burst wireless communications, a training sequence is incorporated in each burst for the receiver to adjust the equalizer coefficients. However, when the number of training symbols is less than the number of spatial-temporal equalizer tap weights, the conventional least-squares technique may not provide good MSE performance. Blind methods, on the other hand, may not achieve equalization in a short burst. A regularized semi-blind algorithm was proposed previously by Kuzminskiy et al. to overcome this problem, but local minima exist in the algorithm. A convex cost with training symbols as the equalizer constraint is proposed in this paper to avoid cost-dependent local minima. Furthermore, comparison with the regularized semi-blind algorithm suggests that the proposed algorithm achieves a lower MSE in the case of non-constant modulus signals such as 16-QAM signals.

1.  INTRODUCTION 

Conventional  equalization  techniques  in  wireless  com¬ 
munications  require  transmission  of  training  sequences. 
This  represents  a  system  overhead  and  effectively  re¬ 
duces  the  information  rate.  On  the  other  hand,  blind 
equalization  algorithms  do  not  require  training.  One  of 
the  most  popular  blind  algorithms  is  the  family  of  con¬ 
stant  modulus  algorithms  (e.g.  CMA  2-2  or  Godard  [2] 
algorithm,  CMA  1-2  or  Sato  algorithm).  There  are 
several  disadvantages  in  using  the  CMA  family  of  algo¬ 
rithms.  One  of  them  is  the  existence  of  local  minima. 
In situations where a fractionally-spaced equalizer or an antenna
array is used, the Godard algorithm was shown to converge globally [3].
Unfortunately, this is not true for the CMA 1-2 (Sato) algorithm, which
was demonstrated to have cost-dependent local minima in either case [4].
Another drawback of blind algorithms is their slow convergence and
inability to achieve equalization in a short burst.

A  regularized  semi-blind  algorithm  was  proposed 
in  [1]  which  combined  the  LS  and  CM  1-2  costs.  The 
ability  to  successfully  equalize  the  channels  with  a  spa¬ 
tial-temporal  filter  was  demonstrated  and  thus  offered 
the  possibility  of  reducing  the  number  of  training  sym¬ 
bols.  However,  local  minima  inherent  to  the  cost  exist. 
Using a convex cost eliminates the possibility of convergence to
cost-dependent local minima. A blind convex cost with equalizer
tap-anchoring was introduced in [5, 6]. In this paper, we make use of
the training sequence in conjunction with the blind convex cost [6]
to formulate a new and more efficient semi-blind algorithm. Simulation
results demonstrate the potential of the proposed algorithm for
constant and non-constant modulus signals.


2.  SPATIAL-TEMPORAL  SIGNAL  MODEL 

We assume there are K users in the model. One of the users is the
signal of interest. Without loss of generality, we denote the first
user as the desired signal. The remaining K - 1 signals come from
nearby co-channel cells. At the base station receiver, an antenna
array of M sensors is employed.

The  data  is  processed  in  a  burst  of  N  symbols  which 
are  assumed  to  be  received  under  a  stationary  environ¬ 
ment.  There  are  Nt  training  symbols  in  each  burst 
and  the  starting  position  of  the  training  sequence  is  Ns 
which  is  assumed  to  be  known.  The  transmitted  signals 
undergo  linear  channels  which  are  assumed  to  be  FIR 
of  length  Nc.  This  assumption  is  valid  when  we  have  a 
finite  delay  spread.  Equalization  is  necessary  when  the 
delay spread is larger than the symbol duration.

0-7803-5988-7/00/$10.00 © 2000 IEEE

The received signal at the j-th sensor is given by:

$$y_j(n) = \sum_{i=1}^{K} \mathbf{c}_{ij}^{H}\,\mathbf{x}_i(n) + v_j(n), \tag{1}$$

for $i = 1, \ldots, K$, $j = 1, \ldots, M$,

$$\mathbf{c}_{ij} = [c_{ij}(0), \ldots, c_{ij}(N_c - 1)]^{T}, \tag{2}$$

$$\mathbf{x}_i(n) = [x_i(n), \ldots, x_i(n - N_c + 1)]^{T}, \tag{3}$$

where $H$ denotes the conjugate transpose of a matrix and each
$c_{ij}(n)$ is a complex Gaussian random variable whose amplitude does
not change over the duration of the burst. The noise $v_j(n)$ is
complex circularly symmetric additive white Gaussian noise of
variance $\sigma^2$.

Recall that the first user is the desired signal. The equalizer
output for the signal of interest is given by:

$$z_1(n) = \mathbf{w}^{H}\mathbf{y}(n), \tag{4}$$

where $\mathbf{y}(n) = [\mathbf{y}_1^{T}(n), \ldots, \mathbf{y}_M^{T}(n)]^{T}$ and each
$\mathbf{y}_j(n) = [y_j(n), \ldots, y_j(n - N_w + 1)]^{T}$. The spatial-temporal
equalizer taps are $\mathbf{w} = [\mathbf{w}_1^{T}, \ldots, \mathbf{w}_M^{T}]^{T}$ with each
$\mathbf{w}_j = [w_{j1}, \ldots, w_{jN_w}]^{T}$. The vector $\mathbf{w}$ has dimension
$MN_w \times 1$.
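The stacking of per-sensor samples into the spatial-temporal vector y(n) used in (4) can be sketched in a few lines. This is only an illustration: the array layout (one row per sensor), the function names `stack_snapshots` and `equalizer_output`, and the use of NumPy are assumptions, not part of the original paper.

```python
import numpy as np

def stack_snapshots(Y_ant, Nw):
    """Build the stacked vectors y(n) of (4) from per-sensor samples.

    Y_ant : (M, N) array, row j holds y_j(0), ..., y_j(N-1) (assumed layout).
    Nw    : number of temporal taps per sensor.
    Returns an (M*Nw, N-Nw+1) array whose columns are y(n), n = Nw-1, ..., N-1.
    """
    M, N = Y_ant.shape
    cols = []
    for n in range(Nw - 1, N):
        # Row j of `window` is [y_j(n), y_j(n-1), ..., y_j(n-Nw+1)].
        window = Y_ant[:, n - Nw + 1:n + 1][:, ::-1]
        # Flatten sensor by sensor: taps of sensor 1 first, then sensor 2, ...
        cols.append(window.reshape(-1))
    return np.stack(cols, axis=1)

def equalizer_output(w, Y_stack):
    """Return z(n) = w^H y(n) for every column y(n) of Y_stack."""
    return w.conj() @ Y_stack
```

With M = 4 sensors and N_w = 6 taps per sensor, each column has 24 entries, matching the dimension MN_w x 1 stated for w.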


3.  LS  SOLUTION  WITH  FEW  TRAINING 
SYMBOLS 


When a burst of known symbols (training) is received,
the method of least squares can be used to obtain the
spatial-temporal equalizer coefficients. The following
equation is satisfied:

Rw  =  p,  (5) 


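where $\mathbf{R}$ is the time-averaged spatial-temporal autocorrelation matrix

$$\mathbf{R} = \frac{1}{N_t}\sum_{n=N_s}^{N_s+N_t-1} \mathbf{y}(n)\,\mathbf{y}(n)^{H}, \tag{6}$$

and $\mathbf{p}$ is the time-averaged spatial-temporal cross-correlation vector

$$\mathbf{p} = \frac{1}{N_t}\sum_{n=N_s}^{N_s+N_t-1} x_1^{*}(n-d)\,\mathbf{y}(n) \tag{7}$$

for some delay $d$.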

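If the number of training symbols is smaller than the number of
spatial-temporal equalizer coefficients $N_wM$, then $\mathbf{R}$ is rank
deficient with nullity $N_wM - N_t$. Therefore there are many
solutions to (5), which can be expressed as:

$$\mathbf{w} = \mathbf{R}^{+}\mathbf{p} + \sum_{i=1}^{N_wM-N_t} v_i\,\mathbf{u}_i, \tag{8}$$

where $\mathbf{R}^{+}$ is the pseudo-inverse of $\mathbf{R}$, the $\mathbf{u}_i$ form an
orthonormal basis of the null space of $\mathbf{R}$, and the $v_i$ are a set
of coefficients. Equation (8) can be expressed compactly as:

$$\mathbf{w} = \mathbf{R}^{+}\mathbf{p} + \mathbf{U}\mathbf{v}, \tag{9}$$

where $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_{N_wM-N_t}]$ and
$\mathbf{v} = [v_1, v_2, \ldots, v_{N_wM-N_t}]^{T}$.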
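A minimal numerical sketch of (6)-(9) is given below, assuming the training snapshots y(n) are already stacked as the columns of a matrix. The function name `ls_with_null_space`, the relative tolerance used to decide which eigenvalues count as zero, and the use of an eigendecomposition to form the pseudo-inverse are choices made for this illustration only.

```python
import numpy as np

def ls_with_null_space(Y_train, x_train, rel_tol=1e-10):
    """Particular LS solution R^+ p and a null-space basis U of R.

    Y_train : (M*Nw, Nt) array, columns are y(n) over the training span.
    x_train : (Nt,) array of the known symbols x_1(n - d).
    """
    Nt = Y_train.shape[1]
    R = Y_train @ Y_train.conj().T / Nt      # time-averaged autocorrelation, (6)
    p = Y_train @ x_train.conj() / Nt        # time-averaged cross-correlation, (7)
    lam, V = np.linalg.eigh(R)               # R is Hermitian, so eigh applies
    keep = lam > rel_tol * lam.max()         # numerically nonzero eigenvalues
    Vr = V[:, keep]
    w_ls = Vr @ ((Vr.conj().T @ p) / lam[keep])   # pseudo-inverse solution R^+ p
    U = V[:, ~keep]                          # orthonormal basis of null(R)
    return w_ls, U
```

Any w = w_ls + U v built from this output reproduces the training symbols exactly at the equalizer output when N_t < N_wM and the snapshots are linearly independent, which is why the free vector v can be spent on a blind criterion.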

The semi-blind algorithm in [1] regularizes the standard LS solution
with the CM 1-2 cost to provide a better estimate of the equalizer
coefficients when $N_t < N_wM$. The algorithm minimizes the cost

$$J(\mathbf{w}) = \frac{1}{N_t}\sum_{n=N_s}^{N_s+N_t-1} \left|z_1(n) - x_1(n-d)\right|^2 + \rho\,\frac{1}{N}\sum_{n=1}^{N}\big(|z_1(n)| - R_1\big)^2, \tag{10}$$

where $R_1 = E|a_n|^2/E|a_n|$, with $a_n$ being the symbols of the signal
constellation, and $0 < \rho < \infty$ is a regularization constant. We
refer readers to [1] for details of the algorithm.

4.  SEMI-BLIND  EQUALIZATION  BASED 
ON  A  CONVEX  COST  FUNCTION 

4.1.  Background 

Since  cost-dependent  local  minima  exist  in  the  regular¬ 
ized  semi-blind  algorithm,  there  are  two  ways  to  avoid 
convergence  to  such  minima:  1)  devising  a  good  initial¬ 
ization  strategy  of  equalizer  tap  weights  or  2)  choosing 
alternative  cost  functions  that  are  convex.  In  this  pa¬ 
per,  we  are  primarily  interested  in  adopting  a  convex 
cost  function  in  the  problem  of  semi-blind  equalization. 

In [5] (and references therein), a convex cost function based on the
$l_\infty$ norm of the equalizer output was proposed in the context of
blind equalization. The idea comes from the fact that the opening of
the eye of the signal constellation is characterized by the
intersymbol interference (ISI). Suppose the combined channel-equalizer
response is $c * w = h$. The eye is open when the magnitude of
$h(\delta)$ for some delay $\delta$ dominates the rest of the
coefficients, $|h(\delta)| > \sum_{k \neq \delta} |h(k)|$. This is
closely related to the $l_1$ norm of the combined channel-equalizer
response. In practice, however, we can never know the channel response
explicitly. An equivalent but more useful formulation uses the
$l_\infty$ norm of the equalizer output [5, 6, 7]. In [6], the
following cost is proposed:

$$J(\mathbf{w}) = \|\mathrm{Re}(z(n))\|_\infty + \|\mathrm{Im}(z(n))\|_\infty \tag{11}$$

with the constraint

$$\mathrm{Re}(w_{jk}) + \mathrm{Im}(w_{jk}) = 1. \tag{12}$$

Two  remarks  about  (11)  and  (12)  are  in  order: 

1. The cost (11) is appropriate for square-type constellations such
as 4-QAM, 16-QAM, etc.

2. The constraint (12) anchors one of the equalizer taps. This is
needed to avoid the all-zero equalizer, which is a valid but trivial
minimum of this type of convex cost function.

4.2.  Convex  cost  with  training  constraint 

In  this  section,  we  propose  a  linear  constraint  to  be  used 
for  the  convex  cost  (11).  We  call  it  semi-blind  because 
the  linear  constraint  makes  use  of  the  small  amount  of 
known  training  symbols  present  in  the  received  burst 
of data. The idea was essentially discussed in the previous
section. When the number of training symbols is smaller than the
number of spatial-temporal equalizer coefficients, the solution of
the LS problem can be expressed as (restated here):

Rw  =  p  (13) 

w  =  R+p  +  Uv.  (14) 

Equation  (13)  can  be  viewed  as  a  constraint  on  the 
equalizer  and  can  be  adopted  to  replace  the  tap-an¬ 
choring  technique.  Hence  (11)  and  (13)  describe  our 
semi-blind  convex  cost. 

There  are  several  properties  of  this  semi-blind  algo¬ 
rithm: 

1.  The  semi-blind  constraint  (13)  is  linear.  It  can  be 
thought  of  as  a  generalization  of  the  tap-anchoring 
technique. 

2.  Because  of  the  linear  constraint,  convexity  of  the 
cost  (11)  is  still  preserved. 

3.  Convexity  of  the  cost  (11)  is  established  in  a  dou¬ 
bly  infinite  equalizer  (ideal)  setting  and  also  in 
a  finitely  parameterized  equalizer  (practical)  set¬ 
ting  [6].  Therefore,  using  an  FIR  equalizer  main¬ 
tains  convexity  unlike  the  Godard  cost  function. 

4.  As  in  the  case  of  the  blind  convex  cost  function, 
this  kind  of  equalization  technique  leaves  an  un¬ 
known  gain  at  the  equalizer  output  [7] .  Hence  an 
automatic  gain  control  (AGC)  is  needed  to  scale 
the  output.  This  can  be  done  with  the  knowledge 
of  the  known  signal  constellation. 


4.3.  Implementation 

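Since the $l_\infty$ norm cannot be implemented in practice, we
approximate it with the $l_p$ norm for some large $p$:

$$\begin{aligned}
J(\mathbf{w}) &= \|\mathrm{Re}(z(n))\|_\infty + \|\mathrm{Im}(z(n))\|_\infty \\
&= \lim_{p\to\infty} \|\mathrm{Re}(z(n))\|_p + \|\mathrm{Im}(z(n))\|_p \\
&\approx \big(E|\mathrm{Re}(z(n))|^p\big)^{1/p} + \big(E|\mathrm{Im}(z(n))|^p\big)^{1/p} \quad \text{for large } p.
\end{aligned} \tag{15}$$

Convexity is preserved in this approximation [7]. In actual
implementation, we can minimize the cost

$$J(\mathbf{w}) = E|\mathrm{Re}(z(n))|^p + E|\mathrm{Im}(z(n))|^p \tag{16}$$

to simplify computation. Substituting (14) in (16) and taking the
gradient with respect to $\mathbf{v}^{*}$, we obtain

$$\mathbf{G} = \nabla_{\mathbf{v}^{*}} J(\mathbf{v}) = E\Big\{ p\,\mathbf{U}^{H}\mathbf{y}(n)\Big( |\mathrm{Re}(z(n))|^{p-2}\,\mathrm{Re}(z(n)) - j\,|\mathrm{Im}(z(n))|^{p-2}\,\mathrm{Im}(z(n)) \Big) \Big\}. \tag{17}$$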
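The approximation in (15) can be checked numerically on a short vector. The helper `lp_norm` below is a hypothetical name introduced for this sketch; it factors out the peak value before raising to the power p, which is one common way to avoid overflow for large p and is not taken from the paper.

```python
import numpy as np

def lp_norm(x, p):
    """l_p norm of a real vector, computed relative to its peak magnitude."""
    mag = np.abs(np.asarray(x, dtype=float))
    peak = mag.max()
    if peak == 0.0:
        return 0.0
    # (sum |x|^p)^(1/p) = peak * (sum (|x|/peak)^p)^(1/p)
    return peak * np.sum((mag / peak) ** p) ** (1.0 / p)

samples = [0.2, -1.5, 0.7, 1.0]
for p in (2, 4, 12, 48):
    # The value decreases toward max|x| = 1.5 as p grows.
    print(p, lp_norm(samples, p))
```

The norm is always at least as large as the peak magnitude and approaches it from above, so a moderate exponent such as p = 12 already tracks the largest sample closely in this example.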

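The received data is processed in a burst of $N$ symbols. A recursive
method based on gradient descent is used to obtain the
spatial-temporal equalizer coefficients. The algorithm is given by:

$$\mathbf{v}^{(k+1)} = \mathbf{v}^{(k)} - \mu\,\hat{\mathbf{G}}^{(k)}, \tag{18}$$

where $\mathbf{v}^{(k)}$ denotes the vector $\mathbf{v}$ at the $k$-th recursion,
$\mu$ is a small step size and $\hat{\mathbf{G}}^{(k)}$ is an estimate of the
gradient (17) at the $k$-th recursion. This estimate is obtained by
averaging over the burst:

$$\hat{\mathbf{G}}^{(k)} = \frac{1}{N}\sum_{n=1}^{N} \Big\{ p\,\mathbf{U}^{H}\mathbf{y}(n)\Big( \big|\mathrm{Re}\big(z^{(k)}(n)\big)\big|^{p-2}\,\mathrm{Re}\big(z^{(k)}(n)\big) - j\,\big|\mathrm{Im}\big(z^{(k)}(n)\big)\big|^{p-2}\,\mathrm{Im}\big(z^{(k)}(n)\big) \Big) \Big\}. \tag{19}$$

The algorithm is initialized with $\mathbf{v}^{(0)} = \mathbf{0}$. Such
initialization is equivalent to setting the equalizer to
$\mathbf{R}^{+}\mathbf{p}$ (i.e. the particular LS solution in (14)). Then
$\mathbf{w}^{(k)} = \mathbf{R}^{+}\mathbf{p} + \mathbf{U}\mathbf{v}^{(k)}$.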
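The recursion (18)-(19) might be sketched as follows, assuming the whole burst is available as a matrix of stacked snapshots and that the particular solution and null-space basis have been computed beforehand. The function names, argument names, and default values (p = 12, step size 0.001, 500 recursions, taken from the simulation section) are illustrative; this is not the authors' implementation.

```python
import numpy as np

def convex_cost(z, p):
    """Sample version of the cost (16) for equalizer outputs z."""
    return np.mean(np.abs(z.real) ** p + np.abs(z.imag) ** p)

def convex_semiblind(Y, w_ls, U, p=12, mu=1e-3, n_iter=500):
    """Gradient descent on v in w = w_ls + U v, following (18)-(19).

    Y    : (M*Nw, N) array, columns are y(n) over the whole burst.
    w_ls : (M*Nw,) particular LS solution R^+ p.
    U    : (M*Nw, r) orthonormal basis of the null space of R.
    """
    N = Y.shape[1]
    UhY = U.conj().T @ Y                       # U^H y(n) for every n
    v = np.zeros(U.shape[1], dtype=complex)    # v^(0) = 0, i.e. start at w_ls
    for _ in range(n_iter):
        z = (w_ls + U @ v).conj() @ Y          # z(n) = w^H y(n)
        g = (np.abs(z.real) ** (p - 2) * z.real
             - 1j * np.abs(z.imag) ** (p - 2) * z.imag)
        G_hat = p * (UhY @ g) / N              # burst average, (19)
        v = v - mu * G_hat                     # update, (18)
    return w_ls + U @ v
```

Because the update only moves along the columns of U, the training constraint Rw = p stays satisfied at every recursion. In practice the step size may need to be reduced when the initial outputs are large, since the gradient scales with the (p-1)-th power of the output magnitude.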

4.4.  Simulation  Results 

In this section, we provide some simulation results on the
performance of the proposed semi-blind algorithm. Three users'
signals (K = 3) impinge on a receiver with four sensors (M = 4). The
first user is the desired signal and the other two users are
interferers from other co-channel cells. We assume that the SNR of
the desired signal at the receiver is 30 dB. The
signal-to-interference ratio (SIR) is 3 dB in our simulations. The
signals go through their respective channels, which are modeled with
3 taps. This corresponds to a delay spread of around 3 to 4 symbol
periods. At the receiver, each sensor has an equalizer of length 6.
Hence the spatial-temporal equalizer has a total of 24 coefficients.

When implementing the semi-blind algorithm (16), the exponent $p$ has
to be chosen. Figure 1 shows a plot of the MSE achieved using
different values of $p$ for 16-QAM signals. The MSE is lower when a
larger $p$ is used. However, a compromise has to be struck. Using too
large a $p$ might cause numerical problems in the recursion at the
initial stage, when the noise and ISI are severe, while using too
small a $p$ does not approximate (16) well. The pure blind convex
algorithm in [6] uses $p = 12$. We also use this value of $p$ in
subsequent simulations. The step size $\mu$ for the recursive
algorithm is 0.001. The performance measure is the mean square error
(MSE) of the output. We compare the MSE among the convex semi-blind,
regularized semi-blind and pure LS algorithms in the case where
$N_t < MN_w$. The blind algorithm with tap-anchoring constraint (12)
is also implemented using a recursion similar to (18) but in terms of
$\mathbf{w}$. The blind case (which does not take into account the known
symbols present in the burst) fails to converge under this scenario
for both 4-QAM and 16-QAM (Fig. 2 and Fig. 3). An AGC is used at the
output of the convex semi-blind algorithm so that the comparison is
meaningful. The AGC adjusts the gain by


where $a_n$ denotes the symbols of the constellation and
$\overline{|z(n)|^2}$ is the average over the burst. The term
$E|a_n|^2$ can be pre-computed since the constellation is known.
This is, in fact, the variance of the constellation, and in
our simulations we set $E|a_n|^2 = 1$.
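The exact gain expression is not legible in this copy of the paper, so the sketch below rests on an assumption: the gain is taken as the square root of the ratio between the known constellation power and the burst-averaged output power, which is a common form of power-normalizing AGC. The function name `agc` and its arguments are hypothetical.

```python
import numpy as np

def agc(z, constellation_power=1.0):
    """Scale equalizer outputs so their average power matches E|a_n|^2.

    Assumed gain: sqrt(E|a_n|^2 / mean(|z(n)|^2)), not confirmed by the source.
    """
    z = np.asarray(z)
    avg_power = np.mean(np.abs(z) ** 2)        # average of |z(n)|^2 over the burst
    gain = np.sqrt(constellation_power / avg_power)
    return gain * z, gain
```

With the constellation normalized to unit power as in the simulations, the default argument applies and only the burst average has to be computed at run time.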

Figure 2 shows the MSE vs. $N_t$ for the case of 4-QAM signals. The
MSE is that of the desired user. The burst has 150 symbols. The LS
curve indicates the MSE obtained when only the training sequence is
used to compute the equalizer coefficients. It also indicates the MSE
before the semi-blind algorithms are applied, since we initialize the
algorithms with the LS solution. The regularized semi-blind algorithm
is implemented as in [1]. Our convex semi-blind algorithm runs for
500 recursions. The MSE plot is obtained by averaging over 40 runs of
bursts of 150 symbols. The regularized semi-blind algorithm achieves
a smaller MSE in this scenario than the convex semi-blind algorithm.

The next simulation is on 16-QAM signals. In this case the MSE vs.
$N_t$ plot (Fig. 3) is obtained by averaging 40 runs of bursts of 200
symbols. The convex semi-blind algorithm iterates 500 times. We can
see that in this scenario it has a smaller MSE than the regularized
semi-blind algorithm starting from $N_t = 12$. The latter method does
not perform as well as in the case of 4-QAM signals. If we can
tolerate an MSE of no more than, say, 0.05, then the regularized
semi-blind method fails in this case, while the convex semi-blind
method is suitable for $N_t > 16$ in a burst.


5.  CONCLUSIONS 

In this paper, a convex cost with a training constraint is proposed
for semi-blind adjustment of the coefficients of a spatial-temporal
equalizer. Compared to other blind and semi-blind methods in a
short-burst communication scenario, the proposed method performs
better, especially with non-constant modulus signal constellations.
Such constellations are proposed in the 3rd generation wireless
standard when higher data rates are needed.


Figure  1:  Plot  of  MSE  vs.  Nt  for  the  semi-blind  convex 
algorithm  using  different  p  (K  =  3,  16-QAM  signals, 
N  =  200). 
Abstract - Large quadrature amplitude modulation (QAM)
constellations  are  currently  used  in  throughput  efficient  high 
speed  communication  applications  such  as  digital  TV.  For  such 
large  signal  constellations,  carrier  phase  synchronization  is  a 
crucial  problem  because  for  efficiency  reasons  the  carrier  ac¬ 
quisition  must  often  be  performed  blindly,  without  the  use  of 
training  or  pilot  sequences.  The  goal  of  the  present  paper  is 
to  provide  thorough  performance  analysis  of  the  blind  carrier 
phase  estimators  that  have  been  proposed  in  the  literature 
and  to  assess  their  relative  merits. 


I.  Introduction 

Fast  acquisition  of  the  carrier  phase  is  a  crucial  issue  in 
high-speed  communication  systems  that  employ  large  QAM 
modulation  schemes.  One  of  the  challenges  associated  with 
large  QAM  constellations  is  the  blind  carrier  acquisition, 
which  is  often  required  in  large  and  heavily  loaded  multipoint 
networks  for  bandwidth  efficiency  and  little  effort  involved  in 
network  monitoring.  It  is  known  that  for  large  QAM  constel¬ 
lations,  the  conventional  carrier  tracking  schemes  frequently 
fail  to  converge  and  result  in  “spinning”  [8],  [10].  There¬ 
fore,  developing  computationally  simple  blind  carrier  phase 
estimators  with  guaranteed  convergence  and  good  statistical 
properties  is  well-motivated. 

Recently,  a  number  of  blind  carrier  phase  estimators  have 
been  proposed  [1],  [2],  [3],  [4],  [6],  [11,  p.  266-277],  [12],  but 
thorough  performance  analysis  of  all  these  algorithms  has 
not  been  performed.  In  order  to  quantify  the  performance  of 
these  estimators,  the  large  sample  (asymptotic)  performance 
analysis  of  these  phase  estimators  will  be  established  and 
compared  with  the  stochastic  (modified)  Cramer-Rao  bound 
[11, Section 2.4]. It is shown that the seemingly different
estimators [1], [2], [3], [5], [11, p. 266-277], [12] are the same,
while the estimator proposed in [4] has a larger asymptotic
variance than the power-law estimator [3], [6], [12]. It is
also shown that exploiting the additional samples acquired
by oversampling the received continuous-time waveform
does not improve the performance of the power-law estimator
in [3], [6], [12]. Finally, computer simulations are presented
to  corroborate  the  theoretical  developments  and  to  compare 
the  performance  of  the  investigated  phase  estimators. 

II.  Problem  Statement 

We consider the baseband QAM communication system
where the received signal $Y(n) = Y_r(n) + jY_i(n)$ is given by

$$Y(n) = e^{j\theta}X(n) + N(n), \tag{1}$$

where $Y_r(n)$ and $Y_i(n)$ denote the in-phase and quadrature
components of $Y(n)$, $X(n)$ stands for the independent and
identically distributed (i.i.d.) input QAM symbol stream,
$N(n)$ is circularly distributed Gaussian noise, assumed to
be independent of $X(n)$, and $\theta$ denotes the unknown carrier
phase offset. The problem of blind carrier phase estimation
consists of recovering the phase error $\theta$ only from knowledge
of the received data $Y(n)$. Because the input QAM constellation
has quadrant ($\pi/2$) symmetry, it is possible to recover the
unknown phase $\theta$ only modulo a $\pi/2$ phase ambiguity. This
ambiguity can be eliminated through the use of appropriate coding
schemes. Therefore, without any loss of generality, we can assume
that the unknown phase $\theta$ lies in the interval $(-\pi/4, \pi/4)$.
In the next section, we briefly outline the blind phase estimators
[1], [2], [3], [4], [5], [11, p. 266-277], [12], and establish their
exact large sample performance.

III.  Blind  Carrier  Phase  Estimators 

A.  Approximate  Maximum  Likelihood  Estimator:  Fourth- 
Power  Estimator 

The  maximum  likelihood  (ML)  estimator  of  9  can  be  theo¬ 
retically  derived  by  maximizing  a  stochastic  likelihood  func¬ 
tion,  obtained  by  averaging  the  conditional  probability  den¬ 
sity  function  of  the  received  data  with  respect  to  the  unknown 
data  stream  X(n).  However,  for  high  order  QAM  constella¬ 
tions,  the  computational  complexity  involved  in  calculating 
the  likelihood  function  and  more  importantly  the  resulting 
nonlinear  optimization  problem  render  the  ML-estimator  im¬ 
practical  for  most  high-speed  applications.  The  need  for  com¬ 
putationally  simple  estimators  with  guaranteed  convergence 
calls  for  alternative  (possibly  suboptimal,  but  computation¬ 
ally  feasible)  phase  estimators. 

Moeneclaey and de Jonghe have shown in [12] that for
any 2-dimensional rotationally symmetric constellation
(such as square or cross QAM constellations) the
fourth-power (or power-law) estimator can be obtained as
an approximate ML-estimator in the limit of small
Signal-to-Noise Ratio (SNR $:= 10\log\big(E|X(n)|^2/E|N(n)|^2\big)$, where $:=$
stands for "is defined as"). The power-law estimator and its
sampled version are defined as:

$$\theta = \frac{1}{4}\arg\Big[\big(EX^{*4}(n)\big)\,EY^{4}(n)\Big], \tag{2}$$

$$\hat{\theta} = \frac{1}{4}\arg\Big[E\big(X^{*4}(n)\big)\,\frac{1}{N}\sum_{n=1}^{N} Y^{4}(n)\Big], \tag{3}$$
where the superscript $*$ stands for complex conjugation and
the operator $E(\cdot)$ denotes the expectation operator. The
fourth-power estimator does not require any complex nonlinear
optimizations, but it requires a priori knowledge of the
input constellation moment $E(X^{*4}(n))$. However, this is not a
restrictive assumption since for most QAM constellations, $EX^{*4}(n)$
is a negative real-valued number, whose effect can be easily
accounted for. Using standard convergence results [9] it can
be checked that asymptotically (3) is¹ w.p. 1 a consistent
estimator ($\hat{\theta} \to \theta$ as $N \to \infty$) for any SNR range. An
explanation can be obtained by observing that, in the presence of
circularly and normally distributed noise $N(n)$, the following
relation holds:

$$\lim_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N} Y^{4}(n) = EY^{4}(n) = e^{j4\theta}EX^{4}(n), \tag{4}$$

where the second equality in (4) is obtained by expanding
$EY^{4}(n) = E\big(\exp(j\theta)X(n) + N(n)\big)^{4}$, taking into account the
independence between $X(n)$ and $N(n)$, and $EN^{k}(n) = 0$ for
any positive integer $k$. Hence, (3) recovers the carrier phase
from the phase of the fourth-order moment of the received
data.
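The sampled estimator (3) is short enough to try directly. The sketch below is illustrative only: the helper names `square_qam` and `fourth_power_phase`, the unit-power normalization of the constellation, and the simulation parameters in the usage note are assumptions made for this example.

```python
import numpy as np

def square_qam(m_side):
    """Unit-power square QAM constellation with m_side levels per axis."""
    levels = np.arange(-(m_side - 1), m_side, 2, dtype=float)
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))

def fourth_power_phase(y, constellation):
    """Sampled power-law estimate (3): (1/4) arg[ E(X*^4) * mean(Y^4) ]."""
    ex4_conj = np.conj(np.mean(constellation ** 4))   # known from the constellation
    return 0.25 * np.angle(ex4_conj * np.mean(y ** 4))
```

For example, rotating 16-QAM symbols drawn from `square_qam(4)` by 0.2 rad, adding circular Gaussian noise at 20 dB SNR, and passing a few thousand samples to `fourth_power_phase` should return a value close to 0.2, within the interval $(-\pi/4, \pi/4)$ assumed in the problem statement.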

Cartwright has proposed estimating the unknown phase $\theta$
using a different set of fourth-order statistics [3]. Define the
following fourth-order moments and cumulants:

$$\gamma := E[Y_r^4] + E[Y_i^4] - 6E[Y_r^2Y_i^2], \tag{5}$$

$$\gamma_a := \mathrm{cum}(Y_r, Y_r, Y_r, Y_i) = E[Y_r^3Y_i] - 3E[Y_r^2]E[Y_rY_i] = E[Y_r^3Y_i], \tag{6}$$

$$\gamma_b := \mathrm{cum}(Y_r, Y_i, Y_i, Y_i) = E[Y_rY_i^3] - 3E[Y_i^2]E[Y_rY_i] = E[Y_rY_i^3], \quad (E[Y_rY_i] = 0). \tag{7}$$

Cartwright's estimator is defined by:

$$\tan(4\theta) = \frac{4(\gamma_a - \gamma_b)}{\gamma}, \qquad \theta = \frac{1}{4}\,\mathrm{atan}\!\left(\frac{4(\gamma_a - \gamma_b)}{\gamma}\right). \tag{8}$$

To verify that Cartwright's estimator is the fourth-power
estimator in (2), we equate the in-phase and quadrature
components of:

$$\begin{aligned}
EY^4(n) &= e^{j4\theta}EX^4(n) = \cos(4\theta)EX^4(n) + j\sin(4\theta)EX^4(n), \\
EY^4(n) &= E\big(Y_r(n) + jY_i(n)\big)^4 = E\big[Y_r^4(n) + Y_i^4(n) - 6Y_r^2(n)Y_i^2(n)\big] + 4jE\big[Y_r^3(n)Y_i(n) - Y_r(n)Y_i^3(n)\big] \\
&= \gamma + 4j(\gamma_a - \gamma_b).
\end{aligned} \tag{9}$$

It follows that $\gamma = \cos(4\theta)EX^4(n)$ and
$4(\gamma_a - \gamma_b) = \sin(4\theta)EX^4(n)$, which implies the
equivalence between estimators (2) and (8). Cartwright's
(fourth-power) estimator requires only that $EX^4(n) \neq 0$ and the
independence between $X(n)$ and the additive circularly and normally
distributed noise $N(n)$, and it can be applied to both square and
cross-QAM constellations, as opposed to the estimator proposed in
[4], which can be applied only to square-QAM constellations.

It is interesting to remark that three other phase estimators,
derived using completely different arguments, are equivalent to
the fourth-power estimator. An alternative robust phase estimator
with guaranteed convergence has been proposed in [2] for square-QAM
constellations. Herein, the carrier acquisition problem is reduced
to the blind source separation problem of the linear mixture of the
in-phase and quadrature-phase components of the received signal,
and a cumulant-based source separation criterion is proposed to
estimate the unknown phase offset [2]. In [1], [11, pp. 271-277],
a low SNR approximation of the likelihood function, assuming PSK
input constellations, is shown to have the same form as the
estimator [2]. Furthermore, it is justified that this estimator
can be used even for general QAM constellations [11, pp. 271-277].
By relying on Godard's quartic criterion [8], Foschini has shown
an alternative derivation of this phase estimator in [5]. Next, we
describe briefly the estimator proposed in [2], which relies on the
observation that the in-phase and quadrature components of a
square-QAM constellation are independent.

Let $\phi$ denote an estimate of the unknown phase offset $\theta$,
define the "rotated" output $\tilde{Y}(n) := \exp(-j\phi)\,Y(n)$, and
assume that $X(n)$ belongs to a square-QAM constellation.
In the absence of noise and if $\phi = \theta$, the in-phase
and quadrature components of $\tilde{Y}(n) = X(n)$ are independent.
Thus, the joint cumulants of the in-phase ($\tilde{Y}_r(n)$) and
quadrature ($\tilde{Y}_i(n)$) components of $\tilde{Y}(n)$ are equal to zero:

$$\begin{aligned}
\tilde{\gamma}_a &:= \mathrm{cum}\big(\tilde{Y}_r(n), \tilde{Y}_r(n), \tilde{Y}_r(n), \tilde{Y}_i(n)\big) = 0, \\
\tilde{\gamma}_b &:= \mathrm{cum}\big(\tilde{Y}_r(n), \tilde{Y}_i(n), \tilde{Y}_i(n), \tilde{Y}_i(n)\big) = 0,
\end{aligned} \tag{10}$$

and² $\tilde{\gamma}_a - \tilde{\gamma}_b = 0$. It is interesting to remark that (10)
continues to hold true even in the presence of additive circularly
and normally distributed noise $N(n)$, because the cumulants of the
in-phase and quadrature components of $N(n)$ cancel out. By taking
into account (9), it follows that
$\tilde{\gamma}_a - \tilde{\gamma}_b = \big(E\tilde{Y}^4(n) - E\tilde{Y}^{*4}(n)\big)/8j$. Thus, $\theta$ can be
estimated from:

$$\begin{aligned}
\hat{\theta}_a &:= \arg\min_{\phi}\,\big|E\tilde{Y}^4(n) - E\tilde{Y}^{*4}(n)\big| \\
&= \arg\min_{\phi}\,\big|e^{-j4\phi}EY^4(n) - e^{j4\phi}EY^{*4}(n)\big|.
\end{aligned} \tag{11}$$

If we consider the polar representation $EY^4(n) = \lambda_4\exp(j4\theta)$,
from (11) we obtain that
$\hat{\theta}_a = \arg\min_{\phi}\big|\lambda_4\big(\exp(-j4(\phi - \theta)) - \exp(j4(\phi - \theta))\big)\big|$,
which implies that $\hat{\theta}_a = \theta$ modulo a $\pi/4$-phase ambiguity.
Hence, estimator (11) is the same as the fourth-power estimator (2).
By taking advantage of the sign of
$\tilde{\gamma} := \big(E\tilde{Y}^4(n) + E\tilde{Y}^{*4}(n)\big)/2$ (see (5), (9)), the
$\pi/4$-phase ambiguity inherent in (11) can be reduced to a
$\pi/2$-phase ambiguity (since if $\hat{\theta}_a - \theta = \pi/4$ modulo
$\pi/2$, then $\tilde{\gamma} = -EX^4(n) \neq EX^4(n)$).

In practice, many communication systems utilizing QAM
constellations also employ coding, which implies that the
SNR available at the synchronizer will be reduced by an
amount proportional to the coding gain. In order to evaluate
correctly the performance of these phase estimators at
all SNR levels, next we provide an exact expression for the
large sample variance of the power-law estimator, which is
valid for any SNR level and is not restricted to the high
SNR regime, as is the case with the approximate asymptotic
expression presented in [12]. The next section will show that
the expression of [12] is not valid for low and medium SNRs
(< 20 dB).

¹ The notation w.p. 1 denotes convergence with probability one.

² The reader can easily check that $\tilde{\gamma}_a = -\tilde{\gamma}_b$ [4].
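The equivalence between Cartwright's estimator (8) and the fourth-power estimator, which follows from identity (9), can also be verified on sample moments. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the sample versions of (5)-(7) use plain averages, and the single-argument arctangent limits this version to $|\theta| < \pi/8$ (a four-quadrant arctangent combined with the sign of $EX^4(n)$ would be needed for the full interval).

```python
import numpy as np

def fourth_order_stats(y):
    """Sample versions of gamma, gamma_a, gamma_b in (5)-(7)."""
    yr, yi = y.real, y.imag
    gamma = np.mean(yr ** 4) + np.mean(yi ** 4) - 6 * np.mean(yr ** 2 * yi ** 2)
    gamma_a = np.mean(yr ** 3 * yi)    # (6), using E[Yr Yi] = 0
    gamma_b = np.mean(yr * yi ** 3)    # (7), using E[Yr Yi] = 0
    return gamma, gamma_a, gamma_b

def cartwright_phase(y):
    """Estimate (8): (1/4) atan( 4 (gamma_a - gamma_b) / gamma )."""
    gamma, gamma_a, gamma_b = fourth_order_stats(y)
    # arctan returns 4*theta in (-pi/2, pi/2), so this sketch covers |theta| < pi/8.
    return 0.25 * np.arctan(4 * (gamma_a - gamma_b) / gamma)
```

Since $(Y_r + jY_i)^4 = Y_r^4 + Y_i^4 - 6Y_r^2Y_i^2 + 4j(Y_r^3Y_i - Y_rY_i^3)$ holds sample by sample, the sample mean of $Y^4(n)$ equals $\gamma + 4j(\gamma_a - \gamma_b)$ exactly when the same averages are used, so the two estimators return the same number up to rounding.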

Theorem 1. Assuming that the i.i.d. symbol stream $X(n)$
is coming from a finite dimensional QAM-constellation and
that the additive noise $N(n)$ is circularly and normally
distributed and independent of $X(n)$, then the estimate (3) is
asymptotically normally distributed with zero mean and the
asymptotic variance:

$$\lim_{N\to\infty} N E\big(\hat{\theta} - \theta\big)^2 = \frac{\mu_{Y,44} - EX^{8}(n)}{32\big(EX^{4}(n)\big)^2}, \tag{12}$$

with³ $\mu_{Y,40} := EY^{4}(n) = e^{j4\theta}EX^{4}(n)$, and

$$\begin{aligned}
\mu_{Y,44} := {} & E|X(n)|^{8} + 16E|X(n)|^{6}E|N(n)|^{2} + 36E|X(n)|^{4}E|N(n)|^{4} \\
& + 16E|X(n)|^{2}E|N(n)|^{6} + E|N(n)|^{8}.
\end{aligned} \tag{13}$$

Proof. Please see [13]. □
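Expressions (12) and (13) can be evaluated directly for a given constellation and SNR. The sketch below assumes a unit-power constellation, so that the noise variance equals $10^{-\mathrm{SNR}/10}$, and uses the standard moments of circular complex Gaussian noise, $E|N(n)|^{2k} = k!\,\sigma^{2k}$. The function name and argument conventions are hypothetical.

```python
import numpy as np
from math import factorial

def power_law_asymptotic_variance(constellation, snr_db):
    """Evaluate (12)-(13) for a constellation (normalized here to unit power)."""
    x = np.asarray(constellation, dtype=complex)
    x = x / np.sqrt(np.mean(np.abs(x) ** 2))
    sigma2 = 10.0 ** (-snr_db / 10.0)                 # E|N|^2 for unit signal power

    def ex(k):                                        # E|X|^(2k)
        return np.mean(np.abs(x) ** (2 * k))

    def en(k):                                        # E|N|^(2k) = k! sigma^(2k)
        return factorial(k) * sigma2 ** k

    mu44 = (ex(4) + 16 * ex(3) * en(1) + 36 * ex(2) * en(2)
            + 16 * ex(1) * en(3) + en(4))             # (13)
    ex4 = np.mean(x ** 4)
    ex8 = np.mean(x ** 8)
    return float(np.real((mu44 - ex8) / (32 * ex4 ** 2)))   # (12)
```

Dividing the returned value by the number of samples N and taking the square root gives the predicted standard deviation of the estimate in radians. For 4-QAM the variance tends to zero as the noise vanishes, whereas for 16-QAM a residual term remains even without noise, which reflects the self-noise of non-constant modulus constellations.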

The asymptotic variance (12) does not depend on the unknown
phase $\theta$, but only on the input symbol constellation
and the SNR. This confirms the conclusion drawn in [3] stating
that the standard deviation of (8) appears to be constant
with respect to the true value of $\theta$. We evaluate next the
asymptotic performance of a phase estimator based on an
alternative set of statistics that was proposed in [4].

B. HOS-Based Phase Estimator of [4]

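The phase estimator [4] extracts the unknown phase information
$\theta \in (-\pi/4, \pi/4)$ using the relations:

$$\cot(2\theta) = \frac{\gamma_a - \gamma_b}{2\gamma} \quad \text{if } \left|\frac{\gamma}{\gamma_x}\right| > 0.125 \;\; \Big(\theta \in \big(-\tfrac{\pi}{4}, -\tfrac{\pi}{8}\big) \cup \big(\tfrac{\pi}{8}, \tfrac{\pi}{4}\big)\Big), \tag{14}$$

$$\tan(2\theta) = \frac{2(\gamma_a - \gamma_b)}{\gamma_x - 4\gamma} \quad \text{if } \left|\frac{\gamma}{\gamma_x}\right| < 0.125 \;\; \Big(\theta \in \big(-\tfrac{\pi}{8}, \tfrac{\pi}{8}\big)\Big), \tag{15}$$

with $\gamma_x := E[|X|^4] - 2\{E|X|^2\}^2$ and

$$\gamma := \mathrm{cum}\{Y_r(n), Y_r(n), Y_i(n), Y_i(n)\} = E[Y_r^2(n)Y_i^2(n)] - E[Y_r^2(n)]E[Y_i^2(n)] = 0.25\sin^2(2\theta)\,\gamma_x. \tag{16}$$

Let $\hat{\gamma}_a$, $\hat{\gamma}_b$, and $\hat{\gamma}$ denote sample estimates for
$\gamma_a$, $\gamma_b$, and $\gamma$, respectively, and define by $\hat{\theta}_1$
and $\hat{\theta}_2$ the sample estimates corresponding to (14) and (15),
respectively. The next theorem, whose proof is deferred due to space
limitations to [13], establishes the asymptotic performance of
$\hat{\theta}_1$ and $\hat{\theta}_2$.

³ The notation $\mu_{Y,kl} := EY^{k}(n)Y^{*l}(n)$ stands for the $(k + l)$th-moment
of $Y(n)$.

Theorem 2. Assuming that the i.i.d. symbol stream $X(n)$
is coming from a finite dimensional QAM-constellation and
that the additive noise $N(n)$ is circularly and normally
distributed and independent of $X(n)$, then the estimates $\hat{\theta}_1$ and
$\hat{\theta}_2$ are asymptotically normally distributed with zero mean and
asymptotic variances:

$$\lim_{N\to\infty} N E\big(\hat{\theta}_1 - \theta\big)^2 = \frac{q_{11} + 4\cot^2(2\theta)\,q_{22} - 4\cot(2\theta)\,q_{12}}{\gamma_x^2}, \quad \theta \in \big(-\tfrac{\pi}{4}, -\tfrac{\pi}{8}\big) \cup \big(\tfrac{\pi}{8}, \tfrac{\pi}{4}\big), \tag{17}$$

$$\lim_{N\to\infty} N E\big(\hat{\theta}_2 - \theta\big)^2 = \frac{q_{11} + 4\tan^2(2\theta)\,q_{22} + 4\tan(2\theta)\,q_{12}}{\gamma_x^2}, \quad \theta \in \big(-\tfrac{\pi}{8}, \tfrac{\pi}{8}\big), \tag{18}$$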


where: 


Qu  ■=  lim  NE[(%  -  76 )  -  (ja  -  7b)]2  = 

N—+OO  OZ 


+ 


cos(80)[(EX4(n))2  -  £X8(n)]  +  pv,4i 


32 


(19) 


qi2:=  lim  AT£{(7-7)[(7a -76)  -  (7a -75)]} 

N—*oo 

-  sin  (8 9)[EX8(n)  -  2(£X4(n))2]  +  2Im{^y,62} 

64 

4 sin  (40)EX4{n)[pY,22  -  y,u] 

64 

_  8(E|X(n)|2  +  g[lV(n)|a)Im{Aty,Bi} 

64 

.•  „n,.  \2  cos  (89)EX8(n)  +  3/ry44 

Q22  :=  lim  NE{~1  -  7)  =  - — - - — 

N  — >oo  1  Zo 

4Re{^y62}  +  48/iy,n  +  6 [cos  (4 9)EX4(n)  -  My22] 
128 

32/iyn  [cos  (40)EX4(n)  -2E\Y(n)\4\ 


(20) 


128 

16[Re{/ry,5i}  —  py,33]py,ii 
128 


(21) 


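$\mu_{Y,44}$ is given by (13), and

$$\mu_{Y,62} := e^{j4\theta}\big[EX^{6}(n)X^{*2}(n) + 12EX^{5}(n)X^{*}(n)E|N(n)|^{2} + 15EX^{4}(n)E|N(n)|^{4}\big], \tag{22}$$

$$\mu_{Y,51} := e^{j4\theta}\big[EX^{5}(n)X^{*}(n) + 5EX^{4}(n)E|N(n)|^{2}\big], \tag{23}$$

$$\mu_{Y,33} := E|X(n)|^{6} + 9E|X(n)|^{4}E|N(n)|^{2} + 9E|X(n)|^{2}E|N(n)|^{4} + E|N(n)|^{6}, \tag{24}$$

$$\mu_{Y,22} := E|X(n)|^{4} + 4E|X(n)|^{2}E|N(n)|^{2} + E|N(n)|^{4}, \tag{25}$$

$$\mu_{Y,11} := E|X(n)|^{2} + E|N(n)|^{2}. \tag{26}$$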

In contrast to the power-law estimator, the asymptotic
performance of the Chen et al. estimator [4] depends on the
phase offset $\theta$. As the simulation results will show (see
Figure 5), the asymptotic performance of this estimator
deteriorates significantly whenever the a priori intervals (14), (15)
are missed, and for any SNR it exhibits a larger variance than
the power-law estimator.


IV.  Performance  Comparisons 

In this section, computer simulations are performed to
assess the relative merits of the proposed phase estimators
by comparing the theoretical (asymptotic) limits and the
experimental standard deviations of the investigated
estimators. Two additional estimators have been analyzed:
the fractionally-sampled (FS) power-law estimator and the
reduced-constellation power estimator. The FS-power
estimator recovers the unknown phase offset θ by exploiting
all the samples obtained by fractionally sampling
(oversampling) the received continuous-time waveform in the
estimator (3). A raised-cosine pulse shape with roll-off
factor 0.3 and an oversampling factor P = 3 are assumed
throughout the simulations. The reduced-constellation power
estimator also relies on (3), but only the received samples
that are larger in magnitude than a given threshold are
processed [10, p. 1382], [6, p. 1482]. Thus, only the points
closest to the four corners of the constellation are
processed. The asymptotic performance of these two
additional estimators can be established using the result of
Theorem 1, but due to space limitations their expressions
will not be presented.
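
The thresholding idea behind the reduced-constellation estimator can be sketched in a few lines. Estimator (3) is not reproduced in this section, so the form used here, one quarter of the argument of the negated sum of fourth powers, is an assumption (a common fourth-power estimator for square QAM), and the names `power_law_phase` and `threshold` are hypothetical. The example is noise-free and uses 16-QAM rather than 256-QAM to stay short.

```python
import cmath

def power_law_phase(samples, threshold=0.0):
    """Fourth-power phase estimate using only samples with |y| > threshold.

    Assumed form (not taken from the paper): theta = arg(-sum y^4) / 4.
    """
    kept = [y for y in samples if abs(y) > threshold]
    # For square QAM, E[x^4] is real and negative, hence the sign flip.
    fourth_moment = sum(y ** 4 for y in kept)
    return cmath.phase(-fourth_moment) / 4.0

# Noise-free 16-QAM constellation rotated by an illustrative offset of 0.2 rad.
levels = (-3.0, -1.0, 1.0, 3.0)
constellation = [complex(a, b) for a in levels for b in levels]
theta = 0.2
received = [x * cmath.exp(1j * theta) for x in constellation]

full_estimate = power_law_phase(received)                    # all 16 points
corner_estimate = power_law_phase(received, threshold=4.0)   # 4 corner points only
print(full_estimate, corner_estimate)
```

With a threshold of 4.0 only the four corner points (radius about 4.24) survive, which mirrors the statement that only the points closest to the corners are processed.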

In Figures 1-a and 1-b, we have plotted the experimental
and theoretical standard deviations of all these estimators
versus SNR, assuming a square 256-QAM constellation,
θ = 15° (= π/12), N = 512 samples, MC = 300 Monte-Carlo
runs, and additive normally distributed noise. The threshold
in the reduced-constellation power estimator has been set
so that only the received samples corresponding to the 12
points of the input 256-QAM constellation with the largest
radii are processed. The solid line denotes the stochastic
Cramér-Rao bound (CRB = 1/(N · SNR)) corresponding to
the phase estimate. Figure 1 shows that the power-law
estimator performs better than the Chen et al. estimator [4]
at all SNR levels, but worse than the reduced-constellation
power estimator at high SNRs (SNR > 20 dB). The FS-based
power estimator appears to have the worst performance. The
reduced performance of the FS-power estimator is due to the
increased "self-noise" generated by the residual intersymbol
interference effects. For this reason, we have not pursued
further the analysis of FS-based power-law estimators.
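
As a small worked example, the bound quoted above can be evaluated for the simulation setting N = 512. This is only a sketch: it assumes the SNR in CRB = 1/(N · SNR) is a linear power ratio and that the result is a variance in rad², and the function name `crb_std` is hypothetical.

```python
import math

def crb_std(num_samples, snr_db):
    """Square root of CRB = 1 / (N * SNR), with the SNR converted from dB."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return math.sqrt(1.0 / (num_samples * snr_linear))

for snr_db in (10, 20, 30):
    print(snr_db, "dB ->", crb_std(512, snr_db), "rad")
```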

In Figure 2, we have plotted separately the theoretical
and experimental standard deviations of the power-law, the
reduced-constellation power-law, and the Chen et al. (15)
estimators, assuming MC = 300 Monte-Carlo simulation runs,
N = 512 samples, θ = π/12, and a 256-QAM input
constellation. The experimental values are well predicted by
the asymptotic limits for all three estimators, but the CRB
seems to be a loose bound. In Figure 3, the experimental and
theoretical standard deviations of the power-law and the
Chen et al. estimators are plotted versus the number of
samples (N), assuming SNR = 10 dB, MC = 300 Monte-Carlo
runs, and θ = π/12. It turns out that both estimators achieve
the asymptotic bound even when a reduced number of samples
(N = 250 to 500) is used.

In Figure 4-a, the asymptotic performance of the Chen
et al. estimator (14) is analyzed, assuming θ = π/5,
MC = 300, and N = 512. Figures 4-b and 5 show that the
performance of the Chen et al. estimator depends on the
unknown phase θ and has a larger standard deviation than the
power-law estimator for any phase offset θ (Figure 5) and
for any SNR level (Figure 4-b). In Figure 5, the theoretical
standard deviations (17) and (18) are plotted on the
interval (−π/4, π/4) assuming perfect a priori knowledge of
the intervals (14), (15) where θ lies. However, in the
presence of wrong a priori knowledge on θ (|θ| > π/4), the
performance of estimator [4] deteriorates significantly.

In Figures 6 and 7, we have analyzed the performance of
the power-law and the reduced-constellation power-law
estimators in the case of a cross 128-QAM constellation,
assuming θ = π/12, MC = 300, and N = 4000 samples. For such
constellations, the Chen et al. estimator cannot be used
since the in-phase and quadrature components of the input
symbol stream are not independent. In Figures 6 and 7-a, the
experimental and asymptotic standard deviations of the
power-law and the reduced-constellation power-law estimators
are plotted for different SNR levels. Figures 7-a and 7-b
show that the asymptotic limit predicts the experimental
results well for all SNR levels and for numbers of samples
N > 1000. It also appears that for cross-QAM constellations,
the power-law estimator exhibits a very slow convergence
rate, and good estimates of the phase offset can be obtained
only by using a large number of samples (N > 5,000).
Finally, Figure 8 reveals that the approximate asymptotic
limit derived in [12] does not predict well the exact
asymptotic limit of the power-law estimator for small and
medium SNRs (SNR < 20 dB).
Fig. 1. Standard Deviation vs. SNR: a) Experimental Values, b)
Asymptotic Values (256 square-QAM)

Fig. 2. Standard Deviation vs. SNR, Experimental/Theoretical
Values: a) Power Estimator, b) Reduced-Constellation Power
Estimator, c) Chen et al. Estimator (256 square-QAM)

Fig. 3. Standard Deviation vs. No. of Samples: Power Estimator vs.
Chen et al. Estimator (256 square-QAM)

Fig. 4. Standard Deviation vs. SNR: a) Chen et al. Estimator
(θ = π/5), b) Asymptotic Limits (256 square-QAM)

Fig. 5. Standard Deviation vs. Phase offset: Asymptotic Limit (256
square-QAM)

Fig. 6. Standard Deviation vs. SNR: a) Power Estimator, b)
Reduced-Constellation Power Estimator (128 cross-QAM)

Fig. 7. Standard Deviation vs. SNR/Data: a) Reduced-Constellation
Power-Law and Power-Law Estimators, b) Power Estimator (128
cross-QAM)

Fig. 8. Standard Deviation vs. SNR: Exact and Approximate
Asymptotic Limits (256 square-QAM)
ABSTRACT

This paper addresses the problem of time-invariant
(TIV) bilinear system identification. The input-output
relation of a TIV bilinear system is expressed as a
time-varying recursive equation. Such a formulation allows us
to estimate the unknown bilinear system parameters
using a modified least-squares (MLS) algorithm. The
MLS method provides unbiased estimates of the
unknown bilinear parameters. Several simulations
illustrate the MLS estimator performance.

1.  INTRODUCTION 

Linear models have found a variety of applications in
many areas such as speech processing, image processing
and communications. These models include parametric
Autoregressive (AR), Moving Average (MA) or
Autoregressive Moving Average (ARMA) models. The
use of these parametric models can be motivated by
the following property: for any real-valued stationary
process y(n) with continuous spectral density S(f), it
is possible to find an ARMA process whose spectral
density is arbitrarily close to S(f) ([2], p. 130).
However, these models fail to identify many systems which
are inherently nonlinear.

The bilinear model has been used successfully to
approximate a large class of nonlinear systems [5], [7]. Its
ability to represent many nonlinearities efficiently and with
a relatively small number of parameters is owing to its
feedback structure [5]. Other properties motivating the
use of bilinear systems are also discussed in [4]. The
problem of estimating bilinear system parameters using
measurements of the system input and output signals
has received much attention in the literature [3], [6].
Recursive estimation algorithms including the recursive
least squares algorithm (RLS) and the extended least
squares algorithm (ELS) have been studied in [3]. The
main advantage of the RLS algorithm is its simplicity,
because of the linearity in the parameters. However,
the algorithm provides biased estimates. Simulations
presented in [3] have shown that the ELS algorithm
outperforms the RLS algorithm in terms of bias.
However, no theoretical study was provided because of the
non-linear estimation problem and the difficult
computation required. Hence various methods have been
devised to obtain unbiased estimators from linear
estimation problems. Some of these methods are based on
modifying the least squares estimator by subtracting
the bias from the estimates [8]. This paper studies the
modified least squares (MLS) algorithm for the
identification of bilinear systems. The MLS algorithm yields
unbiased parameter estimates at a lower computational
cost than the ELS algorithm.

The paper is organized as follows. Section 2 presents
the problem. Sections 3 and 4 study the least-squares
estimators and the recursive MLS algorithm for the
bilinear system identification problem. Simulation results
are reported in Section 5, an application to satellite
channel identification in Section 6, and conclusions in
Section 7.

2.  PROBLEM  FORMULATION 

The output x(t) of a bilinear system driven by the input
sequence u(t) can be defined by the following recursive
equation:

$$x(t) = \sum_{i=1}^{p} a_i\,x(t-i) + \sum_{i=1}^{p} b_i\,u(t-i) + \sum_{i=1}^{p}\sum_{j=1}^{p} c_{ij}\,u(t-j)\,x(t-i) \qquad (1)$$

where $a_i$, $b_i$, $c_{ij}$ are the unknown bilinear system
parameters and t = 1, ..., N. A noisy version of x(t), denoted

$$y(t) = x(t) + e(t) \qquad (2)$$

is observed (see fig. 1). In eq. (2), e(t) is a stationary
white Gaussian noise with zero mean and variance

$$E[e(t)\,e(s)] = \sigma^2\,\delta_{t,s}$$

where $\delta_{t,s}$ is the Kronecker symbol. Eqs. (1) and (2)
show that the observed process y(t) satisfies the following
time-varying (TV) model:

$$y(t) = a_0(t) + \sum_{i=1}^{p} a_i(t)\,y(t-i) + \epsilon(t) \qquad (3)$$

where the TV parameters are

$$a_0(t) = \sum_{i=1}^{p} b_i\,u(t-i), \qquad a_i(t) = a_i + \sum_{j=1}^{p} c_{ij}\,u(t-j), \quad i = 1, \dots, p \qquad (4)$$

In eq. (3), $\epsilon(t)$ is a colored noise sequence defined by:

$$\epsilon(t) = e(t) - \sum_{i=1}^{p} a_i(t)\,e(t-i), \quad t = 1, \dots, N$$
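
The step from (1)-(2) to the time-varying model (3) can be checked numerically. The sketch below simulates the bilinear recursion directly, then verifies sample by sample that the observation equals the time-varying form with its colored noise term. The coefficients are those of the second-order example used in the simulation section; the zero initial conditions, the unit noise standard deviation, the seed and all function names are assumptions made for illustration.

```python
import random

def lag(seq, t):
    """Value seq[t], with zero initial conditions for t < 0."""
    return seq[t] if t >= 0 else 0.0

def simulate(a, b, c, u, noise_std, rng):
    """Simulate eq. (1) directly and return (x, e, y) with y = x + e."""
    p, x, e = len(a), [], []
    for t in range(len(u)):
        xt = 0.0
        for i in range(1, p + 1):
            xt += a[i - 1] * lag(x, t - i) + b[i - 1] * lag(u, t - i)
            for j in range(1, p + 1):
                xt += c[i - 1][j - 1] * lag(u, t - j) * lag(x, t - i)
        x.append(xt)
        e.append(rng.gauss(0.0, noise_std))
    return x, e, [xt + et for xt, et in zip(x, e)]

def max_tv_model_error(a, b, c, u, e, y):
    """Largest gap between y(t) and the time-varying form of eq. (3)."""
    p, worst = len(a), 0.0
    for t in range(len(u)):
        a0 = sum(b[i - 1] * lag(u, t - i) for i in range(1, p + 1))
        ai = [a[i - 1] + sum(c[i - 1][j - 1] * lag(u, t - j)
                             for j in range(1, p + 1))
              for i in range(1, p + 1)]
        # Colored noise: e(t) minus the time-varying filtering of past noise.
        eps = e[t] - sum(ai[i - 1] * lag(e, t - i) for i in range(1, p + 1))
        rhs = a0 + sum(ai[i - 1] * lag(y, t - i) for i in range(1, p + 1)) + eps
        worst = max(worst, abs(y[t] - rhs))
    return worst

rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(300)]
a, b = [1.5, -0.7], [1.0, 0.5]
c = [[0.12, 0.0], [0.0, 0.0]]
x, e, y = simulate(a, b, c, u, 1.0, rng)
tv_error = max_tv_model_error(a, b, c, u, e, y)
print(tv_error)
```

The gap should be at the level of floating-point rounding, since the two forms are algebraically identical.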

Model (3) is similar to the TV ARMA model studied in
[1] for the identification of non-stationary signals
embedded in noise. Indeed, $a_i(t)$ can be viewed as a linear
combination of functions $f_j(t)$ as follows:

$$a_i(t) = \sum_{j=0}^{p} a_{ij}\,f_j(t), \quad i = 0, \dots, p \qquad (5)$$

with

$$a_{00} = 0, \quad a_{0j} = b_j, \quad j = 1, \dots, p$$
$$a_{i0} = a_i, \quad a_{ij} = c_{ij}, \quad i, j = 1, \dots, p \qquad (6)$$
$$f_0(t) = 1, \quad f_j(t) = u(t-j), \quad j = 1, \dots, p$$

Eq. (5) is similar to the decomposition of the time-varying
AR parameters onto a set of basis time functions
studied in [1]. This paper proposes to estimate
the unknown bilinear system parameters from the
input and output samples u(t) and y(t) for t = 1, ..., N
using the modified least squares (MLS) algorithm [1],
[8].

3.  LEAST-SQUARES  ESTIMATORS 

Denote by $\theta^T = (b^T, \theta_1^T)$ the bilinear system parameter
vector, with $b^T = (b_1, \dots, b_p)$ and

$$\theta_1^T = (a_1, c_{1,1}, c_{1,2}, \dots, c_{1,p}, a_2, c_{2,1}, \dots, a_p, \dots, c_{p,p}) \qquad (7)$$

Eq. (3) can be written in matrix form as follows:

$$y(t) = y_{t-1}^T\,\theta + \epsilon(t), \quad t = 1, \dots, N \qquad (8)$$

where

$$y_{t-1}^T = (u(t-1), u(t-2), \dots, u(t-p),\; y(t-1), y(t-1)u(t-1), \dots, y(t-1)u(t-p),\; \dots,\; y(t-p), y(t-p)u(t-1), \dots, y(t-p)u(t-p))$$
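
The ordering of the regressor entries can be made concrete with a short helper. This is a minimal sketch: the function name `build_regressor`, the zero-based time index and the zero initial conditions are assumptions, while the ordering follows the parameter vector of eq. (7), with the p input lags first and then, for each output lag, the lag itself followed by its p products with the input lags.

```python
def build_regressor(u, y, t, p):
    """Regressor of eq. (8) at time t, of length p^2 + 2p."""
    def lag(seq, k):
        return seq[k] if k >= 0 else 0.0   # zero initial conditions

    row = [lag(u, t - i) for i in range(1, p + 1)]
    for i in range(1, p + 1):
        y_lag = lag(y, t - i)
        row.append(y_lag)                                   # pairs with a_i
        row.extend(y_lag * lag(u, t - j) for j in range(1, p + 1))  # c_{i,j}
    return row

u = [1.0, 2.0, 3.0, 4.0]
y = [0.5, -1.0, 2.0, 0.0]
row = build_regressor(u, y, t=3, p=2)
print(row)   # [3.0, 2.0, 2.0, 6.0, 4.0, -1.0, -3.0, -2.0]
```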


3.1.  The  Conventional  LS  Algorithm 

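The conventional least squares (LS) estimator of θ,
denoted $\hat\theta_N$, is defined by

$$\hat\theta_N = \arg\min_{\theta} J_1(\theta) \qquad (9)$$

where $J_1(\theta)$ is proportional to $\sum_{t=1}^{N} \epsilon^2(t)$. Since
$\epsilon(t)$ is linear in θ, $J_1(\theta)$ is quadratic in θ and an
analytical solution can be derived:

$$\hat\theta_N = \Big(\sum_{t=1}^{N} y_{t-1}\,y_{t-1}^T\Big)^{-1} \sum_{t=1}^{N} y_{t-1}\,y(t)$$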

The white noise sequence e(t) being zero-mean and
uncorrelated with x(t), $\lim_{N\to\infty} \hat\theta_N$ can be expressed as
a function of the true parameter vector as follows:


(10) 


where  0P|P  is  the  p  x  p  zero  matrix, 

Pn  =  feyt"iy£-i  j 

and 

Ut  =  (1,  u(t  -  1), . . . ,  u(t  -  p))T  (1,  u(t  -  1), . . . ,  u(t  -  p)) . 

Eq. (10) shows that the LS estimator of θ is generally
asymptotically biased.

3.2.  The  Extended  LS  Algorithm 

The  Extended  Least  Squares  (ELS)  algorithm  has 
shown  interesting  properties  for  pseudo-linear  regres¬ 
sion  models  such  as  (8)  [3].  This  algorithm  can  be 
summarized  as  follows: 

Qn  =  1  +  y^PNyN, 

Pn+ i  =  Pn  ~  RNyNQjvVw-fW, 

&N+1  =  9n  +  RiVyjvQjv1  (yw+l  —  yV^iv)) 

y(N  + 1)  =  yJfON+i, 

Vn+i  =  («(*0, . . .  ,y(N), . . .  ,y(N+l-p)u{N+l-p)). 

It is well known that the ELS algorithm provides
unbiased estimates. However, it suffers from stability
problems [3]. The next section studies another unbiased
estimator, known as the Modified Least Squares (MLS) estimator.
3.3.  The  Modified  LS  Estimator

The  MLS  estimator  also  denoted  bias-compensated 
least  squares  estimator  is  defined  as  follows  [8]: 

6n  =  On  +  cr2 PnVnOi  n-i  (11) 


The MLS estimator defined by eq. (11) is clearly
asymptotically unbiased. However, this estimator
requires computing the sum of N matrices of size
(p² + 2p) × (p² + p). In order to avoid such computation,
we assume in the following that the input sequence u(t)
is a sequence of mutually independent and identically
distributed (i.i.d.) random variables with zero mean and
variance $\sigma_u^2 = 1$. In this case, the following result can
be obtained:

lim  —VN  = 

N—> co  N 


=  V  (13) 


The following bias-compensated LS estimator can
then be defined:

$$\tilde\theta_N = \hat\theta_N + \sigma^2\,N\,P_N\,V\,\tilde\theta_{1,N-1} \qquad (14)$$

Eq. (14) explicitly depends on the noise variance σ².
The next section studies a recursive algorithm for the joint
estimation of σ² and θ as in [8].
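
The principle of bias compensation is easiest to see on a scalar problem. The sketch below is not the bilinear algorithm of this paper: it is a deliberately simplified AR(1) analogue, with a known noise variance, in which plain LS on noisy observations underestimates the coefficient and removing the noise contribution from the normal equation restores it. The model, the parameter values and the function name `ar1_estimates` are assumptions chosen for illustration.

```python
import random

def ar1_estimates(a_true, noise_var, n, seed=0):
    """LS and bias-compensated LS estimates of a in x(t) = a x(t-1) + w(t),
    observed through y(t) = x(t) + e(t) with known noise variance."""
    rng = random.Random(seed)
    x_prev = y_prev = 0.0
    s_cross = 0.0   # running sum of y(t) * y(t-1)
    s_power = 0.0   # running sum of y(t-1)^2
    for _ in range(n):
        x = a_true * x_prev + rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, noise_var ** 0.5)
        s_cross += y * y_prev
        s_power += y_prev * y_prev
        x_prev, y_prev = x, y
    ls = s_cross / s_power
    # Remove the noise power (about n * noise_var) from the regressor energy.
    compensated = s_cross / (s_power - n * noise_var)
    return ls, compensated

ls, compensated = ar1_estimates(a_true=0.8, noise_var=1.0, n=100000)
print(ls, compensated)
```

In this scalar case the compensated estimate is also the fixed point of "LS estimate plus noise variance times N times P times the compensated estimate", with P the inverse of the regressor energy, which is the same structure as the correction term above.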

4.  NOISE  VARIANCE  ESTIMATION  FOR 
THE  MLS  ALGORITHM 

Denote  £t(7V)  the  residual  at  time  N  and  Rn  the  sum 
of  residual  squares: 


UN) 


y(t)  -  yf-^N 


Rn  =  Y;&(N)  (16) 

t= 1 

Eq.  (8)  shows  that  the  residual  £t(N)  can  be  written 


UN)  =yf-i  V-ON)  +  e(t)-'Z_101 


where 


It  is  well  known  that  On  satisifes  the  normal  equations 
(obtained  by  differentiating  J\  (6)  with  respect  to  0) 
[8]: 

N 

5>-i6(j>0  =  o 

t= i 

Consequently 


A  )UN) 


hence 


^r-Rjv  =  a2  +  o2E  K]  V9i 


By  replacing  the  expectation  E  and  E  0N  by 

their  instantaneous  values,  an  estimator  of  the  noise 
variance  can  be  defined: 

c.2  _  1  Rn  ,10, 

aN  ~  AT  ~  (18) 

1  +  0  N  V0\tN—l 

The MLS algorithm for the joint estimation of the noise
variance σ² and the bilinear system parameter vector
θ is then based on the following recursive equations:


ef-i  =  (e(t-l),e(t-l)u(t-l),...,e(t-l)u(t-p), 
e(t  -  p),  e(t  -  p)u(t  -  1 ),...,  e(f  -  p)u(t  -  p)) 


$Q_N = 1 + y_N^T\,P_N\,y_N$,  (19a)

$P_{N+1} = P_N - P_N\,y_N\,Q_N^{-1}\,y_N^T\,P_N$,  (19b)

$\hat\theta_{N+1} = \hat\theta_N + P_N\,y_N\,Q_N^{-1}\,(y_{N+1} - y_N^T\,\hat\theta_N)$,  (19c)

Rn+i  —  Rn  +  £n+i(N  +  l)^^1,  (19d) 

-2  _  1  Rn+i  \ 

',+'  ‘  ’ 

0N+1  =  0N+1  +  (N  +  l))Z2N+1PN+1V9h^m) 

Note that eqs. (19a), (19b) and (19c) are the classical
RLS equations [3]. Eqs. (19d), (19e) and (19f)
ensure that the bilinear system parameter estimates are
asymptotically unbiased. It is interesting to note that
the MLS algorithm does not require any matrix inversion.
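
The classical RLS part of the recursion, which indeed needs only a scalar division per step, might be sketched as follows. This covers only the three RLS equations, not the noise-variance and bias-compensation steps; the time indexing is the standard textbook one, and the names `rls_update`, `phi` and `target`, the regressors and the initial value are assumptions for illustration.

```python
import math

def rls_update(theta, P, phi, target):
    """One recursive least-squares step; returns the updated (theta, P)."""
    n = len(theta)
    P_phi = [sum(P[r][k] * phi[k] for k in range(n)) for r in range(n)]
    q = 1.0 + sum(phi[k] * P_phi[k] for k in range(n))      # scalar, as in (19a)
    err = target - sum(phi[k] * theta[k] for k in range(n))  # prediction error
    new_theta = [theta[r] + P_phi[r] * err / q for r in range(n)]
    # P is symmetric, so phi^T P equals (P phi)^T in the rank-one downdate.
    new_P = [[P[r][col] - P_phi[r] * P_phi[col] / q for col in range(n)]
             for r in range(n)]
    return new_theta, new_P

theta_true = [0.7, -0.3]
delta = 1e-4
theta = [0.0, 0.0]
P = [[1.0 / delta, 0.0], [0.0, 1.0 / delta]]   # P initialized to (1/delta) I
for t in range(1, 51):
    phi = [math.sin(0.7 * t), math.cos(1.3 * t)]
    target = phi[0] * theta_true[0] + phi[1] * theta_true[1]  # noise-free data
    theta, P = rls_update(theta, P, phi, target)
print(theta)
```

On noise-free data the estimate should approach the true vector up to a small offset caused by the finite initial P.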


5.  SIMULATION  RESULTS 

Many simulations have been performed to illustrate the
previous theoretical results. For this experiment,
consider the following second-order bilinear system [3]:

$$x(t) = 1.5\,x(t-1) - 0.7\,x(t-2) + u(t-1) + 0.5\,u(t-2) + 0.12\,x(t-1)\,u(t-1)$$

The observed driving sequence u(t) is white Gaussian
with variance 1. The bilinear signal x(t) is contaminated
by white Gaussian noise with signal-to-noise ratios
(SNRs) ranging from 5 to 40 dB. The algorithm is
initialized with $\hat\theta_0 = 0$ and $P_0 = (1/\delta)\,I$ where $\delta \ll 1$.
Fig. 2 shows the convergence of the noise variance
estimate to its true value (SNR = 5 dB or, equivalently,
σ² = 7.28) from 10 Monte-Carlo simulations.
The mean square errors (MSEs) of the bilinear system
estimates using the RLS, ELS and MLS algorithms,
computed from 10 Monte-Carlo simulations, are depicted in
fig. 3 as a function of the SNR for N = 4000. The
MLS estimator clearly outperforms the usual RLS
estimator in terms of MSE. Fig. 3 also shows that the
MLS estimator outperforms the ELS estimator for low
SNRs. Tables 2 and 3 show the bias of the RLS, ELS and
MLS estimates for two values of SNR. As expected, the
MLS estimator outperforms the usual RLS estimator in
terms of bias. The MLS and ELS algorithms perform
very similarly in terms of bias.

6.  APPLICATION  :  NON  LINEAR 
SATELLITE  CHANNEL  IDENTIFICATION 

Several non linear techniques have been proposed for
modeling non linear channels with memory. These
techniques include Volterra series, wavelet networks
and neural networks [11]. The use of Volterra series to
model satellite channels was motivated in [9] and [10].
These Volterra models suffer from a number of
parameters that increases exponentially with the memory
and nonlinearity order. It is well known that the
bilinear model can be decomposed in a Volterra series with
a reduced number of parameters [4]. Consequently, this
paper proposes 1) to model the non linear satellite
channel using the bilinear model and 2) to identify this non
linear model using the LS procedures described in the
previous sections. A simplified satellite channel consists
of two earth stations connected by a satellite repeater,
as depicted in fig. 4 (see [11] for more details, including
channel characteristics). As an example, Fig. 5 shows
the normalized prediction error between the outputs
of the noisy simplified satellite channel and the
corresponding bilinear system computed using the MLS
algorithm.

7.  CONCLUSION 

The new contribution of this paper is to derive a
modified least squares algorithm, from the theory of
linear time-varying models, for the identification of
time-invariant bilinear models. A recursive version of the
modified least squares algorithm is derived as well. The
algorithm provides estimates of the noise variance and
bilinear model parameters. Bilinear MLS parameter
estimates are shown to be asymptotically unbiased. The
MLS estimator performance is compared to that of the
RLS and ELS estimators. The MLS estimator is finally
applied to the identification of non linear satellite
channels.
ABSTRACT

This article presents a method to blindly identify linear
quadratic channels (LQC). The method is designed for
the single-input/single-output (SISO) case with white
inputs with specific distributions (such as those usually
found in digital communications). Using High-Order
Statistics (HOS) of the input, the method is able to
match the third-order moments with the LQC model,
yielding an original simple relation. Several
simulations are performed and show a fair accuracy given
sufficiently long observation records.

1.  INTRODUCTION 

Nonlinear systems provide a better approximation to
real-life channels, and many examples of nonlinearities
can be found in nonlinear control systems [5],
hydrodynamics [4], satellite communication systems [1], or
underwater acoustics, among others. Blind methods are
attractive when the input is unknown, and to avoid the
reduction of the information rate caused by the
insertion of training sequences.

Blind identification of Volterra systems has already
been widely studied in the past. For instance, in [7],
the authors derive the cumulant-matching equations,
which allow blind identification of a pure real quadratic
system with i.i.d. inputs of unknown distribution. Next,
in [2], P. Bondon goes much further and derives
identifiability conditions when two input sequences are
observed, one Gaussian and one non-Gaussian.

In this paper, we focus our attention on
linear-quadratic systems with specific discrete inputs,
encountered in n-PSK and QAM digital modulations.
So this contribution differs from the previous ones in
two respects: the system is not purely quadratic, and
the inputs are required to be discrete and of known
distribution. The scope is thus less general.

2.  MODEL  FORMULATION 

The  problem  is  modeled  here  by  the  parameterization 
of  the  channel  and  by  the  statistics  of  the  inputs. 

2.1.  Volterra  kernel  model 

The model is described by the noisy output of a
nonlinear moving-average Volterra system (which can
be of any order). Sampling at a rate $T_s$ and
restricting to the linear-quadratic case, the channel can be
modeled as:

$$y(n) = \sum_{l=0}^{L_1} h_1(l)\,x(n-l) + \sum_{i,j=0}^{L_2} h_2(i,j)\,x(n-i)\,x(n-j) + v(n) \qquad (1)$$

where x(n) is the input signal, v(n) denotes the
additive noise, and $h_n$ is called the nth-order Volterra
non-linear operator (here, we only have the linear and
the quadratic terms: $h_1(i_1)$ and $h_2(i_1, i_2)$). Without loss
of generality, we consider that $h_n$ is symmetric in its
arguments [6, pp. 80-81].
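
As an illustration of the model, the channel output can be computed directly from the two kernels. The sketch below assumes zero initial conditions and no conjugation in the quadratic term; the function name `lq_channel_output` is hypothetical, and the kernel values are the ones quoted later in the simulation section.

```python
def lq_channel_output(x, h1, h2, noise=None):
    """Linear-quadratic channel output: linear part + quadratic part + noise."""
    def past(n):
        return x[n] if n >= 0 else 0.0   # zero initial conditions

    out = []
    for n in range(len(x)):
        linear = sum(h1[l] * past(n - l) for l in range(len(h1)))
        quadratic = sum(h2[i][j] * past(n - i) * past(n - j)
                        for i in range(len(h2)) for j in range(len(h2)))
        v = noise[n] if noise is not None else 0.0
        out.append(linear + quadratic + v)
    return out

h1 = [1.0, 0.5, -0.8, 1.6, 0.4]      # linear kernel
h2 = [[1.0, 0.6], [0.6, -0.3]]       # symmetric quadratic kernel
y = lq_channel_output([1.0, 1j, -1.0], h1, h2)
print(y)
```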

2.2.  Usual  communication  inputs 

For the sake of convenience, denote:

$$\varepsilon_{ab} = E\,[x^a\,x^{*b}].$$

In this article, we consider inputs commonly used in
digital communications, sharing the high-order
properties:

$$\varepsilon_{21} = \varepsilon_{31} = \varepsilon_{32} = \varepsilon_{41} = \varepsilon_{42} = 0 \qquad (2)$$

Among these inputs, two groups have been identified
(see [9] and [3]):

•  Distributions that are symmetric about both axes
in the complex plane: $p(z) = f(\Re\{z\}) \cdot g(\Im\{z\})$.
Corresponding random variables can be rewritten
as $z = s + j s'$, where s and s' are real,
independent, and symmetrically distributed, and
$j^2 = -1$. QAM constellations, in digital
communications, belong to this class.

•  Discrete  distributions  that  are  invariant  by  a  ro¬ 
tation  of  an  angle  of  the  form  ^ ,  (K  G  N). 
QPSK,  double  QPSK,  and  any  n-PSK  are  in¬ 
cluded  in  this  class  as  soon  that  n  >  4. 
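
Properties (2) can be verified numerically for equiprobable constellations. The sketch below evaluates the moments for QPSK and for 16-QAM; the helper name `moment` and the particular constellation scalings are assumptions made for illustration.

```python
import cmath

def moment(points, a, b):
    """Moment E[x^a x*^b] for equiprobable constellation points."""
    return sum(z ** a * z.conjugate() ** b for z in points) / len(points)

qpsk = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(4)]
levels = (-3.0, -1.0, 1.0, 3.0)
qam16 = [complex(re, im) for re in levels for im in levels]

orders = [(2, 1), (3, 1), (3, 2), (4, 1), (4, 2)]
vanishing = {name: max(abs(moment(pts, a, b)) for a, b in orders)
             for name, pts in (("QPSK", qpsk), ("16-QAM", qam16))}
print(vanishing)   # both maxima are zero up to rounding
```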


we  get  the  matrix  formulation: 

C12Y  =  At  B  A  (6) 

where A is the $(L_2 + 1) \times (L_1 + L_2 + 1)$ Upper
Triangular Band (UTB) Toeplitz matrix containing
$h_1 = [h_1(0) \dots h_1(L_1)]$ in the first row and zeros elsewhere:


3.  CHANNEL  IDENTIFICATION 

First,  the  basis  of  the  identification  process  is  pre¬ 
sented,  then  the  algorithms  are  derived,  and  a  proof 
of  uniqueness  is  eventually  given. 

3.1.  Moment-matching  relations 

Consider the following assumptions:

(AS1)  The channel is linear-quadratic of finite known
length.

(AS2)  The input is stationary, independent and identically
distributed (i.i.d.), and must comply with the
properties (2); $\sigma_x^2 = \varepsilon_{11}$ and $\mu_{4x} = \varepsilon_{22}$ are also
assumed to be known.

(AS3)  The noise is signal-independent white Gaussian.

Let us now define the complex bicorrelation as:

$$C_{12y}(l, k) \stackrel{\mathrm{def}}{=} E\,\{y^*(n)\,y(n+l)\,y(n+k)\} \qquad (3)$$

Under  assumptions  (AS1-AS3),  the  bicorrelation 
of  the  output  (3)  and  the  channel  model  (1)  should 
match,  which  gives  the  following  relations: 

Ci2y{l,  k)  =  H  Hi  +  0*1  U  +  *)*2 (*’,  j)  (4) 

with  (/,  k)  G  [—L2.i1]  x  [— L2,Li],  and  where' 

=f  [2e?1+i(i-J0(£23-2cf1)]  h*2(i,j).  The 
Z-transform  of  C\2(l,  k)  gives  in  the  [Z\ ,  Z2)  domain: 


while  B  is  symmetric  complex  and  contains  the  values 
of  the  kernel  h \ : 


B  d= 


k  (£3,  £3) 
k  (£2,0) 


*5(0,  £2) ' 

*5(o,o)  . 


We propose to identify the channel coefficients by
using either relation (5), or (6) with the estimate

$$\hat{C}_{12y}(l, k) \stackrel{\mathrm{def}}{=} \frac{1}{N} \sum_{n=1}^{N} y^*(n)\,y(n+l)\,y(n+k).$$

One can notice that $\mathbf{C}_{12Y}$ is an $(L_1 + L_2 + 1)$ square
matrix of rank $(L_2 + 1)$. This observation allows the
lengths of the channels $(h_1, h_2)$ to be detected from an
estimate $\hat{\mathbf{C}}_{12Y}$ of $\mathbf{C}_{12Y}$.
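
The sample bicorrelation and its stacking into a square matrix can be sketched as below. The handling of the record edges (terms whose indices fall outside the record are dropped while the normalization stays at 1/N) is an assumption, as are the function names; the lag range from −L2 to L1 follows the stacking described for the matrix.

```python
def bicorrelation(y, l, k):
    """Sample estimate (1/N) * sum_n conj(y(n)) y(n+l) y(n+k)."""
    n_samples = len(y)
    total = 0.0
    for n in range(n_samples):
        # Skip terms that would reach outside the observed record.
        if 0 <= n + l < n_samples and 0 <= n + k < n_samples:
            total += y[n].conjugate() * y[n + l] * y[n + k]
    return total / n_samples

def bicorrelation_matrix(y, L1, L2):
    """Stack the estimates for lags l, k in [-L2, L1] into a square matrix."""
    lags = range(-L2, L1 + 1)
    return [[bicorrelation(y, l, k) for k in lags] for l in lags]

samples = [1 + 1j, 2 + 0j, 1j]
c00 = bicorrelation(samples, 0, 0)
c10 = bicorrelation(samples, 1, 0)
c01 = bicorrelation(samples, 0, 1)
matrix = bicorrelation_matrix(samples, L1=4, L2=1)
print(c00, c10, len(matrix))
```

The estimate is symmetric in (l, k) by construction, so the stacked matrix is complex symmetric, as the factorization used in the algorithms requires.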


3.2.  Proposed  algorithms 

We  propose  several  algorithms:  (i)  a  Root-Finding 
method  (RF),  (ii)  a  Sub-Space  Intersection  method 
(SSI),  (iii)  a  method  that  forces  the  row  span  to  have 
certain  triangular  properties  (UTB),  and  (iv)  an  itera¬ 
tive  Multidimensional  Search  method  (MS). 

(i)  One  can  give  several  values  to  Z2  in  (5),  and 
get  several  functions  of  Z\\  (Z 1).  These  functions 

F»a  (Zi)  share  the  roots  of  Hi(Zi):  ri,  which  are  de¬ 
tected  by  clustering.  The  channel  h\ (n)/hi(0)  is  the  in¬ 
verse  Z-transform  of  nf=i  ~  r»).  and  one  can  build 
A.  Denoting  A-  the  Moore-Penrose  pseudo-inverse  of 
A,  h2  is  recovered  via  the  “deconvolution”: 

B  =  At“  •  C12Y  •  A- .  (7) 


Sny{ZuZi)  =  HX  (ZX)  Hx  (Z2)  H*2  (±,  (5) 

Equations  (4)  or  (5)  form  the  core  of  the  algorithms 
subsequently  proposed.  By  stacking  the  elements  of 
Ci2y{l,k)  in  a  matrix  C12Y  as: 

Cl2y  (— L2,  — L2)  Cl2y  (  — L2,  Ll) 

C12Y  d= 

Ul2y  (Ll,  —  L2)  •••  C12y  {Lx,  Lx) 


(ii)  Alternatively,  one  can  factorize  the  matrix 
C12Y  in  order  to  recover  the  vector  hi  in  a  similar 
fashion  as  in  [8].  In  the  noiseless  case,  given  that  B 
has  no  null  eigenvalue,  the  matrix  model  (6)  implies 
clearly  that: 

row(A)  =  row(C12Y) 
col(AT)  =  col(C12Y)  W 

Considering  the  singular  value  decomposition  (SVD)  of 
the  symmetric  complex  matrix  C12Y  =  VT  .  S  .  V,  we 
define  V  as  the  L2  +  1  first  rows  of  V,  associated  with
the  1,2  +  1  dominant  singular  values.  Let  be  the 
L2  + 1  x  Li  + 1  submatrix  extracted  from  V  that  gathers 
the  columns  i  to  L\  +  i.  Then  the  conditions  (8)  are 
restated  as:  hi  G  VW,  Vi  G  [1, .. ,  L2+ 1].  Thus  can 
be  obtained  by  computing  the  dominant  right  singular 
vector  of  the  matrix  V  containing  all  V(’)  stacked  one 
above  the  other: 

'  y(i)  ' 

V  =  : 

Then  the  matrix  B  can  be  estimated  afterwards  by  the 
“deconvolution”  procedure  (7). 

(iii)  Another  technique  consists  of  forcing  the  UTB 
structure  of  A  beforehand  by  combining  the  rows  of 
matrix  V;  this  is  possible  because  of  Lemma  1.  Then, 
one  extracts  the  L2  +  1  dimensional  row  vectors  v^i 
contained  in  the  UTB  matrix  TV,  and  stacks  them  in 
a  matrix  V.  The  rest  of  the  procedure  is  identical  to 
the  previous  approach  (ii) . 

(iv)  Lastly,  one  can  perform  an  iterative  search  in 
the  (Li  +  L2(L2  +  l)/2)  dimensional  space  of  the  matrix 
product  of  (6)  in  order  to  find  the  parameters  0  (hi ,  h2) 
that  minimize  the  error  in  the  sense  of  the  Frobenius 
norm: 

e(hi,h2)  =  argmin  ||C12Y  -  [At  ■  B  ■  A]  (0)||^ 

3.3.  Uniqueness 

Lemma  1  Let  N  and  P  be  two  positive  integers.  Un¬ 
der  certain  regularity  conditions,  any  N  x  (N  +  P) 
rectangular  matrix  M  can  be  put  in  UTB  form  by  pre¬ 
multiplication  by  a  square  invertible  matrix  T.  The 
matrix  T  is  unique  up  to  an  invertible  diagonal  multi¬ 
plicative  matrix. 

Proof: The constructive algorithm is very similar to
Gaussian elimination. Assume there are two matrices
$T_1$ and $T_2$ such that $M = T_1 U_1$ and $M = T_2 U_2$,
where $U_1$ and $U_2$ are UTB. Then, considering the
N first columns of both sides shows that the matrix
$T_1 T_2^{-1}$ relates two Lower Triangular (LT) matrices,
and is thus LT itself. Similarly, considering the N last
columns shows that $T_1 T_2^{-1}$ is Upper Triangular (UT).
Thus, it is diagonal, which eventually shows that $T_1$
and $T_2$ are related by a diagonal multiplicative
matrix. □

Lemma 2 Any symmetric complex matrix C can be
factorized as $C = L L^T$, where L is lower triangular.
Matrix L is unique up to post-multiplication by a
diagonal matrix Λ formed of signs {±1}.


Proposition 3 If B is square full rank, and A is
UTB, then the decomposition of a complex symmetric
matrix $C = A^T B A$ is unique up to a multiplicative
diagonal matrix.

Proof: The proposition is a direct consequence of
Lemmas 1 and 2. It is easily seen that if (A, B) is a solution,
then so is $(\Lambda A, \Lambda^{-1} B \Lambda^{-1})$, where Λ is any regular
diagonal matrix. □

Corollary 4 Let B be full rank symmetric complex
and A Toeplitz UTB. When the decomposition of a
symmetric matrix as $\mathbf{C}_{12Y} = A^T B A$ exists, then it is
unique up to a scalar multiplicative factor.

Proof: From Proposition 3, if A is a solution, then so is
ΛA, with Λ diagonal. But because A is Toeplitz, ΛA
can be Toeplitz only if Λ is proportional to the identity
matrix. □

4.  SIMULATIONS 

In order to illustrate the Root-Finding (RF) method
step by step, we first present a typical example with
only the RF and MS methods. Later we show a more
exhaustive study with all the methods. Because we are
mainly interested in direct methods, the MS is given
only as a reference.

In all simulations, the input x was 4-PSK. We used
the real channel given by [9] ($h_1 = [1, 0.5, -0.8, 1.6, 0.4]$
and $h_2 = [1, 0.6;\ 0.6, -0.3]$).

Typical example: The input is QPSK; the number
of samples is 16284 points; and the SNR is 10 dB.
Figure I.a illustrates the clustering method. It shows
all the roots calculated for different $Z_2$, the true roots,
and the ones estimated by the method. The estimated
roots (stars) are fairly accurate and match the real ones
(squares). Figure I.b shows the spectra of the real and
estimated linear channels. Both estimated spectra are
fairly accurate.

Computer comparisons: A first study showed
that the estimation noise of $C_{12Y}$ rapidly becomes
predominant over the additive noise contribution. As
expected, the Gaussian noise does not contribute to the
third-order moment as soon as the integration length is
long enough. So we mainly tried to estimate the influence
of the number of samples. For each number of samples
we took 1000 independent realizations, with an SNR
of 10 dB. For each realization, we estimated $C_{12Y}$, to
which we applied all the algorithms. Since in our case
$C_{12Y}$ is 6 x 6, the most computationally intensive step
for the direct methods is its estimation. Due to its
iterative nature, up to several thousand samples, the
most intensive step for the MS method is the
multidimensional search.

Figure II presents the influence of the integration
length on the mean and variance of both estimates.
Figures II.a and II.b show that all methods converge
to the true channel; the bias behaves well from
4096 points. The RF is the slowest method to converge
to the expected value, while the MS is the fastest.
The SSI and the UTB follow similar patterns.

Figures II.c and II.d present the variances of both
estimates. The variances follow approximately a linear
slope. It is difficult to decide which method behaves
the best. One can notice that the MS has stationary
performance after 64000 samples. This is because the
method was implemented in too crude a way, and the
MS algorithm occasionally gets stuck in local minima,
thus degrading the standard deviation. While not
visible on the figure, the best method varies for each
element of $h_1$, and generally around 4096 samples the
best method changes. Nevertheless, above 4096 samples
the best method is clearly the SSI.

The variance illustrates well the usual problem with
high-order statistics: in order to have a consistent
high-order moment estimate, the integration length must
be long enough; a minimum of 8192 samples seems to be
required here.


5.  CONCLUDING  REMARKS 

Several methods have been proposed to blindly identify
a linear-quadratic channel for communication
applications. The idea is to use the specificities of the
distribution of the inputs. The methods have been shown
to converge with good accuracy, given a rather large
number of samples.

Figure I: Example of identification: 10 dB, 16284
points.

Figure II: Means and standard deviations for all methods with 1000 independent realizations. Panels: evolution
of the mean of H1, the variance of H1, the mean of H2, and the variance of H2 as a function of N. Simple line: RF,
UTB, d-: SSI, o-: MS.

ABSTRACT

Interference from other users and interference due to
multipath propagation limit the capacity of wireless
communication networks. As the number of users and the
demand for new services in the networks increase,
co-channel interference will be a limiting factor.

This paper proposes an iterative structured multi-channel
receiver algorithm that jointly estimates the
communication channels and desired data while canceling
interference. A general way of adding training redundancy
to a data frame is also introduced.

From simulations the proposed method is shown to
achieve low bit error rates even in the presence of strong
interference. These simulations also show that by
distributing the training information in a data burst
elaborately, further improvements in performance are
achievable.

1.  INTRODUCTION 

During the last decades, a rapid development in mobile
communications has occurred. The seemingly ever
increasing number of users and services has caused an
equally increasing demand for capacity and reliability.
Because of the physical limitations of radio communications
and the limited bandwidth available, these demands are
difficult to meet.

One of the factors that limits capacity is the interference
from other users, Co-Channel Interference or CCI. The
problem is further complicated by the fact that in realistic
wireless communication systems there will always be some
amount of multi-path propagation causing Inter-Symbol
Interference or ISI. Thus, by developing receivers that can
handle these kinds of interference, the capacity and
reliability in the wireless network can be increased. One way
of combating interference is through the use of antenna
arrays, thus creating a multi-channel system. The receiver
systems considered in this paper are all multi-channel.

This paper considers an iterative algorithm that rejects
interference while at the same time estimating the
transmitted data and the baseband transmission channels.
The proposed receiver is semi-blind, i.e., it uses training
information available for the desired user.

Several other approaches have been taken to reject
interference. Iterative Least Squares with Projection (ILSP)
is introduced in [1, 2]. ILSP is a blind method to separate
several co-channel signals using the Finite Alphabet (FA)
property of digital communication signals. However, it does
not handle ISI, nor does it handle training information in
a natural fashion. The method presented herein is similar
to ILSP but also takes ISI and training information into
account. In [3] an interference rejection algorithm is
presented that, by using ILSP, oversampling and an extra
processing step, is also able to handle ISI. Another method
similar to ILSP is proposed in [4]; this method also handles
training sequences and ISI. However, it does not handle the
structure imposed by the ISI. Another class of interference
rejection algorithms is subspace methods. These use
algebraic subspace properties to reject interference based on
second order statistics. An example of such a method, used
for comparison in this paper, can be found in [5].

2.  DATA  MODEL 

An L-element antenna array with symbol-spaced baseband
sampling is considered. For simplicity only one desired user
and one interfering user are considered (even though the
data model and proposed receiver algorithm can easily be
extended to multiple users and interferers). The interferer is
assumed to be using the same modulation scheme as, and be
burst synchronized with, the desired user. Within a burst
the user and the interferer each send one data frame
consisting of N symbols, of which $N_d$ symbols are unknown
data and the rest are used for training purposes. The radio
channels between the transmitters and the receiving antennas
are assumed to be time invariant within one data frame. It
is also assumed that the transmission process between the
transmitter and the receiver, including the effects of the
transmitter and receiver filters, can be modeled as an FIR
filter of length M. It is then possible to model the received
data as

X  =  HS  +  GD  +  V.  (1) 

where X (which is L x (N + M - 1)) contains the data
received by the antenna array. The channel matrices H and
G (both L x M) describe the transmission process for
the desired user and the interferer, respectively. The
transmitted data is contained in S and D (both M x (N + M - 1)),
while V models additive noise. The received data matrix
X is organized as

X = [x(1) x(2) ... x(N + M - 1)]

where x(n) is a column vector containing the data output
from the array at the nth sampling instant. To exemplify
the organization of the data matrices, the data matrix
of the desired user is

\[ S = \begin{bmatrix} s^T & 0 & \cdots & 0 \\ 0 & s^T & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & s^T \end{bmatrix} \tag{2} \]

where s is a vector containing the data symbols transmitted
in one frame. From (2) the structure of the data matrices
becomes obvious. In order to achieve good performance
a receiver algorithm must preserve this structure.

0-7803-5988-7/00/$10.00 © 2000 IEEE

3. PROBLEM FORMULATION

The problem of estimating the unknown data vectors and
channel matrices is considered. It is assumed that training
information is available for the desired user while it is
unknown for the interferer. The transmission of the data is
disturbed by spatially and temporally white additive
complex Gaussian noise.

The goal is to find the maximum likelihood estimates
of H, S, G and D. That is, the H, S, G and D that
minimize

\[ \| X - HS - GD \|_F^2 \tag{3} \]

taking the finite alphabet property of the signals into
account. Note that given the data symbols, the criterion
is quadratic in the channel matrices. After rewriting this
norm as

\[ \| X - HS - GD \|_F^2 = \left\| X - [H \;\; G] \begin{bmatrix} S \\ D \end{bmatrix} \right\|_F^2 \tag{4} \]

it can be minimized with respect to [H G],

\[ [\hat{H} \;\; \hat{G}] = X \begin{bmatrix} S \\ D \end{bmatrix}^{\dagger} \tag{5} \]

where $A^{\dagger}$ denotes the pseudo inverse of $A$. After having
resubstituted H and G into (4), a minimization criterion
depending only on S and D is obtained,

\[ \min_{S, D} \left\| X P^{\perp}_{A} \right\|_F^2, \qquad A = \begin{bmatrix} S \\ D \end{bmatrix} \tag{6} \]

where $P^{\perp}_{A} = I - A^{*} (A A^{*})^{-1} A$ and I is the identity
matrix. It is now possible to find the global minimum by
enumerating over all possible S and D using their
FA-property and known training information while maintaining
the structure of the matrices. The enumeration, however, is
of exponential complexity, which makes it infeasible even
for modest data frame sizes. The following sections consider
a suboptimal method that attempts to minimize (3) with
less computational complexity.

4. PROPOSED ALGORITHM - OUTLINE

The algorithm proposed in this paper takes an iterative
approach to minimize (3) while maintaining the structure of
the data matrices (see (2)). Known training information is
also taken into account. The iterative procedure of the
proposed algorithm is similar to the ILSP algorithm proposed
in [1, 2].

Assuming that initial estimates of the data sequences
are available, the method can be outlined as

1. Assume that the estimated data sequences are correct.
The norm (3) is now quadratic in H and G
and it is easy to estimate the channel matrices.

2. Rewrite the norm (3) so that it can be minimized in
a way that maintains the structure of S and D and
takes available training information into account.

3. Now, assume that the estimated channel matrices are
correct. The norm (3) becomes quadratic in S and
D if we relax the FA-property. Thus, it is possible
to estimate the unknown data symbols by solving a
linear set of equations.

4. Project the data on its finite alphabet.

5. Repeat the steps above until convergence.

If the initial data estimates are good enough, the method
will in general converge to the desired global minimum of (3)
and the initial data estimates are improved.

The proposed method also makes it possible to generalize
how the training information is added to the data
sequence. This is considered in the following section. A
more detailed description of the algorithm can be found in
section 6.

5. GENERALIZED TRAINING USING CODE
MATRICES

When a training sequence is added to a data frame it is
usually simply inserted either at the beginning or in the
middle of the data frame. Here a more general way of adding
the training data is introduced by the affine mapping

\[ s = C_1 s_d + C_0 \tag{7} \]

where s (N x 1) contains the data to be transmitted (data
and training information) and $s_d$ ($N_d$ x 1) contains the data
without training information. $C_1$ (N x $N_d$) and $C_0$ (N x 1)
are Code Matrices that add training information (and
possibly error correcting redundancy) to the data.

It is obvious that the code matrices can be chosen so
that training information is added to the data sequence in
the conventional way described above. However, this also
provides the opportunity of adding training information
more elaborately. For example the training information can
be distributed over the entire data sequence.
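
The affine mapping of this section, s = C1 s_d + C0, can be sketched for the two training layouts mentioned in the text. This is a minimal illustration under assumptions: the helper `code_matrices`, the evenly spaced training positions, and the random antipodal symbols are all hypothetical choices, and the frame sizes (57 symbols, 42 of them data) are borrowed from the simulation section.

```python
import numpy as np

def code_matrices(n_total, train_pos, train_sym):
    """Build C1 (n_total x n_data) and C0 (n_total,) so that s = C1 @ s_d + C0."""
    train_pos = np.asarray(train_pos)
    data_pos = np.setdiff1d(np.arange(n_total), train_pos)
    C1 = np.zeros((n_total, data_pos.size))
    C1[data_pos, np.arange(data_pos.size)] = 1.0   # copy data symbols in order
    C0 = np.zeros(n_total)
    C0[train_pos] = train_sym                      # known training symbols
    return C1, C0

N, Nt = 57, 15                                     # frame length, training length
rng = np.random.default_rng(1)
train = rng.choice([-1.0, 1.0], size=Nt)
s_d = rng.choice([-1.0, 1.0], size=N - Nt)

# Conventional layout: all training symbols at the start of the frame.
C1_conv, C0_conv = code_matrices(N, np.arange(Nt), train)
# Distributed layout: training symbols spread evenly over the frame.
spread = np.round(np.linspace(0, N - 1, Nt)).astype(int)
C1_dist, C0_dist = code_matrices(N, spread, train)

s_conv = C1_conv @ s_d + C0_conv
s_dist = C1_dist @ s_d + C0_dist
```

Both layouts carry the same number of training and data symbols; only the positions differ.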

6.  PROPOSED  ALGORITHM  -  DETAILS 

The steps of the proposed algorithm outlined in section 4
are presented in more detail in this section. It is assumed
that an initial estimate of the unknown user data and the
interferer data is present. Further it is assumed that the
code matrices $C_0$ and $C_1$ are known for the desired user
while they are not available for the interferer.


6.1.  Estimating  the  Channel  Matrices 

If we assume the estimated data sequences to be correct, a
least squares estimate of the channel matrices can be found
as (see (5))

\[ [\hat{H} \;\; \hat{G}] = X \begin{bmatrix} \hat{S} \\ \hat{D} \end{bmatrix}^{\dagger}. \]
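
The least squares channel estimate can be sketched as follows. This is a minimal, noise-free illustration assuming the structured data matrix consists of M shifted copies of the symbol vector; the helper `data_matrix` and all numerical values are hypothetical.

```python
import numpy as np

def data_matrix(s, M):
    """Stack M shifted copies of s into the M x (N + M - 1) structured data matrix."""
    N = s.size
    S = np.zeros((M, N + M - 1), dtype=complex)
    for m in range(M):
        S[m, m:m + N] = s
    return S

rng = np.random.default_rng(2)
L_ant, M, N = 4, 2, 57
H = rng.standard_normal((L_ant, M)) + 1j * rng.standard_normal((L_ant, M))
G = rng.standard_normal((L_ant, M)) + 1j * rng.standard_normal((L_ant, M))
S = data_matrix(rng.choice([-1.0, 1.0], size=N), M)
D = data_matrix(rng.choice([-1.0, 1.0], size=N), M)
X = H @ S + G @ D                     # received data with the noise term set to zero

# Least squares estimate: [H G] = X pinv([S; D]).
HG = X @ np.linalg.pinv(np.vstack([S, D]))
H_hat, G_hat = HG[:, :M], HG[:, M:]
```

With correct data and no noise the channels are recovered exactly; in the iterative algorithm the data matrices would be the current estimates instead.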


6.2. Maintaining the Structure of the Data Matrices

To maintain the structure of the data matrices while
estimating them, the norm (3) must be rewritten. This can
be achieved using properties of the vec operator and the
Kronecker product. Letting vec denote the vec operator,
$\otimes$ denote the Kronecker product and $I$ denote the identity
matrix, this rewriting can be done in a few steps as follows,

\[ \operatorname{vec}\{X - HS - GD\} = \operatorname{vec} X - (I \otimes H) \operatorname{vec} S - (I \otimes G) \operatorname{vec} D. \tag{8} \]


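To simplify notation, let $\Phi_H = I \otimes H$, $\Phi_G = I \otimes G$, and
$x = \operatorname{vec} X$. Also, the (NM x N) selection matrix $\Psi$ is
defined. The matrix $\Psi$ consists of zeros and ones and takes
a data vector to a vectorized data matrix, i.e. $\operatorname{vec} S = \Psi s$
and $\operatorname{vec} D = \Psi d$. Now, (8) can be rewritten as

\[ \begin{aligned} \operatorname{vec}\{X - HS - GD\} &= x - \Phi_H \Psi s - \Phi_G \Psi d \\ &= x - \Phi_H \Psi C_1 s_d - \Phi_H \Psi C_0 - \Phi_G \Psi d \\ &= x - \Phi_H \Psi C_0 - \begin{bmatrix} \Phi_H \Psi C_1 & \Phi_G \Psi \end{bmatrix} \begin{bmatrix} s_d \\ d \end{bmatrix} \end{aligned} \tag{9} \]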


where  the  middle  step  follows  from  (7).  By  using  (9)  the 
norm  (3)  can  now  be  minimized  with  respect  to  the  data 
while  maintaining  the  structure  of  the  data  matrices  S,  D. 
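
The vec and Kronecker rewriting can be verified numerically. The sketch below is illustrative only: `selection_matrix` and `vec` are hypothetical helpers, column-major vectorization is assumed, and the selection matrix (called `Psi` in the code) is sized M(N + M - 1) x N so that it matches an M x (N + M - 1) data matrix, which is an assumption of this sketch.

```python
import numpy as np

def selection_matrix(N, M):
    """0/1 matrix Psi with vec(S) = Psi @ s for the M x (N + M - 1) data matrix."""
    K = N + M - 1
    Psi = np.zeros((M * K, N))
    for n in range(K):                # column index of S
        for m in range(M):            # row index of S
            if 0 <= n - m < N:
                Psi[n * M + m, n - m] = 1.0
    return Psi

def vec(A):
    """Stack the columns of A into one vector."""
    return A.reshape(-1, order="F")

rng = np.random.default_rng(3)
N, M, L_ant = 6, 2, 3
s = rng.choice([-1.0, 1.0], size=N)
S = np.zeros((M, N + M - 1))
for m in range(M):
    S[m, m:m + N] = s                 # shifted copies of s
H = rng.standard_normal((L_ant, M)) + 1j * rng.standard_normal((L_ant, M))

Psi = selection_matrix(N, M)
Phi_H = np.kron(np.eye(N + M - 1), H)  # I kron H
lhs = vec(H @ S)
rhs = Phi_H @ Psi @ s
```

Because `rhs` is linear in the symbol vector `s`, the structure of the data matrix is built into the least squares problem rather than imposed afterwards.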


6.3.  Estimating  the  Received  Data 

By using (9) and assuming the estimated channels to be
correct we now obtain continuous estimates of the unknown
data vectors $s_d$ and $d$. This can be done in much the same
way as the estimation of the channel matrices, which results
in

\[ \begin{bmatrix} \hat{s}_d \\ \hat{d} \end{bmatrix} = \begin{bmatrix} \Phi_H \Psi C_1 & \Phi_G \Psi \end{bmatrix}^{\dagger} (x - \Phi_H \Psi C_0). \tag{10} \]

The unknown data can be estimated by projecting the
continuous data estimates onto the finite alphabet in use.

Finally the three steps above are iterated until
convergence is reached. If the initial estimates are good enough
they are in general improved.
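
The data estimation step followed by the projection can be sketched as below. This is a hedged illustration, not the authors' code: it assumes antipodal (+1/-1) symbols, column-major vectorization, known channels, no noise, and training at the start of the frame; the names `selection_matrix` and `estimate_data` are hypothetical.

```python
import numpy as np

def selection_matrix(N, M):
    """0/1 matrix Psi with vec(S) = Psi @ s for the M x (N + M - 1) data matrix."""
    K = N + M - 1
    Psi = np.zeros((M * K, N))
    for n in range(K):
        for m in range(M):
            if 0 <= n - m < N:
                Psi[n * M + m, n - m] = 1.0
    return Psi

def estimate_data(X, H, G, C1, C0):
    """Relaxed least squares data estimate, then projection onto {-1, +1}."""
    M, K = H.shape[1], X.shape[1]
    Psi = selection_matrix(K - M + 1, M)
    Phi_H = np.kron(np.eye(K), H)
    Phi_G = np.kron(np.eye(K), G)
    x = X.reshape(-1, order="F")                      # vec X
    A = np.hstack([Phi_H @ Psi @ C1, Phi_G @ Psi])
    z, *_ = np.linalg.lstsq(A, x - Phi_H @ Psi @ C0, rcond=None)
    z = np.where(z.real >= 0, 1.0, -1.0)              # finite alphabet projection
    Nd = C1.shape[1]
    return z[:Nd], z[Nd:]

rng = np.random.default_rng(4)
L_ant, M, N, Nt = 4, 2, 57, 15
Nd = N - Nt
H = rng.standard_normal((L_ant, M)) + 1j * rng.standard_normal((L_ant, M))
G = rng.standard_normal((L_ant, M)) + 1j * rng.standard_normal((L_ant, M))
train = rng.choice([-1.0, 1.0], size=Nt)
s_d = rng.choice([-1.0, 1.0], size=Nd)
d = rng.choice([-1.0, 1.0], size=N)

C1 = np.vstack([np.zeros((Nt, Nd)), np.eye(Nd)])      # training first, then data
C0 = np.concatenate([train, np.zeros(Nd)])
s = C1 @ s_d + C0

K = N + M - 1
Psi = selection_matrix(N, M)
S = (Psi @ s).reshape(M, K, order="F")
D = (Psi @ d).reshape(M, K, order="F")
X = H @ S + G @ D                                     # noise-free received data

sd_hat, d_hat = estimate_data(X, H, G, C1, C0)
```

In the full algorithm this step would alternate with the channel estimate, using the latest channel estimates in place of the true `H` and `G`.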


7.  PRELIMINARY  RESULTS 

To give some insight into the kind of performance that the
proposed algorithm might offer, simulations have been
conducted and the results from these are presented in this
section. In order to offer some comparison with previous work,
the structured subspace receiver described in [5] was
simulated under the same conditions and results from these
simulations are provided.

Two different sets of code matrices were used (see
section 5): one conventional, with all the training symbols at
the beginning of the sequence, and one with the training
symbols spread over the entire sequence. In the simulations
of the structured subspace receiver the entire training
sequence was located at the beginning of the data frame.

In  all  cases  an  L  =  4  antenna  system  was  considered. 
An  antipodal  binary  modulation  scheme  was  employed  (this 
would  for  example  correspond  to  BPSK). 

To model the transmission process (the transmitter/receiver
filters and the radio channel) a two tap FIR
channel model was used. The channels were assumed
independent from antenna to antenna, and to simulate Rayleigh
fading the channel taps were independently drawn from a
complex Gaussian distribution.

In the simulations it was assumed that the length of the
channel impulse responses, M, and the number of
transmitters, U, are known or have been correctly estimated.

To offer some idea about what the achievable performance
would be, a simple initialization scheme was employed.
Interferer data was initialized with its continuous
solution (of the minimization of the norm (4), ignoring the
structure of the data matrices, see e.g. [2]) projected onto the
finite alphabet in use. The desired user data was initialized
with random data symbols. Received sequences where
the resulting norm (3) was smaller than the true norm (the
norm (3) achieved using the true data and channel matrices)
plus one standard deviation of the norm were kept, while
received sequences not fulfilling this criterion were identified
as outliers.

In figure 1 the bit error rate performance of the proposed
method as a function of the Signal to Noise Ratio (SNR)
is shown. The desired user is disturbed by a single
interferer. The Signal to Interference Ratio (SIR) in these
simulations was -10 dB. The results from the simulated
proposed method are compared with the structured subspace
method with estimated channels and with known channels.
Also, the two different sets of training matrices (described
above) are compared. The data frames consist of 57 symbols
of which 42 are data symbols and the rest are used
for training purposes. Under these conditions the proposed
method performs on par with the structured subspace
method using perfect channel estimates. The structured
subspace method by itself needs longer training sequences
in order to perform well (see figure 4). The distributed
training information offers slightly better performance than
the conventional training sequence. Even though the
difference in performance is small, this is interesting as both
these data distributions use the same number of training and
data bits. Only how they are distributed differs.

Figure 1: Performance with a single -10 dB interferer
present (horizontal axis: signal to noise ratio in dB).

To explore the loss in performance due to the interference,
the proposed method was simulated with and without
an interferer. Other than that the simulated conditions
were identical to the previous simulation. The results
from these simulations are shown in figure 2. As can
be seen from the graph, at an SNR of 4 dB the loss is
approximately 1.5 dB, both with the conventional training
sequence and with the distributed training information.
Again slightly lower bit error rates were achieved when the
distributed training information was used compared to the
more conventional training data distribution.

The  number  of  data  frames  not  converging  to  a  norm 
small  enough,  the  rejection  rate,  was  also  measured  under 
the  same  conditions  as  in  the  previous  simulations.  Figure  3 
shows  the  results  from  these  measurements.  As  can  be  seen 
from  the  graph,  when  there  is  CCI  present  the  rejection 
rate  becomes  quite  high  and  it  would  be  desirable  to  use  a 
better  initialization  method. 

The effect of the length of the training sequence was
also given some attention. Once again the proposed
algorithm with the two different training distributions and
the structured subspace method (found in [5]) were
compared. Figure 4 shows the bit error rate of the desired data
sequence as a function of the number of training symbols
and figure 5 shows the rejection rate as a function of the
number of training symbols. These simulations were
performed at an SNR of 4 dB, with and without a single -10
dB co-channel interferer. The number of data symbols in
each frame remained 42. From figure 4 it can also be seen
that the proposed method is less sensitive to short training
sequences than the method used for comparison. Figure 5
shows that the number of rejected sequences increases fast
when the number of training symbols drops below 15. It
seems likely that the convergence criterion might affect
simulated bit error rates when the number of training symbols
becomes smaller than that.

As can be seen from the results above, the proposed
method shows promising performance. However, there
are still several issues that require further investigation. For
example, in its current implementation the proposed
receiver algorithm is computationally expensive. Also, the
robustness to model errors and initialization are other issues
that deserve more attention. More general forms of training
information, where the data is confined to more general
affine mappings, can easily be considered with the proposed
method.

Figure 2: Performance lost due to interference (horizontal
axis: signal to noise ratio in dB).

Figure 3: Rejection rates as functions of the SNR (horizontal
axis: signal to noise ratio in dB).

Figure 4: Error rates at an SNR of 4 dB (horizontal axis:
number of training symbols).

Figure 5: Rejection rates at an SNR of 4 dB.

8. CONCLUSIONS

Herein, we have presented an interference cancellation
method that can be applied to multi-channel data. Training
information from the desired user is exploited and the
communication channels are jointly estimated together with
the unknown data symbols of both the desired user and the
interference. This method can easily treat general forms of
training information, and a simple example with distributed
training information was shown to give improved performance
compared to a block of training data.

[5] G. Klang and B. Ottersten, “Channel estimation and
interference rejection for multichannel systems,” in
Proceedings of the 32nd Asilomar Conference on Signals,
Systems and Computers, (Pacific Grove, CA, USA), Nov.
1998.

ABSTRACT

In this paper we propose a new approach that clusters
mobile users before downlink beamforming and broadens
beams and nulls within the beamforming calculation.
We first investigate the broadening beamforming
scheme to alleviate inaccuracies in DOA estimation.
Next we examine how to group the mobile users, with
a constraint on the separation angle, to enhance
downlink beamforming. Simulations show that the downlink
beamforming complexity is decreased dramatically
with limited performance loss.

1.  INTRODUCTION 

Owing to the rapidly growing demand for mobile
communication, the current capacity of mobile communication
systems faces a severe challenge during peak usage.
To remedy the capacity limitation, space-division
multiple-access (SDMA), which increases system capacity
and decreases co-channel interference, has been
investigated.

A basic idea of SDMA is to spatially separate the
mobile users, which allows reuse of limited radio
resources, such as frequency, time, or code slot within
a cell. SDMA relies on the application of an adaptive
array antenna at the base station to form multiple
beam patterns, which serve multiple user traffic
channels. Therefore the capacity of the system can be
increased.

Prior research shows that implementing SDMA on
the downlink increases the channel capacity [1], [2],
[3]. One simple SDMA approach uses the DOA estimated
from uplink data and forms the spatial signature
for downlink transmission. However, in urban environments,
angular spreads (AS) could be up to 15° [4],
which means the estimated downlink beamforming pattern
may degrade system performance due to narrow,
misaligned nulls. In addition, if the user DOAs are
not well separated, SDMA cannot provide much system
performance improvement. Furthermore, the downlink
beamforming algorithm needs extensive computation
power to solve a nonlinear optimization problem
involving a nonlinear constraint weight vector for every
user [5]. This limits the applicability of this approach
for low complexity, real-time operation.

This paper proposes a new approach that clusters
(groups) mobile users before the downlink beamforming
calculation. This approach alleviates the computational
complexity problem and the spatial separability
problem. The algorithmic block diagram is shown in
Figure 1. Simulation results show that, by carefully
choosing AS and applying the same beamforming weight
vector $w_{group}$ to all users in a group, the clustering
scheme is within 3 dB of the conventional method, with
a dramatic decrease in computational complexity.


Figure 1: New Cluster Algorithm for Downlink Beamforming
(block diagram; recoverable block labels: "DOA Estimation",
"Cluster Scheme", "Weight", "and select").


2.  DATA  MODEL 

We assume that K users are served within the same cell
by a base station with a uniform linear array (ULA)
antenna consisting of M identical, omnidirectional
sensors, equally spaced at distance d. A narrowband signal
model is assumed, and the baseband signal received at
time t, with $L_k$ paths for the kth user, is:

\[ x(t) = \sum_{k=1}^{K} \sum_{l=1}^{L_k} A_{kl} \, a(\theta_{kl}, f_u) \, s_k(t - \tau_{kl}) + n(t) \tag{1} \]

where n(t) is spatially and temporally white Gaussian
noise and the array steering vector $a(\theta, f_u)$ is given
by

\[ a(\theta, f_u) = \left[ 1, \; e^{-j 2\pi d \frac{f_u}{c} \sin\theta}, \; \ldots, \; e^{-j 2\pi d \frac{f_u}{c} (M-1) \sin\theta} \right]^T \tag{2} \]

where $A_{kl}$ is the amplitude of the lth path of the
kth user, $s_k(t)$ is the baseband signal transmitted by
the kth mobile and $\tau_{kl}$ is its corresponding delay.
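
The steering vector of a uniform linear array can be sketched as below. This is a minimal illustration that assumes the element spacing is expressed in wavelengths (the parameter `d_over_lambda`, a hypothetical name, stands for the product of spacing and carrier frequency divided by the propagation speed) and that angles are given in degrees.

```python
import numpy as np

def steering_vector(theta_deg, M, d_over_lambda=0.5):
    """ULA steering vector with M sensors; d_over_lambda is the spacing in wavelengths."""
    theta = np.deg2rad(theta_deg)
    m = np.arange(M)                       # sensor index 0 .. M-1
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(theta))

a = steering_vector(30.0, 8)               # half-wavelength spacing, DOA of 30 degrees
```

All entries have unit modulus; only the phase progression across the sensors carries the direction information.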

From the received uplink signal, it is possible to
estimate the spatial covariance matrix, which contains
the directional information of the mobile radio channel
(dominant DOAs $\theta_{kl}$) and the corresponding power for
each user. It can be written as follows:

\[ R_k = \sum_{l=1}^{L_k} A_{kl}^2 \, a(\theta_{kl}, f_d) \, a^H(\theta_{kl}, f_d) \tag{3} \]

Similarly we define the interference covariance matrix
$Q_k$ as

\[ Q_k = \sum_{i \neq k} R_i + \sigma_N^2 I \tag{4} \]

where $\sigma_N^2$ and $I$ denote the white noise variance and
the M x M identity matrix, respectively.

The goal of downlink beamforming is to design a
weight vector $w_{kd}(f_d; t)$ that transmits the constrained
power to the desired user and minimizes the energy
transmitted to the undesired users. In other words, we want
to maximize the SINR (Signal to Interference plus Noise
Ratio) for the kth user [6]:

\[ w_{kd} = \arg\max_{w_{kd}} \frac{w_{kd}^H R_k w_{kd}}{w_{kd}^H Q_k w_{kd}} \tag{5} \]


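The solution of (5) is proportional to the generalized
eigenvector of the matrix pair $[R_k, Q_k]$ [3]:

\[ w_{kd} = \sqrt{\frac{x_k}{e_{[\max]}^H R_k \, e_{[\max]}}} \; e_{[\max]}, \quad \text{so that } w_{kd}^H R_k w_{kd} = x_k \tag{6} \]

where $e_{[\max]}$ denotes the generalized eigenvector associated
with the largest generalized eigenvalue.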
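
A maximum-SINR weight vector of this kind can be computed as the principal generalized eigenvector of the pair (R, Q). The sketch below is an illustration under assumptions, not the authors' implementation: it whitens with a Cholesky factor of Q (which must be positive definite), uses a single-path desired user and one interferer at example angles, and scales the result so that the delivered power equals a chosen value. The names `steering_vector` and `max_sinr_weights` are hypothetical.

```python
import numpy as np

def steering_vector(theta_deg, M, d_over_lambda=0.5):
    """ULA steering vector; spacing given in wavelengths (assumption of this sketch)."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * d_over_lambda * m * np.sin(np.deg2rad(theta_deg)))

def max_sinr_weights(R, Q, power=1.0):
    """Principal generalized eigenvector of (R, Q), scaled so that w^H R w = power."""
    Lq = np.linalg.cholesky(Q)                 # Q = Lq Lq^H
    Lq_inv = np.linalg.inv(Lq)
    Mw = Lq_inv @ R @ Lq_inv.conj().T          # whitened Hermitian matrix
    vals, vecs = np.linalg.eigh(Mw)            # eigenvalues in ascending order
    w = Lq_inv.conj().T @ vecs[:, -1]          # back to the original coordinates
    w = w * np.sqrt(power / np.real(w.conj() @ R @ w))
    return w, vals[-1]                         # weights and the achieved SINR

M = 8
a_des = steering_vector(90.0, M)               # desired user
a_int = steering_vector(40.0, M)               # co-channel interferer
R = np.outer(a_des, a_des.conj())
Q = np.outer(a_int, a_int.conj()) + 0.1 * np.eye(M)
w, sinr = max_sinr_weights(R, Q)
```

The returned `sinr` is the largest generalized eigenvalue, which equals the ratio in (5) evaluated at `w`.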


3.  TARGET  AND  NULL  BROADENING 

The existence of angular spreads (AS) causes DOA
estimation error, which adversely affects the downlink
beamforming process. The SINR degrades because the
maximum transmitted power is not directed at the
desired user, or because the nulls pointed toward the
cochannel users are too narrow. The method presented
in this section makes the SINR more robust to DOA
estimation error. The angular spread based approach
[7], [8] can steer a broad range of beam patterns toward
users of interest, or nulls toward the cochannel
users. Modified versions of the covariance matrices
can be written as:

\[ \tilde{R}_k = R_k \odot S_{\max} \tag{7} \]

\[ \tilde{Q}_k = Q_k \odot S_{\max} \tag{8} \]

with $[S_{\max}]_{pq} = e^{-2 [\pi (p - q)]^2 \sigma_{\max}^2}$,

where $\odot$ and $[\cdot]_{pq}$ denote the Schur-Hadamard
(element-by-element) matrix product and the pq-th element of a
matrix, respectively. The variable $\sigma_{\max}^2$ quantifies the
angular spreads (AS) of the corresponding DOAs.
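
The broadening operation is an element-by-element taper applied to a covariance matrix. The sketch below assumes a Gaussian-shaped taper whose entries depend only on the index difference p - q, with the spread parameter `sigma_max` given in radians and half-wavelength spacing; the exact constants inside the exponent are an assumption of this sketch, and the function names are hypothetical.

```python
import numpy as np

def broadening_taper(M, sigma_max):
    """M x M taper with entries exp(-2 * (pi * (p - q))**2 * sigma_max**2)."""
    idx = np.arange(M)
    diff = idx[:, None] - idx[None, :]
    return np.exp(-2.0 * (np.pi * diff) ** 2 * sigma_max ** 2)

def broaden(R, sigma_max):
    """Apply the taper to a covariance matrix via the Hadamard product."""
    return R * broadening_taper(R.shape[0], sigma_max)

M = 8
a = np.exp(-1j * np.pi * np.arange(M) * np.sin(np.deg2rad(40.0)))
R = np.outer(a, a.conj())                  # rank-one covariance of a point source
R_broad = broaden(R, sigma_max=0.05)
```

The taper leaves the diagonal (the per-sensor power) untouched while raising the rank of a point-source covariance, which is what widens the corresponding beam or null.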

By using the target and null broadening technique in
the downlink beamforming, the beamformer design
is more robust in the mobile communication environment.
In addition, the beamforming weights are valid
for a longer time with fewer calculations required [8].
Figure 2 shows the beam pattern with and without
the broadening technique. It is clear that by applying
the broadening technique, the narrow nulling
interference problem is solved. Although it introduces some
increase of the SINR perturbation, the worst-case
effect of DOA estimation error is still negligible [6].

4.  GROUPING  AND  DOWNLINK 
BEAMFORMING  ALGORITHM 

Two conditions limit the performance and capacity of
SDMA systems:

1. Users that share the same channel allocation are
co-located, within the resolution of the beam
pattern;

2. Co-channel, co-located users have disparate powers,
causing the so-called “near-far problem.”

A proposed solution to the near-far problem is grouping
the mobile users within power classes before downlink
beamforming [9].

Exploiting the target and null broadening method and the existence of angular spread (AS), we propose a grouping algorithm that is constrained by the angular separation of user locations within a cell. By grouping all the users in a cell before downlink beamforming, and computing the downlink beamforming weight only once per group, the computational complexity at the base station is decreased dramatically with a tolerable performance loss.

Figure 2: Conventional beamforming vs. beamforming with the target and null broadening technique, with target at 90° and null at 40°.

The basic approach of the grouping and downlink beamforming calculation algorithm within a cell is the following:

1. Determine the angle separation $\Delta\theta$ for each group, typically using the angular spread (AS) as a parameter;

2. Assign users to the same group if $\Delta\theta$ < AS;

3. Determine the representative angle for each group, typically choosing the highest-energy interference source within the group as the representative;

4. Calculate the downlink beamforming weight $w_{new}$ for each group;

5. Apply the weight $w_{new}$ to every user in the same group.
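
The five steps above can be sketched as follows. This is an illustrative sketch only: the greedy rule (sort by DOA, open a new group when a user lies AS or more away from the first user of the current group), the function names and the `weight_fn` callback standing in for the actual weight computation are all assumptions, not the authors' implementation.

```python
def group_users(doas_deg, angular_spread_deg):
    """Steps 1-2: greedy grouping of user indices by DOA.

    A user joins the current group if its DOA is less than
    angular_spread_deg away from the first user of that group.
    """
    order = sorted(range(len(doas_deg)), key=lambda i: doas_deg[i])
    groups = []
    for i in order:
        if groups and doas_deg[i] - doas_deg[groups[-1][0]] < angular_spread_deg:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups

def representatives(groups, powers):
    """Step 3: pick the highest-power user of each group as its representative."""
    return [max(g, key=lambda i: powers[i]) for g in groups]

def assign_weights(groups, reps, doas_deg, weight_fn):
    """Steps 4-5: one weight computation per group, shared by all its users."""
    weights = {}
    for g, r in zip(groups, reps):
        w = weight_fn(doas_deg[r])
        for i in g:
            weights[i] = w
    return weights

# Hypothetical scenario: six users, AS = 8 degrees.
doas = [90.0, 40.0, 120.0, 140.0, 43.0, 95.0]
powers = [1.0, 0.5, 0.8, 0.3, 0.9, 0.2]
groups = group_users(doas, 8.0)
reps = representatives(groups, powers)
# Placeholder weight function; a real system would compute w_new here.
weights = assign_weights(groups, reps, doas, lambda theta: ("w", theta))
```

In this toy scenario six users collapse into four groups, so the weight computation runs four times instead of six.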

We use a simulation with an M = 8 element uniform linear antenna array with half-wavelength inter-element spacing to verify that the performance loss of the above algorithm is acceptable. Consider N = 4 sources, one signal of interest (SOI) and three signals of non-interest (SONI), with an initial SOI DOA of 90° and SONI DOAs of 40°, 120° and 140°. Figure 3 compares the SINR error for conventional beamforming and the target and null broadening technique.


Figure 3: Downlink SINR comparison for the conventional beamforming method and beamforming using the broadening technique.

From Figure 3, it is clear that if users are geometrically close enough, in this case AS < 8°, we can reuse the same downlink weight $w_{new}$ to save calculations in the base station, at the cost of an acceptable SINR loss of 3 dB in this case. However, if we account for the angular spread of the interference sources, the narrow nulls of traditional beamforming make the performance loss due to angle spreading towards the co-channel users large. Figure 4 shows the performance loss due to the angle offset of the co-channel users for the previous simulation scenario. The broadening technique clearly reduces the performance loss due to co-channel angle spreading.

We  use  a  simulation  to  demonstrate  the  complex¬ 
ity  savings  of  the  grouping  method.  Figure  5  shows 
the  performance  under  different  angle  spreading,  where 
users  are  uniformly  distributed  by  angle  in  a  cell. 

The results shown in Figure 3 and Figure 5 indicate that, with proper grouping of users within a cell, it is possible to save more than 50% of the downlink beamforming computational complexity with limited SINR performance loss.

5.  SIMULATIONS 

The simulations model a system that uses a linear antenna array with M = 8 elements and half-wavelength inter-element spacing, and N = 25 mobile users uniformly distributed over [0, π) within a cell. Figure 6 shows the block diagram for conventional downlink beamforming and the flow chart for the grouping algorithm.

Figure 4: Performance loss due to co-channel users' angle offset.

Figure 5: Group number vs. user number under various angle spreading conditions.

Based on Figure 6, Table 1 lists the computational load of each block under the simulated environment.

The proposed method needs only about one-third of the typical base station complexity for calculating R, Q and $w_{down}$. From the viewpoint of the entire system, the new method reduces the computational complexity needed in the base station for SDMA applications by approximately 50%.

Figure 6: Block diagram for the conventional downlink BF algorithm (panel "Conventional Downlink BF") and the algorithm with the broadening technique (panel "Downlink BF With Broaden and Group Technique").

Figure 7 shows the performance of the grouping plus target and null broadening scheme, assuming that angle spreading exists on all sources (desired user and cochannel interference). The worst-case scenario occurs when the target and nulls do not coincide with the estimated DOAs and are at the maximum offset, AS = 8°. Figure 7 shows that the worst-case SINR loss decreases substantially when the grouping and broadening scheme is used.

Combining the results of Figure 6 and Figure 7 indicates the efficacy of the new approach. By grouping the mobile users in a cell and using the target and null broadening technique, the downlink beamforming calculation is reduced by approximately 50%, with acceptable performance loss.

6.  CONCLUSION 

In this paper, we have studied the grouping and target and null broadening technique for downlink beamforming in mobile communication systems. Computer simulations show that grouping users not only alleviates the DOA estimation error problem, but also offers robust beamforming performance in the presence of source movement [8]. Moreover, the computational complexity in the base station is decreased dramatically, without significant performance loss for SDMA systems.

Calculation      BF with broadening   Conventional BF   Effort
R                8                    25                (3)
Q                8                    25                (4)
a(θ)             8                    25                (2)
w                8                    25                (6)
X                25                   25                X = s*w
Schur product    8*2                  0                 ⊙
Decision         25                   0                 weight select

Table 1: Computational effort comparison for conventional BF and BF with the group and broadening technique.

Figure 7: Simulation result for N = 25; grouping with 8°; target and interference both offset.
ABSTRACT

Linear periodic time-varying filters are often used today in telecommunications. They spread the spectrum and can be used for scrambling, multi-user access or channel modeling. Recently, the authors have defined linear cyclostationary filters. In particular, this generalization makes it possible to take into account the random parameters of a transmission channel. This paper defines a new case of linear cyclostationary filter in which information is embedded in the filter.

We first recall the definition of linear periodic and linear cyclostationary filters. The paper then presents particular cases of these filters based on clock changes, and introduces the modulated periodic clock change. This filter can be used to transmit an analog and a digital signal simultaneously. We present the method for reconstructing the initial signals, and give reconstruction results for the simultaneous transmission of an analog and a binary signal.


1.  INTRODUCTION 

In telecommunications, signals subjected to a linear periodic filter [1] [2] are often encountered. Such a filter spreads the spectrum and can correspond to a scrambling system [3], a multi-user access method [4] or a transmission channel model [5]. Recently, it was shown that these filters can be generalized to linear cyclostationary filters [6].

In the first section, we recall some definitions. In particular, we present the definition of the linear cyclostationary filter. We then introduce a new filter called modulated periodic clock change. It allows an analog and a digital signal to be transmitted simultaneously. We present the reconstruction of the input signals. Finally, we apply the reconstruction results to the transmission of an analog and a binary signal.


2.  DEFINITIONS 

2.1.  Stationary  and  cyclostationary  processes 

Let $A = \{A(t), t \in \mathbb{R}\}$ be a harmonizable, zero-mean, mean-square continuous process. A admits a Cramér-Loève representation $\Theta_A(\omega)$ [7] such that:

$$A(t) = \int_{-\infty}^{+\infty} e^{i\omega t}\, d\Theta_A(\omega) \qquad (1)$$

We denote by $m_A(t)$ and $R_A(t,\tau)$ the mean and autocorrelation function of A, given by:

$$m_A(t) = E[A(t)] \qquad (2)$$

$$R_A(t,\tau) = E[A(t+\tau/2)\, A^*(t-\tau/2)] \qquad (3)$$

The power spectrum of A, $S_{A,t}(\omega)$, is defined by:

$$R_A(t,\tau) = \int_{-\infty}^{+\infty} e^{i\omega\tau}\, dS_{A,t}(\omega) \qquad (4)$$

A is said to be stationary if and only if $m_A(t)$ and $R_A(t,\tau)$ are independent of t. $dS_{A,t}(\omega)$ is then independent of t.

A is said to be cyclostationary if and only if $m_A(t)$ and $R_A(t,\tau)$ are periodic in t with period $T = 2\pi/\omega_0$ [8]. $dS_{A,t}(\omega)$ is then periodic in t. We suppose that it admits the Fourier series decomposition:

$$dS_{A,t}(\omega) = \sum_{l=-\infty}^{+\infty} e^{il\omega_0 t}\, dS_A^l(\omega) \qquad (5)$$

2.2.  Linear  time-invariant  and  periodic  time-varying  fil¬ 
ters 

Let h be a linear time-varying filter with frequency response $h_t(\omega)$. Its response to the stationary process Z is the process X defined by:

$$X(t) = \int_{-\infty}^{+\infty} e^{i\omega t}\, h_t(\omega)\, d\Theta_Z(\omega) \qquad (6)$$

h is a linear time-invariant filter if and only if $h_t(\omega)$ is independent of time.

h is a linear periodic time-varying filter if and only if $h_t(\omega)$ is periodic in time with period T [1]. We suppose that it admits the Fourier series decomposition:

$$h_t(\omega) = \sum_{l=-\infty}^{+\infty} e^{il\omega_0 t}\, h^l(\omega) \qquad (7)$$

2.3.  Linear  stationary  and  cyclostationary  filters 

The linear random time-varying filter is a generalization of the linear time-varying filter previously defined [9]. Let $\{H_\omega\}_{\omega \in \mathbb{R}}$ be a family of complex random processes, where, for any $\omega$, $H_\omega = \{H_t(\omega), t \in \mathbb{R}\}$ is a complex continuous random process. We denote by $\chi_t(\omega)$ the mean and by $\varphi_{t,\tau}(\omega,\gamma)$ the intercorrelation function of the $\{H_\omega\}_{\omega \in \mathbb{R}}$. $\chi_t(\omega)$ and $\varphi_{t,\tau}(\omega,\gamma)$ are given by:

$$\chi_t(\omega) = E[H_t(\omega)] \qquad (8)$$

$$\varphi_{t,\tau}(\omega,\gamma) = E[H_{t+\tau/2}(\omega + \gamma/2)\, H^*_{t-\tau/2}(\omega - \gamma/2)] \qquad (9)$$

Let h be a linear random filter with frequency response $H_t(\omega)$. Its response to the stationary process Z is the process X defined by:

$$X(t) = \int_{-\infty}^{+\infty} e^{i\omega t}\, H_t(\omega)\, d\Theta_Z(\omega) \qquad (10)$$

Thus, each linear filter can be seen as a particular case of linear random filter, where $H_\omega$ is a degenerate random variable.

A linear random filter h is said to be stationary if and only if the processes $\{H_\omega\}_{\omega \in \mathbb{R}}$ are jointly stationary. This means that the mean and the intercorrelation function of the $\{H_\omega\}_{\omega \in \mathbb{R}}$ are independent of time.

Recently, the authors have generalized this definition [6]. We call h a linear cyclostationary filter if and only if the processes $\{H_\omega\}_{\omega \in \mathbb{R}}$ are jointly cyclostationary. This corresponds to the case where the mean and the intercorrelation function of the $\{H_\omega\}_{\omega \in \mathbb{R}}$ are periodic in time with period T.

3.  CLOCK  CHANGE 
3.1.  Periodic  clock  change 

The response X of a stationary process Z subjected to a periodic clock change [3] h is defined by:

$$X(t) = g(t)\, Z[t - f(t)] \qquad (11)$$

where f(t) and g(t) are real measurable functions, periodic with period $T = 2\pi/\omega_0$. In equation (11), f(t) is a timing jitter and g(t) corresponds to an amplitude modulation. It is easy to see that a periodic clock change is a particular case of linear periodic filter and that its frequency response is given by:

$$h_t(\omega) = g(t)\, e^{-i\omega f(t)} \qquad (12)$$

Periodic clock changes can be implemented easily. They also often appear in spread spectrum applications that use linear periodic filters, such as scrambling [3] and multi-user access [4].
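
The link between (11) and (12) can be checked numerically on a single complex exponential: if Z(t) = exp(iωt), then g(t)Z(t - f(t)) equals exp(iωt) times g(t)exp(-iωf(t)). The sketch below does this check; the particular f, g, ω₀ and ω are hypothetical test values, and the function names are introduced here for illustration only.

```python
import cmath
import math

def clock_change(z, f, g, t):
    """Apply X(t) = g(t) * Z(t - f(t)) to a signal given as a function z."""
    return g(t) * z(t - f(t))

def frequency_response(f, g, t, omega):
    """Time-varying frequency response g(t) * exp(-i * omega * f(t))."""
    return g(t) * cmath.exp(-1j * omega * f(t))

omega0 = 2.0 * math.pi            # hypothetical: period T = 1
f = lambda t: -0.1 * math.sin(omega0 * t)        # T-periodic timing jitter
g = lambda t: 1.0 + 0.5 * math.cos(omega0 * t)   # T-periodic amplitude modulation
omega = 3.0                       # arbitrary test frequency
z = lambda t: cmath.exp(1j * omega * t)

for t in (0.0, 0.13, 0.5, 0.77):
    lhs = clock_change(z, f, g, t)
    rhs = cmath.exp(1j * omega * t) * frequency_response(f, g, t, omega)
    assert abs(lhs - rhs) < 1e-12
```

Because f and g are T-periodic, the response computed by `frequency_response` is T-periodic in t as well, which is what makes the clock change a linear periodic filter.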

3.2.  Reconstruction  of  the  input  signal 

Figure 1 depicts the reconstruction chain of a signal subjected to a periodic clock change.

[Block diagram: Z(t) (information) → periodic clock change → X(t) = g(t)Z(t − f(t)) (observation) → reconstruction]

Figure 1: Reconstruction chain of a signal subjected to a periodic clock change

The reconstruction of a process subjected to a periodic clock change is a particular case of the reconstruction of a process subjected to a linear periodic filter. Equations (6) and (7) show that the response X of the stationary process Z subjected to a linear periodic filter h admits the following spectral representation:

$$d\Theta_X(\omega) = \sum_{k=-\infty}^{+\infty} \psi_k(\omega - k\omega_0)\, d\Theta_Z(\omega - k\omega_0) \qquad (13)$$

where $\psi_k(\omega)$ denotes the k-th Fourier coefficient of $h_t(\omega)$ in the expansion (7). When the spectral support of Z is included in $[-\omega_0/2, +\omega_0/2[$, Z can then be reconstructed by:

$$\forall \omega \in [-\omega_0/2, \omega_0/2[,\ \forall k \in A,\quad d\Theta_Z(\omega) = \psi_k^{-1}(\omega)\, d\Theta_X(\omega + k\omega_0) \qquad (14)$$

where A is the set of integers k such that the functions $\{\psi_k(\omega)\}_{k \in A}$ are different from zero on the spectral support of Z. Multiple redundant reconstructions of Z can also be obtained by a frequency downconversion followed by lowpass filtering on $[-\omega_0/2, \omega_0/2[$.


3.3.  Modulated  periodic  clock  change 

The paper proposes a new clock change scheme that allows an analog and a digital signal to be transmitted simultaneously. This spread spectrum technique is a generalization of the classic periodic clock change. It can be useful, for example, to scramble video with an analog image and digital sound. It is called modulated periodic clock change.

The response X of a stationary process Z subjected to such a clock change h is defined by:

$$X(t) = g(t)\, Z[t - M(t) f(t)] \qquad (15)$$

where f(t) and g(t) are defined as in (11) and $M = \{M(t), t \in \mathbb{R}\}$ is a stationary process independent of Z. Figure 2 depicts the resulting transmission chain.


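[Block diagram: Z(t) (analog information) and M(t) (digital information) enter the clock change with modulated periodic function, whose output is X(t) = g(t)Z(t − M(t)f(t))]

Figure 2: Transmission chain of a signal subjected to a modulated periodic clock change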

It is easy to see that Z is then subjected to a cyclostationary filter with frequency response given by:

$$H_t(\omega) = g(t)\, e^{-i\omega M(t) f(t)} \qquad (16)$$

In general, the reconstruction of Z(t) can be obtained by a sub-optimal solution [6]. Nevertheless, perfect reconstruction is possible when M is a Bernoulli variable that is equal to −1 or +1.

In this case, equation (14) becomes:

$$\forall \omega \in [-\omega_0/2, \omega_0/2[,\ \forall k \in A,\quad d\Theta_Z(\omega) = \psi_k^{-1}(M\omega)\, d\Theta_X(\omega + k\omega_0) \qquad (17)$$

Let $k_1$ and $k_2$ be two values of k. Equation (17) implies that:

$$\forall \omega \in [-\omega_0/2, \omega_0/2[,\quad \psi_{k_1}^{-1}(M\omega)\, d\Theta_X(\omega + k_1\omega_0) = \psi_{k_2}^{-1}(M\omega)\, d\Theta_X(\omega + k_2\omega_0) \qquad (18)$$

This equality allows the identification of M whenever $\psi_{k_1}(\omega)$ and $\psi_{k_2}(\omega)$ are not simultaneously even functions. Knowing M, Z(t) can be perfectly reconstructed using (17).

This method can then be used for any binary signal M(t) whose bit period is much larger than T. It could also be generalized to any digital signal M(t).

4.  APPLICATION 

4.1. Simultaneous transmission of analog and binary information

In the following simulations, a modulated periodic clock change is used to transmit simultaneously an analog signal Z(t), band-limited on $[-\omega_0/2, \omega_0/2[$, and an NRZ signal M(t). f(t) and g(t) are given by:

$$f(t) = -\alpha \sin \omega_0 t \quad \text{and} \quad g(t) = 1 \qquad (19)$$

Figure 3 depicts the analog signal at the input of the clock change. The binary signal is shown in Figure 4.

Figure 3: Initial analog signal

Figure 4: Initial binary signal

The signal observed at the output of the clock change is shown in Figure 5 for α = 0.104, T = 0.0347 ms and a bit rate of 1 kb/s.


Figure  5:  Observed  signal 


4.2.  Reconstruction  of  the  analog  information 

We have seen that Z(t) has to be reconstructed while M(t) is constant. As M(t) is a binary signal, the reconstruction functions of Z(t) are given during each bit period by:

$$\psi_k(M\omega) = J_k(M\alpha\omega) \qquad (20)$$

where $J_k$ is the k-th order Bessel function and M is the value of M(t), equal to +1 or −1. Since $J_k(-x) = (-1)^k J_k(x)$, the reconstruction of Z(t) does not depend on M when k is even. It can then be obtained directly around any even k. Figure 6 compares the initial signal to the reconstruction obtained for k = 0. The analog information is well reconstructed.
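
As an illustrative check of why even orders are insensitive to the bit value, the sketch below computes the k-th Fourier coefficient of exp(i x sin θ) numerically (which equals the Bessel function J_k(x)) and compares the values obtained for M = +1 and M = −1. The helper name, the number of quadrature points and the test frequency are assumptions chosen for the example.

```python
import cmath
import math

def fourier_coefficient(k, x, n=2048):
    """k-th Fourier coefficient of exp(i*x*sin(theta)) over one period.

    Computed with the rectangle rule on n points; this coefficient
    equals the Bessel function of the first kind J_k(x).
    """
    total = 0.0 + 0.0j
    for m in range(n):
        theta = 2.0 * math.pi * m / n
        total += cmath.exp(1j * (x * math.sin(theta) - k * theta))
    return (total / n).real

alpha = 0.104   # value quoted for the simulations
omega = 5.0     # arbitrary test frequency (assumption)
for k in range(4):
    plus = fourier_coefficient(k, +1 * alpha * omega)    # M = +1
    minus = fourier_coefficient(k, -1 * alpha * omega)   # M = -1
    if k % 2 == 0:
        assert abs(plus - minus) < 1e-12   # even k: independent of M
    else:
        assert abs(plus + minus) < 1e-12   # odd k: sign flips with M
```

The sign flip for odd k is what makes the odd-order reconstructions usable for identifying M, while the even-order ones recover Z(t) without knowing M.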


4.3.  Reconstruction  of  the  binary  information 

As a correct reconstruction of Z(t) is available for k even, the reconstructions obtained for k odd indicate when M(t) is correctly identified. Figures 7 and 8 compare the initial signal to the reconstruction for k = 1, when M(t) is assumed always equal to +1 and when M(t) is correctly identified.


[Block diagram. Input: observed signal X(t) = g(t)Z(t − M(t)f(t)). Blocks: reconstruction of Z(t) for k even; reconstruction of Z(t) for k odd with M = 1; reconstruction of Z(t) for k odd with M = −1; decision over a bit period. Outputs: estimation of Z(t); estimation of M(t).]

Figure 9: Scheme for the estimation of Z(t) and M(t)


The block diagram of Figure 9 shows a scheme which allows Z(t) to be reconstructed and the values of M(t) to be recovered, assuming perfect timing of the corresponding bit stream.

5. CONCLUSION

In this paper, we recalled the definition of a linear periodic filter and of a linear cyclostationary filter. We presented a new filter called modulated periodic clock change. We proposed a method for reconstructing the signals transmitted by this filter. It was applied successfully to the simultaneous transmission of an analog and a binary signal.
ABSTRACT

We  consider  the  problem  of  equalization  of  the  frequency 
selective  mobile  radio  channel  in  the  presence  of  co-channel 
interference  (CCI).  Conventional  trellis  equalizers  treat  the 
sum  of  noise  and  interference  as  additive  white  Gaussian 
noise,  while  CCI  is  generally  a  colored  non-Gaussian  process. 
We  propose  a  non-parametric  approach  based  on  the  esti¬ 
mation  of  the  probability  density  function  of  the  noise-plus- 
interference.  Given  the  availability  of  a  limited  volume  of 
data,  the  density  is  estimated  by  kernel  smoothing  tech¬ 
niques.  Due  to  the  temporal  color  of  the  CCI,  the  use  of 
a  whitening  filter  is  also  addressed.  Simulation  results  are 
given  for  the  GSM  system,  showing  a  significant  perfor¬ 
mance  improvement  with  respect  to  the  equalizer  based  on 
the  Gaussian  assumption. 

1.  INTRODUCTION 

Time-division  multiple  access  (TDMA)  mobile  radio  sys¬ 
tems  like  GSM  are  affected  by  co-channel  interference  (CCI) 
and  intersymbol  interference  (ISI)  due  to  multipath  propa¬ 
gation.  Channel  equalizers  commonly  employed  in  practi¬ 
cal  GSM  receivers  perform  maximum  likelihood  (ML)  [1]  or 
maximum  a  posteriori  probability  (MAP)  [3]  data  estima¬ 
tion  on  the  ISI  trellis.  ML  sequence  estimation  using  the 
Viterbi  algorithm  [2]  is  well  known  as  the  optimum  detec¬ 
tion  technique  for  signals  corrupted  by  finite-length  ISI  and 
additive  white  Gaussian  noise  (AWGN),  in  the  sense  that  it 
minimizes  the  probability  of  a  sequence  error.  The  symbol- 
by-symbol  MAP  algorithm,  proposed  over  two  decades  ago 
by  Bahl  et  al.  [3]  for  decoding  of  convolutional  codes,  has 
recently received renewed interest as a soft-in/soft-out decoder
for iterative decoding of parallel or serially concatenated
codes [4]. As a trellis equalizer, the MAP algorithm
is  optimum  in  the  sense  that  it  minimizes  the  probability  of 
symbol  error.  In  receivers  employing  the  concatenation  of 
an  equalizer  and  a  channel  decoder,  the  performance  is  im¬ 
proved  by  soft-decision  decoding  and  iterative  equalization 
and  decoding  [5].  In  this  respect,  the  MAP  algorithm  has 
the  advantage  of  intrinsically  providing  optimal  a  posteriori 
probability  as  a  soft-output  value. 

In  this  paper,  we  consider  the  problem  of  equalization 
of  the  mobile  radio  channel  in  the  case  of  single  channel 
reception. The optimum trellis equalizer in the presence
of ISI, CCI, and AWGN is based on joint detection of the
co-channel  signals  [7].  Although  joint  ML  and  joint  MAP 
detection  are  optimal,  they  can  be  prohibitively  expensive 
since  the  complexity  increases  exponentially  with  the  sum 
of  the  channel  lengths  of  the  desired  and  CCI  signals.  In  ad¬ 
dition,  the  estimation  of  the  channel  impulse  response  of  all 
co-channel  signals  requires  the  knowledge  of  the  training  se¬ 
quence  of  each  interferer.  On  the  other  hand,  conventional 
receivers  employ  a  trellis  equalizer  which  treats  the  sum  of 
noise  and  interference  as  additive,  white,  Gaussian  noise. 
In reality, the sum of noise and CCI is generally a colored non-Gaussian random process, and the above approach results in a degradation of the error performance.

In  order  to  correctly  set  the  problem  of  trellis  data  es¬ 
timation,  a  proper  statistical  characterization  of  the  dis¬ 
turbance  is  required.  To  this  purpose,  we  propose  a  non- 
parametric  trellis  equalizer,  based  on  the  estimation  of  the 
probability  density  function  of  the  noise-plus-interference. 
Given  the  limited  volume  of  training  data,  the  work  is  based 
on  the  application  of  density  estimation  by  kernel  smooth¬ 
ing.  The  temporal  color  of  the  CCI  is  taken  into  account 
by  a  whitening  filter. 

2.  MAP  TRELLIS  EQUALIZATION 

2.1.  System  Model 
Consider the received signal

$$r_k = \sum_{\ell=0}^{L-1} b_{k-\ell}\, h_\ell^{(k)} + n_k , \qquad (1)$$

where $b_k \in \{+1,-1\}$ are the transmitted symbols, the L complex tap-gains $h_\ell^{(k)}$ represent the samples of the equivalent channel impulse response at time k, and $n_k = y'_k + w_k$ indicates the sum of co-channel interference and thermal noise. In the case of the GSM system, we consider the linearized model of the GMSK signal [8], where $h_\ell^{(k)}$ are the taps of the equivalent discrete-time channel produced by derotation of the received signal [9]. The GSM signal has almost zero excess bandwidth, and we assume that sufficient statistics for data estimation can be obtained by symbol-rate sampling at the output of a fixed front-end filter. The analysis can be extended to include the case of non-zero excess bandwidth by introducing oversampling and fractionally spaced trellis equalization.

In  this  Section,  we  consider  the  CCI  samples  as  in¬ 
dependent  complex  non-Gaussian  random  variables.  The 
discrete-time  process  y'k  is  generally  colored,  even  if  the 
delay  spread  in  a  typical  interference-limited  environment 
is  usually  relatively  small.  At  high  signal-to-noise-ratios 
(SNRs)  a  suitable  temporal  prewhitening  is  assumed  to 
produce  approximately  independent  non-Gaussian  distur¬ 
bance.  The  validity  of  this  assumption  will  be  discussed  in 
Section  3. 


2.2.  Symbol-by-Symbol  MAP  Algorithm  for  Finite- 
Length  ISI  and  Additive  Independent  Disturbance 

Suppose that the symbols $b_k$ are transmitted in finite blocks of length N. Assuming knowledge of the channel impulse response, a soft-output symbol-by-symbol MAP equalizer computes the a posteriori log-likelihood ratio

$$L(b_k | r_0, \ldots, r_{N-1}) = \log \frac{\Pr(b_k = +1 | r_0, \ldots, r_{N-1})}{\Pr(b_k = -1 | r_0, \ldots, r_{N-1})} \qquad (2)$$

with $0 \le k \le N-1$. Let $\mu_k = (b_{k-1}, \ldots, b_{k-L+1})$ denote the generic ISI state at time k, and $S(b_k)$ the set of states corresponding to the transmitted symbol $b_k$. Indicating by $\xi_k$ the transition from the state $\mu_k$ to $\mu_{k+1}$, the MAP algorithm results in forward and backward recursions with the transition metric $\lambda(\xi_k)$, coupled by a dual-maxima operation [3], [6]

$$L(b_k | r_0, \ldots, r_{N-1}) = {\max_{\mu \in S(b_k = +1)}}' \Lambda(\mu_{k+1}) - {\max_{\mu \in S(b_k = -1)}}' \Lambda(\mu_{k+1}) , \qquad (3)$$

$$\Lambda(\mu_{k+1}) = \Lambda^f(\mu_k) - \lambda(\xi_k) + \Lambda^b(\mu_{k+1}) , \qquad (4)$$

where $\Lambda(\mu_k)$ is the overall accumulated metric for the state $\mu_k$, $\Lambda^f$ and $\Lambda^b$ are the accumulated metrics in the forward and backward recursions, and $\max'\{x, y\} = \max\{x, y\} + \log(1 + e^{-|x-y|})$ [6]. The metric increment $\lambda(\xi_k)$ results

$$\lambda(\xi_k) = -\log p(r_k | b_k, \ldots, b_{k-L+1}) - \log \Pr(b_k) , \qquad (5)$$

where $p(r_k | b_k, \ldots, b_{k-L+1}) = p_n(r_k - \sum_{\ell=0}^{L-1} b_{k-\ell}\, h_\ell^{(k)})$. In the case where $n_k$ is modelled as AWGN, the quantity $-\log p(r_k | b_k, \ldots, b_{k-L+1})$ in (5) produces the Euclidean distance metric. When no a priori information is available about the transmitted bit $b_k$, the term $-\log \Pr(b_k)$ in (5) has no effect and can be omitted from the calculation. On the contrary, if the equalizer receives some a priori information, the above term has a fundamental role in deriving a soft-in/soft-out MAP equalizer [4], [5].
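
The dual-maxima operation used in (3) is the exact identity max'{x, y} = log(e^x + e^y). A minimal sketch, with hypothetical function names and toy metric values chosen only for illustration:

```python
import math
from functools import reduce

def max_star(x, y):
    """max'{x, y} = max{x, y} + log(1 + exp(-|x - y|)) = log(exp(x) + exp(y))."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))

def max_star_many(values):
    """Fold max' over a non-empty list of accumulated metrics."""
    return reduce(max_star, values)

def llr(metrics_plus, metrics_minus):
    """Log-likelihood ratio as the difference of two max' terms."""
    return max_star_many(metrics_plus) - max_star_many(metrics_minus)

# Toy accumulated metrics (log domain) for the two symbol hypotheses.
example_llr = llr([-1.0, -3.5], [-2.0, -2.5])
```

Writing the operation as a maximum plus a bounded correction term avoids overflow for large metrics, and dropping the correction gives the usual max-log approximation.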

Observe  that  the  above  derivation  relies  on  the  assump¬ 
tion  of  known  channel.  In  practice,  the  channel  response  is 
usually  estimated  using  a  known  training  sequence  at  the 
equalizer  start-up. 


3.  TRELLIS  EQUALIZATION  BY 
NON-PARAMETRIC  DENSITY  ESTIMATION 

3.1.  Density  Estimation  by  Kernel  Smoothing 

An example of the density function of the noise plus CCI samples $n_k$ for the case of the GSM channel is shown in Figure 1. The plot has been obtained from a histogram of the data in 2000 bursts, considering one dominant interferer under stationary propagation conditions. From Figure 1, it is apparent that the disturbance cannot be realistically modelled as a Gaussian random variable.

Figure 1: Example of the density function of CCI (derotated GMSK signal) plus AWGN for a GSM receiver (GMSK signal, GSM TU profile).

3.1.1.  Parzen  Estimator 

An estimate of the probability density function of a complex random variable X can be built from a set of data $X_i$, $i = 1, \ldots, n$, by means of a smoothing function or kernel function $K(x, X_i)$ (see [11] and references therein). In the method proposed by Parzen [10], an estimate of the unknown density is given by

$$\hat{p}_n(x) = \frac{1}{n} \sum_{i=1}^{n} K(x, X_i) . \qquad (6)$$

A possible choice for the function $K(x, X_i)$ among those satisfying the conditions for (asymptotic) unbiasedness and consistency of the estimator [10] is the Gaussian kernel of fixed width $\sigma_0$

$$K(x, X_i) = \frac{1}{2\pi\sigma_0^2}\, e^{-|x - X_i|^2 / 2\sigma_0^2} . \qquad (7)$$
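
A minimal sketch of the Parzen estimator with a Gaussian kernel for complex-valued data follows. It assumes the kernel is normalized as a two-dimensional Gaussian, 1/(2πσ₀²) exp(-|x - X_i|²/(2σ₀²)); the function names, the sample values and the kernel width are hypothetical.

```python
import math

def gaussian_kernel(x, xi, sigma0):
    """Gaussian kernel on the complex plane with per-component variance sigma0**2.

    The 1 / (2*pi*sigma0**2) normalization is an assumption.
    """
    return math.exp(-abs(x - xi) ** 2 / (2.0 * sigma0 ** 2)) / (2.0 * math.pi * sigma0 ** 2)

def parzen_density(x, samples, sigma0):
    """Parzen estimate: average of the kernels centered on the observations."""
    return sum(gaussian_kernel(x, xi, sigma0) for xi in samples) / len(samples)

# Hypothetical observations of the disturbance and a hypothetical kernel width.
samples = [0.3 + 0.1j, -0.5 - 0.2j, 0.1 + 0.4j]
sigma0 = 0.2
value_at_origin = parzen_density(0.0 + 0.0j, samples, sigma0)
```

With this normalization the estimate integrates to one over the complex plane, so it can be used directly as a density in a likelihood computation.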


3.1.2. Transition Metrics for Non-Parametric Trellis Equalization

In the case of a Bayesian trellis equalizer, the random variable X represents one realization of the noise-plus-interference process corresponding to a given received burst. Consider the received signal (1), and assume that the channel is approximately constant within the burst duration. Then, once the channel taps $\hat{h}_\ell$ are estimated using the M training symbols $b_i$, they can be used to derive the set of observations $X_i$, $i = 1, \ldots, n = M - L$, of the random disturbance X according to $X_i = \hat{n}_i = r_i - \sum_{\ell=0}^{L-1} b_{i-\ell}\, \hat{h}_\ell$, where the hat denotes the estimated value.

Figure 2: Block diagram of the non-parametric trellis equalizer.

At this point we recall that the transition metric (5) of the optimum symbol-by-symbol MAP algorithm results $\lambda(\xi_k) = -\log p_n(r_k - \sum_{\ell=0}^{L-1} b_{k-\ell}\, h_\ell) - \log \Pr(b_k)$. Therefore, using (6) and (7) one can directly estimate the quantity $\log \hat{p}_n(x)$ for $x = \hat{n}_k = r_k - \sum_{\ell=0}^{L-1} b_{k-\ell}\, \hat{h}_\ell$, and obtain

$$\hat{\lambda}(\xi_k) = -\log \hat{p}_n(x) - \log \Pr(b_k) . \qquad (8)$$
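
The procedure can be sketched in two steps: form the training residuals, then evaluate the branch metric of (8) with a kernel density estimate in the log domain. This is a sketch under stated assumptions, not the authors' implementation: the Gaussian kernel normalization, the choice of which training indices yield a residual, the function names and the noise-free toy burst are all hypothetical.

```python
import math

def residuals(r, b, h_hat):
    """Residual r[i] - sum_l b[i-l]*h_hat[l] for every i with a full symbol history."""
    L = len(h_hat)
    return [r[i] - sum(b[i - l] * h_hat[l] for l in range(L))
            for i in range(L - 1, len(r))]

def log_parzen(x, samples, sigma0):
    """Log of the Gaussian-kernel Parzen estimate, via log-sum-exp for stability."""
    exps = [-abs(x - xi) ** 2 / (2.0 * sigma0 ** 2) for xi in samples]
    m = max(exps)
    return (m + math.log(sum(math.exp(e - m) for e in exps))
            - math.log(len(samples) * 2.0 * math.pi * sigma0 ** 2))

def branch_metric(r_k, symbols, h_hat, samples, sigma0, prior=0.5):
    """Metric -log p_hat(x) - log Pr(b_k), with symbols = (b_k, b_{k-1}, ...)."""
    x = r_k - sum(s * h for s, h in zip(symbols, h_hat))
    return -log_parzen(x, samples, sigma0) - math.log(prior)

# Toy, noise-free burst with a hypothetical two-tap channel estimate.
b = [1, -1, -1, 1, 1, -1, 1, 1]
h_hat = [1.0 + 0.2j, 0.5 - 0.1j]
r = [0j] + [b[i] * h_hat[0] + b[i - 1] * h_hat[1] for i in range(1, len(b))]
X = residuals(r, b, h_hat)
good = branch_metric(r[3], (b[3], b[2]), h_hat, X, sigma0=0.1)   # correct hypothesis
bad = branch_metric(r[3], (-b[3], b[2]), h_hat, X, sigma0=0.1)   # wrong b_k
```

In this toy case the correct branch receives the smaller metric, as expected for a cost that is the negative log-likelihood.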


and  variance  2cr2,  which  we  assume  independent  of  y'k.  If 
the  co-channel  taps  h'/k>  at  time  k  are  regarded  as  an  un¬ 
known,  but  deterministic  mapping  from  (b'k, . . . ,  b'k_L,+1) 
to  y'k,  the  distribution  of  nk  can  be  derived  from  those  of 
b'k  and  wk  .  Given  a  generic  binary  quantity  /3,  we  define 

L'-l  ( 

Tji  =  T/i,i  +ji]i, 2  —  y;  ,  o  <  i  <  2l  ,  (10) 

£=0 

where  A  =  {Pi.ejeJo1  denotes  one  of  the  2L  distinct  se¬ 
quences  of  elements  A,r  G  {+1,  —1}.  Then,  it  is  possible  to 
show  that  the  expression  of  the  density  of  nk  results 


1 

Pn(x)  = 


(11) 


1=1 


where  pw(x)  is  the  complex  Gaussian  density  with  vari¬ 
ance  2er2.  From  (11),  the  density  of  the  interference-plus- 
noise  is  given  by  a  number  of  symmetric  Gaussian  kernels, 
which  centers  are  the  points  of  the  hypothetical  scatter  dia¬ 
gram  obtained  in  the  absence  of  thermal  noise.  Comparison 
of  (11)  and  (6)  reveals  the  strong  connection  between  the 
structure  of  the  Parzen  estimator  and  the  true  density.  In 
particular,  for  a2  — >  0,  the  observations  X;  in  (6)  corre¬ 
spond  to  the  points  of  the  complex  plane  defined  by  (10), 
with  the  binary  parameters  A,/  replaced  by  the  co-channel 
symbols  b'k_e.  Therefore,  the  estimator  defined  by  (6)  and 
(7)  will  approach  the  true  density  (11)  as  soon  as  the  di¬ 
mension  of  the  training  data  is  large  enough  to  represent 
the  2l  equiprobable  sequences  A  =  (A.tlfco1- 


The block diagram of the resulting equalizer is shown in Figure 2. From the implementation point of view, the density $\log p_n(x)$ at time $k$ can be computed separately for each trellis branch. Alternatively, it can be precomputed for a finite number of values $x$, and stored in a look-up table before starting the trellis processing.

We  emphasize  the  fact  that  the  above  technique  deals 
with  the  statistical  model  of  a  random  variable,  obtained 
as  the  realization  of  the  noise-plus-interference  process  at  a 
given  time  instant.  It  is  worth  noting  that,  with  a  proper 
adaptive  procedure,  the  approach  can  be  extended  to  those 
cases  where  the  CCI  impulse  response  cannot  be  considered 
approximately  constant  within  the  burst. 

3.2.  Probability  Density  Function  of  the  Noise-plus- 
interference 

The analytical expression of the actual density function of noise-plus-interference can be derived if we assume an (unknown) deterministic finite-state machine model for the co-channel signal. Consider the received signal (1). The sum of noise and CCI at time $k$ can be expressed as

$$n_k = y'_k + w_k = \sum_{\ell=0}^{L'-1} b'_{k-\ell}\, h'^{(k)}_\ell + w_k, \qquad (9)$$

where $b'_k \in \{+1, -1\}$ are the co-channel symbols, $h'^{(k)}_\ell$, $0 \le \ell \le L'-1$, denote the taps of the co-channel impulse response, and $w_k$ is white Gaussian noise with zero mean


3.3.  Doubling  the  Size  of  the  Training  Set 

We observe that in (10) for each index $i = i'$ corresponding to the binary sequence $\beta' = \{\beta'_\ell\}_{\ell=0}^{L'-1}$ there is an index $i = i''$ with $\beta'' = \{-\beta'_\ell\}_{\ell=0}^{L'-1} = -\beta'$. This means that for each $i'$ there is an $i''$ such that $\eta_{i'} = -\eta_{i''}$. Exchanging each pair of indexes $i'$ and $i''$ in the sum (11) and taking into account the symmetry of the Gaussian density $p_w(x)$ gives $p_n(-x) = (1/2^{L'}) \sum_{i} p_w(-x + \eta_i) = p_n(x)$. The importance of this result comes from the fact that it allows us to double the available volume of data in the density estimator (6). In fact it implies that, if $\{X_i\}$ are values assumed by the random variable $n_k$, then the set $\{-X_i\}$ contains values assumed by $n_k$ with the same probability. Therefore, together with each outcome $X_i$ we can additionally consider $-X_i$ as if it were the result of a parallel experiment. This leads to the enlarged data set $\{X_i, -X_i\}$.
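The enlarged data set can be illustrated with a minimal sketch of a fixed-width Gaussian-kernel (Parzen) estimator. Equations (6) and (7) are not reproduced here, so the kernel below (a circular complex Gaussian with variance $2\sigma_0^2$) and the names `parzen_log_density`, `samples` and `sigma0` are assumptions made for illustration only.

```python
import numpy as np

def parzen_log_density(x, samples, sigma0, symmetrize=True):
    """Log of a fixed-width Gaussian-kernel density estimate at complex points x.

    Assumed kernel: circular complex Gaussian with variance 2*sigma0**2.
    """
    samples = np.asarray(samples, dtype=complex)
    if symmetrize:
        # Enlarged data set {X_i, -X_i}, justified by p_n(-x) = p_n(x).
        samples = np.concatenate([samples, -samples])
    x = np.atleast_1d(np.asarray(x, dtype=complex))
    # Squared distance between every query point and every sample.
    d2 = np.abs(x[:, None] - samples[None, :]) ** 2
    log_k = -d2 / (2.0 * sigma0**2) - np.log(2.0 * np.pi * sigma0**2)
    # Log of the kernel average, computed in a numerically stable way.
    m = log_k.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_k - m).mean(axis=1))

# Hypothetical noise-plus-interference observations.
obs = [0.3 + 0.1j, -0.2 + 0.4j, 0.5 - 0.5j]
at_x = parzen_log_density([0.1 + 0.2j], obs, sigma0=0.05)
at_minus_x = parzen_log_density([-0.1 - 0.2j], obs, sigma0=0.05)
```

With `symmetrize=True` the estimate is even by construction, so `at_x` and `at_minus_x` coincide; the branch metric would then use the negative of this log-density.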

3.4.  Choice  of  the  Smoothing  Parameter 

An optimal kernel width for the fixed-width density estimator (6) can be determined through the minimization of the mean integrated square error (MISE) [11]. In the case of the Gaussian kernel (7) used to estimate the complex Gaussian density with variance $2\sigma^2$, we have $\sigma_{0(\mathrm{opt})} = (1/n)^{1/6}\sigma$ [11]. For the density of the noise-plus-interference, using (11) and applying Cauchy's inequality we find

$$\sigma_{0(\mathrm{opt})} \ge (1/n)^{1/6}\sigma. \qquad (12)$$
Figure 3: Error performance in the case of known channel. GSM TU0 profile, SNR = 30 dB. Density estimator with fixed kernel width $\sigma_0 = 0.05$.


With a given volume $n$ of training data, the kernel width can then be selected from the value of the noise variance $\sigma^2$. In a practical receiver, an estimate of $\sigma^2$ can be derived from the training sequence, taking into account the estimated channel response and the measured received signal level.
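As a small worked instance of this rule, the sketch below evaluates the value $(1/n)^{1/6}\sigma$, which is the optimal width for a purely Gaussian density and the bound in (12) otherwise. The function name `kernel_width` is hypothetical.

```python
def kernel_width(n, sigma):
    """Kernel width (1/n)**(1/6) * sigma for n training samples and noise std sigma."""
    return (1.0 / n) ** (1.0 / 6.0) * sigma

# Example: 64 training samples and sigma = 0.1 give a width close to 0.05.
width = kernel_width(64, 0.1)
```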

3.5.  Temporal  Whitening 

The MAP equalizer with branch metric (8) is based on the assumption that the samples $n_k$ are independent. Given the
temporal  color  of  the  CCI,  a  whitening  filter  of  the  distur¬ 
bance  is  needed  before  the  trellis  processor.  We  point  out 
that  a  linear  prediction-error  (LPE)  filter  will  ideally  pro¬ 
duce  uncorrelated  CCI-plus-noise  samples,  but  this  does  not 
necessarily  imply  independence,  since  the  process  continues 
in  general  to  be  non-Gaussian.  In  addition,  a  whitening  fil¬ 
ter  for  the  disturbance  will  inevitably  increase  the  channel 
memory  for  the  desired  signal.  And  if  we  do  not  want  to 
increase  the  number  of  states  of  the  equalizer,  the  number 
of  taps  of  the  filter  has  to  be  kept  small.  However,  the 
delay  spread  of  the  typical  GSM  urban  channel  is  usually 
lower  than  4  symbol  intervals.  Moreover,  reducing  the  cor¬ 
relation  between  the  samples  will  certainly  reduce  their  ’de¬ 
pendence’.  Note  that  in  some  particular  cases  the  whitened 
disturbance  turns  out  to  be  actually  independent.  As  an  ex¬ 
ample,  this  happens  when  the  variance  of  the  thermal  noise 
tends  to  zero  and  the  co-channel  is  minimum-phase  (in  fact, 
in  this  case  the  ideal  LPE  filter  inverts  the  co-channel). 
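The whitening step can be sketched with a short linear prediction-error filter fitted by the Yule-Walker (normal) equations. This is only an illustration under assumed conditions: the disturbance samples are taken to be directly available, and the function names `lpe_filter` and `whiten` are hypothetical.

```python
import numpy as np

def lpe_filter(d, order=2):
    """Prediction-error filter taps [1, -a_1, ..., -a_p] fitted to the sequence d."""
    d = np.asarray(d, dtype=complex)
    n = len(d)
    # Biased sample autocorrelation r[k] ~ E{d[m] conj(d[m-k])}, k = 0..order.
    r = np.array([np.vdot(d[:n - k], d[k:]) / n for k in range(order + 1)])
    # Normal equations: r[j] = sum_i a_i r[j-i], with r[-k] = conj(r[k]).
    R = np.empty((order, order), dtype=complex)
    for j in range(order):
        for i in range(order):
            k = j - i
            R[j, i] = r[k] if k >= 0 else np.conj(r[-k])
    a = np.linalg.solve(R, r[1:])
    return np.concatenate(([1.0], -a))

def whiten(d, taps):
    """Causal FIR filtering of d with the prediction-error taps."""
    return np.convolve(d, taps)[:len(d)]
```

A 2-tap predictor (`order=2`) removes most of the correlation of a strongly colored sequence, but, as noted above, uncorrelated output samples are not necessarily independent when the disturbance is non-Gaussian.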

4.  SIMULATION  RESULTS 

The effectiveness of the strategy based on density estimation by kernel smoothing has been assessed by computer simulation for the case of a GSM receiver with single channel reception. The GMSK transmitted symbols are obtained from the source bits by rate 1/2 convolutional encoding and interleaving, according to the GSM specifications for the full-rate speech traffic channel. The simulator includes the multipath fading channel with the classical Doppler spectrum [14], CCI, and thermal noise. Ideal frequency hopping is implemented. One dominant co-channel interferer is assumed, characterized by an independent fading process and a random phase shift with respect to the signal of interest. In all the simulations SNR = 30 dB. At the receiver, the soft-output data produced by a 16-state MAP equalizer are deinterleaved and decoded by a convolutional channel decoder.

Figure 4: Error performance in the case of estimated channel. GSM TU0 profile, SNR = 30 dB. Density estimator with fixed kernel width $\sigma_0 = 0.05$.

To establish the ultimate performance of the proposed equalizer, we first consider the ideal case of known channel and relative speed 0 km/h. Figure 3 shows the bit-error rate (BER) performance with GSM typical urban area
(TU)  multipath  profile  for  both  co-channel  signals.  The 
MAP  non-parametric  equalizer  is  compared  with  the  MAP 
trellis  processor  that  assumes  Gaussian  disturbance.  The 
figure  also  addresses  the  effect  of  doubling  the  data  set  for 
density  estimation,  as  discussed  in  Section  3.  The  results 
indicate  that  the  non-parametric  equalizer  offers  a  poten¬ 
tial  improvement  of  more  than  two  orders  of  magnitude  in 
terms  of  BER  at  the  equalizer  output.  Figures  4  to  6  il¬ 
lustrate  the  receiver  performance  when  the  channel  of  the 
signal  of  interest  is  estimated  from  the  training  symbols. 
We  also  introduce  an  LPE  filter  for  prewhitening  of  the  col¬ 
ored  disturbance.  As  discussed  in  Section  3,  choosing  the 
prediction  order  involves  a  trade-off  between  performance 
and complexity. In the figures, we use a 16-state trellis and a 2-tap LPE filter. Finally, we include the performance obtained by iterative channel estimation. In this case, after
the  equalization  of  the  entire  burst,  the  data  decisions  are 
fed  back  to  produce  an  improved  channel  estimate,  which 
is  used  in  a  second  pass  equalization. 

The above simulation results refer to a synchronous interference scenario. Simulation with asynchronous CCI shows that the proposed equalizer still outperforms the conventional trellis processor. However, in those cases the proper approach consists in introducing an adaptation of the estimated density of the noise-plus-CCI.

Figure 5: Error performance with iterative channel estimation. GSM TU0 profile, SNR = 30 dB. Density estimator with fixed kernel width $\sigma_0 = 0.05$.

5.  CONCLUSIONS 

A  non-parametric  trellis  processor  has  been  studied  for  chan¬ 
nel  equalization  in  the  presence  of  non-Gaussian  interfer¬ 
ence.  In  the  case  of  the  GSM  system,  the  proposed  ap¬ 
proach  based  on  density  estimation  by  kernel  smoothing 
provides  a  significant  performance  improvement  with  re¬ 
spect  to  the  receiver  that  assumes  Gaussian  disturbance. 
ABSTRACT

In  this  paper,  a  novel  analytical  blind  identification  al¬ 
gorithm  is  presented,  based  on  the  non-circular  second- 
order  statistics  of  the  output.  It  is  shown  that  the 
channel  taps  need  to  satisfy  a  polynomial  system  of 
degree  2,  and  that  identification  amounts  to  solving 
the system. We describe an algorithm that solves this particular system entirely analytically. Computer results demonstrate its efficiency.

1.  INTRODUCTION 

Blind identification methods depend on the characteristics of the input sources. For example, it is known that a system can only be identified up to an all-pass filter when its input is Gaussian circular. Consequently, particular attention has been paid to the non-Gaussian input cases. In those situations the phase information can be accessed using high-order statistics of the observations, and in the SISO case the system is identified up to a scalar factor only. This has been studied in numerous papers, among which one can cite the works of Shalvi-Weinstein [5] and Tugnait [7].

An interesting class of non-Gaussian signals is the discrete one, which appears in wireless communications. The discrete character has been used by a few authors such as Li [3] or Yellin and Porat [8], who were the first to take an interest in an algebraic solution. The studied signals also have nonzero cyclostationary statistics, which allows identification using second-order statistics only [4] [6].

The  novelty  of  our  contribution  is  two-fold.  First, 
non-circular  second-order  moments  are  used.  Second, 
an  algebraic  solution  to  a  class  of  polynomial  systems, 
constructed  from  a  block  of  data,  is  introduced.  Our 
approach  is  described  in  the  case  of  MSK  modulations, 
approximating  well  the  digital  modulation  utilized  in 
the  GSM  standard.  In  addition,  block  methods  are 
well  matched  to  burst-mode  communication  systems. 


2.  MODEL,  NOTATION,  AND
ASSUMPTIONS

Assume a finite sequence of input samples $x(m)$ is fed into a Finite Impulse Response (FIR) linear system of length $M$. Denote by $y(n)$ the corresponding output sequence of length $N$, satisfying:

$$y(n) = \sum_{m=0}^{M-1} h(m)\, x(n-m) + w(n) \stackrel{\mathrm{def}}{=} \mathbf{x}(n; M)^T \mathbf{h} + w(n)$$

Multidimensional variables are stored in column vectors and denoted by boldface letters; for instance, $\mathbf{x}(n; M) = [x(n), \ldots, x(n-M+1)]^T$, by construction.

The  input  sequence  is  assumed  to  follow  a  discrete 
distribution,  stemming  from  BPSK,  MSK,  or  QPSK 
digital  modulations,  and  the  channel  h  is  supposed 
time-invariant  during  the  observation. 

The key statistical property used in this paper is that discrete signals are non-stationary at given orders. More precisely, for BPSK modulated signals:

$$E\{x(n)x(n-\ell) \mid x(0)\} = x(0)^2\, \delta(\ell)$$
$$E\{x(n)x(n-\ell)^*\} = \delta(\ell)$$

for MSK signals:

$$E\{x(n)x(n-\ell) \mid x(0)\} = (-1)^n\, x(0)^2\, \delta(\ell)$$
$$E\{x(n)x(n-\ell)^* \mid x(0)\} = \delta(\ell)$$

and for QPSK modulated signals:

$$E\{\mathrm{Re}[x(n)]\, \mathrm{Re}[x(n-\ell)] \mid x(0)\} = \mathrm{Re}[x(0)]^2\, \delta(\ell)$$
$$E\{\mathrm{Im}[x(n)]\, \mathrm{Im}[x(n-\ell)] \mid x(0)\} = \mathrm{Im}[x(0)]^2\, \delta(\ell)$$
$$E\{x(n)x(n-k)x(n-\ell)x(n-m) \mid x(0)\} = x(0)^4\, \delta(k + \ell + m)$$
$$E\{x(n)x(n-\ell)^*\} = \delta(\ell),$$

where $\delta(\ell) = 1$ if $\ell = 0$ and $\delta(\ell) = 0$ elsewhere. Note the conditional expectation, exhibiting cyclostationarity in the non-circular moment of MSK inputs.
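The MSK property can be checked numerically with a short sketch. It assumes the usual baud-rate model of MSK in which each symbol is the previous one rotated by $\pm 90$ degrees, $x(n) = j\,b(n)\,x(n-1)$ with $b(n) \in \{+1, -1\}$; this model and the name `msk_symbols` are assumptions, not taken from the text.

```python
import numpy as np

def msk_symbols(bits, x0=1.0 + 0.0j):
    """Assumed baud-rate MSK model: x(n) = j * b(n) * x(n-1), b(n) in {+1, -1}."""
    x = np.empty(len(bits) + 1, dtype=complex)
    x[0] = x0
    for n, b in enumerate(bits, start=1):
        x[n] = 1j * b * x[n - 1]
    return x

rng = np.random.default_rng(1)
bits = rng.choice([-1, 1], size=1000)
x = msk_symbols(bits)
n = np.arange(len(x))

# Lag 0, non-circular: x(n)^2 = (-1)^n x(0)^2 holds sample by sample.
lag0_ok = np.allclose(x**2, (-1.0) ** n * x[0] ** 2)
# Lag 0, circular: |x(n)|^2 = 1.
unit_modulus = np.allclose(np.abs(x) ** 2, 1.0)
# Lag 1, non-circular: the sign-compensated average is close to zero.
lag1_mean = np.mean((-1.0) ** n[1:] * x[1:] * x[:-1])
```

Under this model the lag-0 relation is deterministic given $x(0)$, while the lag-1 average only vanishes statistically, so `lag1_mean` is small but not exactly zero.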
3.2.  Preliminaries


Based on these properties, it is possible to derive a set of polynomial equations that the channel must satisfy. In the MSK case, we obtain:

$$E[y(n)y(n-\ell) \mid x(0)] = x(0)^2 \sum_{m=0}^{M-1} (-1)^m\, h(m)\, h(m+\ell)$$

In the BPSK case, we have:

$$E[y(n)y(n-\ell) \mid x(0)] = x(0)^2 \sum_{m=0}^{M-1} h(m)\, h(m+\ell)$$

Lastly, in the QPSK case:

$$E[y(n)y(n-\ell_1)y(n-\ell_2)y(n-\ell_3) \mid x(0)] = x(0)^4 \sum_{m=0}^{M-1} h(m)\, h(m+\ell_1)\, h(m+\ell_2)\, h(m+\ell_3)$$

3.  SOLVING  THE  POLYNOMIAL  SYSTEM 
3.1.  Example 

To introduce our contribution in simple terms, consider the following example. Let the input signal be MSK and the channel be real of length $M = 3$. Then non-circular statistics yield:

$$\begin{aligned} h(0)^2 - h(1)^2 + h(2)^2 &= f_1 \\ h(0)h(1) - h(1)h(2) &= f_2 \\ h(0)h(2) &= f_3 \end{aligned} \qquad (1)$$

whereas circular ones yield:

$$\begin{aligned} h(0)^2 + h(1)^2 + h(2)^2 &= g_1 \\ h(0)h(1) + h(1)h(2) &= g_2 \\ h(0)h(2) &= f_3 \end{aligned}$$

where $f_i$ and $g_i$ are given (they depend on statistics of the observations $y$). Combining these equations yields:

$$\begin{aligned} h(0)^2 + h(2)^2 &= (f_1 + g_1)/2 \\ h(0)h(1) &= (f_2 + g_2)/2 \\ h(0)h(2) &= f_3 \end{aligned}$$

Using the first and third equations, one gets:

$$(h(0) + h(2))^2 = h(0)^2 + h(2)^2 + 2h(0)h(2) = (f_1 + g_1)/2 + 2f_3$$

Together with the product $h(0)h(2) = f_3$, this equation eventually allows $h(0)$ and $h(2)$ to be calculated up to a sign, and then $h(1)$.
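The example can be sketched numerically as follows. The code uses the sum and difference $(h(0) \pm h(2))^2 = (f_1+g_1)/2 \pm 2f_3$ to recover $h(0)$ and $h(2)$, then $h(1)$ from $h(0)h(1)$; ranking the two candidate orderings by how well they reproduce $h(1)^2 = (g_1 - f_1)/2$ is an assumption added here, and the function names are hypothetical.

```python
import math

def moments(h):
    """Non-circular (f) and circular (g) statistics of a real 3-tap channel, MSK input."""
    h0, h1, h2 = h
    f = (h0 * h0 - h1 * h1 + h2 * h2, h0 * h1 - h1 * h2, h0 * h2)
    g = (h0 * h0 + h1 * h1 + h2 * h2, h0 * h1 + h1 * h2, h0 * h2)
    return f, g

def solve_m3(f, g):
    """Recover (h0, h1, h2) up to a global sign from f and g."""
    s = (f[0] + g[0]) / 2.0       # h(0)^2 + h(2)^2
    q = (f[1] + g[1]) / 2.0       # h(0) h(1)
    p = f[2]                      # h(0) h(2)
    h1_sq = (g[0] - f[0]) / 2.0   # h(1)^2, used only to rank candidates
    a = math.sqrt(max(s + 2.0 * p, 0.0))   # |h(0) + h(2)|
    b = math.sqrt(max(s - 2.0 * p, 0.0))   # |h(0) - h(2)|
    best, best_err = None, math.inf
    # Global sign fixed by h(0) + h(2) = +a; two orderings remain.
    for h0, h2 in (((a + b) / 2, (a - b) / 2), ((a - b) / 2, (a + b) / 2)):
        if abs(h0) < 1e-12:
            continue
        h1 = q / h0
        err = abs(h1 * h1 - h1_sq)
        if err < best_err:
            best, best_err = (h0, h1, h2), err
    return best

true_h = (1.0, 0.5, -0.3)
est_h = solve_m3(*moments(true_h))
```

With exact statistics, `est_h` equals `true_h` or its negative; with estimated statistics the same steps apply, at the cost of estimation noise.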

Thus  we  have  been  able  to  identify  a  real  channel 
by  using  the  non-circular  second  order  statistics  to¬ 
gether  with  circular  second  order  ones.  The  general 
algorithm  that  is  described  in  this  section  computes  the 
finite  set  of  solutions  of  the  polynomial  system  built  on 
the  non-circular  second-order  statistics  only.  In  the 
next  section,  the  choice  of  the  channel  estimation  is 
discussed. 


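Consider the ring $\mathcal{R} = \mathbb{C}[\xi]$ of polynomials in the variables $\xi \stackrel{\mathrm{def}}{=} [h(0), h(1), \ldots, h(M-1)]$ with coefficients in the complex field $\mathbb{C}$; the dual space of $\mathcal{R}$ is the set of linear forms from $\mathcal{R}$ to $\mathbb{C}$, denoted $\hat{\mathcal{R}}$. The evaluation of a polynomial $p$ at a point $\zeta \in \mathbb{C}^M$, denoted by $\mathbf{1}_\zeta : p \mapsto p(\zeta)$, is the linear form in which we are most interested.

Given a polynomial $a \in \mathcal{A}$, define the multiplication operator by $a$ as the mapping $M_a$ that associates $q$ with $aq$:

$$M_a : \mathcal{A} \to \mathcal{A}, \qquad q \mapsto q\,a \qquad (2)$$

The transposed operator, $M_a^t$, is by definition the mapping from $\hat{\mathcal{A}}$ onto itself such that $\langle q, M_a^t \Lambda \rangle = \langle M_a q, \Lambda \rangle = \langle aq, \Lambda \rangle$, $\forall \Lambda \in \hat{\mathcal{A}}$, $\forall q \in \mathcal{A}$, so that $M_a^t(\Lambda)(q) = \Lambda(qa)$.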

3.3.  Lemmas 

Let $V$ be the subset $\{f_1, \ldots, f_M\}$ of polynomials of degree $D$ belonging to $\mathcal{R}$. Bezout's theorem [2, p.227] states that such a system

$$V : \{ f_m(\xi) = 0, \; 1 \le m \le M \}, \qquad (3)$$

where $\xi \stackrel{\mathrm{def}}{=} [\xi(0), \xi(1), \ldots, \xi(M-1)]$, has an infinity of solutions, or a number of solutions smaller than or equal to $D^M$.

When  the  system  has  a  finite  number  of  solutions, 
one  conventional  way  to  compute  them  is  to  reduce  the 
problem  to  an  eigenvector  computation,  as  shown  by 
the  following  lemma. 

Lemma 3.1 Linear forms $\mathbf{1}_\xi : p \mapsto p(\xi)$, where $\xi$ is any solution of $V$, are the eigenvectors of all matrices $(M_a^t)_{a \in \mathcal{A}}$. The corresponding eigenvalues are $a(\xi)$.

For a proof see [1].

Therefore, the computation of the multiplication matrix $M_a$ appears as a key step in the proposed algorithm, since its eigenvectors yield the solutions of $V$. Indeed, if we take as a basis of $\mathcal{A}$ the set $B = \{1, h(0), h(1), \ldots, h(0)h(1), \ldots\}$, the entries of the eigenvectors are equal to $\{1, \xi(0), \xi(1), \ldots, \xi(0)\xi(1), \ldots\}$, where $\xi$ stands for any possible solution of $V$.
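The eigenvector property can be illustrated on a deliberately simple toy system, which is not the channel system of this paper: $x^2 - 3x + 2 = 0$, $y^2 - 5y + 6 = 0$, with basis $\{1, x, y, xy\}$. In the sketch below each row of a matrix holds the coordinates of (basis element) times the multiplier, so the matrices play the role of the transposed operators; the choice of the multiplier $a = x + 3y$ is an assumption made so that all eigenvalues are distinct.

```python
import numpy as np

# Row k: coordinates in the basis [1, x, y, x*y] of (basis element k) * x,
# reduced with x^2 = 3x - 2.
Mx = np.array([[ 0, 1,  0, 0],
               [-2, 3,  0, 0],
               [ 0, 0,  0, 1],
               [ 0, 0, -2, 3]], dtype=float)
# Same construction for multiplication by y, reduced with y^2 = 5y - 6.
My = np.array([[ 0,  0, 1, 0],
               [ 0,  0, 0, 1],
               [-6,  0, 5, 0],
               [ 0, -6, 0, 5]], dtype=float)

# Multiplication by a = x + 3y separates the four solutions.
Ma = Mx + 3.0 * My
eigvals, eigvecs = np.linalg.eig(Ma)

solutions = []
for k in range(eigvecs.shape[1]):
    # Normalize so that the entry associated with the monomial 1 equals 1;
    # the next two entries are then the values of x and y at a solution.
    v = np.real(eigvecs[:, k] / eigvecs[0, k])
    solutions.append((round(float(v[1]), 6), round(float(v[2]), 6)))
solutions.sort()
```

The four normalized eigenvectors are $[1, x, y, xy]$ evaluated at the four solutions, and the eigenvalues are the values of $x + 3y$ there, which matches the $D^M = 4$ bound for two equations of degree 2.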

3.4.  Computation  of  matrix  Ma 

Matrix $M_a$ can be directly computed from the Macaulay matrix associated with polynomials $\{f_1, \ldots, f_M\}$ [1]. These matrices are the extension of the so-called Sylvester matrices to multivariate polynomials.

However,  if  we  take  into  account  the  relationships 
between  the  monomials  introduced  in  the  polynomial 
system  V,  there  exists  a  much  simpler  procedure  to 
compute  Ma . 

In order to simplify the discussion, we will use the following example. Suppose that a channel of length $M = 3$ is excited by an MSK input. System $V$ is then equal to that in (1). In this case, a generic basis that can solve this kind of system is given by:


Among these monomials, some are already in the basis. In our example, the monomials $h(0)$, $h(0)h(1)$, $h(0)h(2)$ and $h(0)h(1)h(2)$ are in the basis. The other monomials, $h(0)^2$, $h(0)^2h(1)$, $h(0)^2h(2)$ and $h(0)^2h(1)h(2)$, have to be expressed using the polynomial system.

According to equation (4), monomials $h(0)^2$, $h(1)^2$ and $h(2)^2$ can be expressed directly as a function of $1$, $h(0)$, $h(1)$, $h(2)$, $h(0)h(1)$, $h(0)h(2)$ and $h(1)h(2)$, provided that $T$ is chosen correctly. In other words, monomials $h(0)^2$, $h(1)^2$ and $h(2)^2$ can be expressed directly as a function of the basis using equation (4).


$$B_3 = \{1,\ h(0),\ h(1),\ h(2),\ h(0)h(1),\ h(0)h(2),\ h(1)h(2),\ h(0)h(1)h(2)\}$$


However,  this  basis  cannot  be  used  in  our  problem 
unless  we  first  apply  a  change  in  the  variables.  Thus, 
the  computation  of  the  multiplication  matrix  can  be 
split  into  4  steps. 

First step: Change in variables

Suppose we use the following change in variables: $\xi = Th$. Then the system in $\xi$ becomes:

$$A\,\mathbf{u} = 0 \qquad (4)$$

where the entries of $A$ are functions of the entries of $T$, and:

$$\mathbf{u} = [\,1,\ h(0),\ h(1),\ h(2),\ h(0)h(1),\ h(0)h(2),\ h(1)h(2),\ h(0)^2,\ h(1)^2,\ h(2)^2\,].$$

The matrix $T$ must be chosen so that monomials $h(0)^2$, $h(1)^2$ and $h(2)^2$ can be directly expressed as functions of the basis (see the next step).

Second  step  :  Expression  of  the  second  degree 
monomials 

Suppose  we  want  to  find  the  matrix  associated  with 
the  multiplication  by  h(0).  The  monomials  that  we 
have  to  express  are  : 


Monomials of the basis      (x h(0))      Monomials to be expressed

1                                         h(0)
h(0)                                      h(0)^2
h(1)                                      h(0)h(1)
h(2)                                      h(0)h(2)
h(0)h(1)                                  h(0)^2 h(1)
h(0)h(2)                                  h(0)^2 h(2)
h(1)h(2)                                  h(0)h(1)h(2)
h(0)h(1)h(2)                              h(0)^2 h(1)h(2)


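$$\begin{bmatrix} h(0)^2 \\ h(1)^2 \\ h(2)^2 \end{bmatrix} = B \begin{bmatrix} 1 \\ h(0) \\ h(1) \\ h(2) \\ h(0)h(1) \\ h(0)h(2) \\ h(1)h(2) \end{bmatrix} \qquad (5)$$

Therefore, the monomial $h(0)^2$ is now expressed.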


Third step: Expression of the third degree monomials

We now consider monomials $h(0)^2h(1)$ and $h(0)^2h(2)$. These monomials can be expressed using the expression of the monomial $h(0)^2$ in equation (5). In this equation, if we multiply $h(0)^2$ by $h(1)$, monomial $h(0)^2h(1)$ appears on the left-hand side and monomials $h(1)$, $h(0)h(1)$, $h(1)^2$, $h(2)h(1)$, $h(0)h(1)^2$, $h(0)h(1)h(2)$ and $h(1)^2h(2)$ appear on the right-hand side. Among these monomials, one can distinguish those that are in the basis, like $h(1)$, $h(0)h(1)$, $h(2)h(1)$ and $h(0)h(1)h(2)$, those that are already expressed in the basis, like $h(1)^2$, and those that are unknown, like $h(1)^2h(2)$ and $h(0)h(1)^2$. However, these unknown monomials are of the same kind as monomials $h(0)^2h(1)$ and $h(0)^2h(2)$, and one can show that the expression of monomials $h(0)^2h(1)$, $h(0)^2h(2)$, $h(1)^2h(0)$, $h(1)^2h(2)$, $h(2)^2h(0)$ and $h(2)^2h(1)$ using (5) leads to:


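$$\begin{bmatrix} h(0)^2h(1) \\ h(0)^2h(2) \\ h(1)^2h(0) \\ h(1)^2h(2) \\ h(2)^2h(0) \\ h(2)^2h(1) \end{bmatrix} = (\text{coefficient matrix}) \begin{bmatrix} 1 \\ h(0) \\ h(1) \\ h(2) \\ h(0)h(1) \\ h(0)h(2) \\ h(1)h(2) \\ h(0)h(1)h(2) \end{bmatrix} \qquad (6)$$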


Fourth  step  :  expression  of  the  fourth  degree 
monomials 

Using the same method as before, one can express the monomials like $h(0)^2h(1)h(2)$ using equation (6). We now have expressed all the monomials, and the multiplication matrix can then be built.

3.5.  Choosing  the  channel  estimate 

Once  the  multiplication  matrix  is  computed,  the  pos¬ 
sible  solutions  are  given  by  its  eigenvectors.  Then,  the 
last  step  consists  of  choosing  the  solution  that  best 
matches  the  true  channel. 

The method used in this paper consists of comparing the circular statistics of the observation with those given by each estimate. The solution that matches best is selected.

3.6.  Identifiability 

Lemma 3.2 Suppose we look for a FIR channel $H$ of length $M$ from given second-order circular statistics, then the number of solutions is infinite because of a scalar phase indeterminacy. If the phase indeterminacy is fixed, the number of solutions is finite and equal to $2^{M-1}$ if $H$ is causal and equal to $2^{2M-2}$ if $H$ is not necessarily causal.

Theorem 3.3 Suppose we look for a FIR channel $H$ of length $M$ from given second-order circular and non-circular statistics, then the number of solutions is finite and equal to:

•  $2$ if $H$ has no real root,

•  $2^{Q+1}$ if $H$ has $Q$ real roots and is causal,

•  $2^{2Q+1}$ if $H$ has $Q$ real roots and is not necessarily causal.

When the source is MSK, the channel can be identified up to a sign.

Proof. The z-transform of the circular covariance $c(n)$ of the output $y(n)$ is equal to $C(z) = H(z)H^*(1/z^*)$. This shows that if $H(z)$ is causal, it can be determined up to 2 indeterminacies. First, $H(z)$ can only be determined up to a multiplicative constant phase factor. Second, if $H(z)$ is transformed into $H(z)\Phi(z)$ where $\Phi(z)$ satisfies $\Phi(z)\Phi^*(1/z^*) = 1$, $C(z)$ remains the same. It is well known that $\Phi(z)$ is an all-pass filter, i.e. $\Phi(z)$ is of the form:


Since $H(z)$ must be FIR and $\Phi(z)$ is not FIR, $H(z)\Phi(z)$ is FIR only if each pole of $\Phi(z)$ is associated with a root of $H(z)$. As a consequence, there is a finite number of all-pass filters such that $H(z)\Phi(z)$ is FIR. Therefore, if the phase indeterminacy is fixed, there are $2^{M-1}$ possible FIR filters that correspond to $C(z)$. This gives the first part of Lemma 3.2.

If $H(z)$ is non-causal, a third indeterminacy exists. $C(z)$ has $M-1$ pairs of roots $(b_i, 1/b_i^*)$ (if $b_i$ is a root then $1/b_i^*$ is also a root). Thus any $H(z)$ built with $M-1$ roots, where each root $h_j$ is equal to $b_j$ or $1/b_j^*$, gives the same $C(z)$. The number of these channels is equal to $2^{M-1}$. Thus, when $H(z)$ is non-causal, the number of solutions is equal to $2^{2M-2}$. This gives the second part of Lemma 3.2.

Suppose now that we also use the non-circular covariance $c(n)$ of $y(n)$. Its z-transform is equal to $C(z) = H(z)H(1/z)$, if the input is white. This new constraint shows that the phase indeterminacy is reduced to a sign indeterminacy, and that the all-pass filter $\Phi(z)$ must have real poles. As a consequence, if $H(z)$ has no real roots, the all-pass filter $\Phi(z)$ must be equal to $\pm 1$. In this case, there are only 2 solutions. If $H(z)$ has $Q$ real roots, one can use the results of Lemma 3.2. The number of solutions is then equal to $2^{Q+1}$ if $H$ is causal, because the solutions are given up to a sign; if $H$ is not necessarily causal the number of solutions is equal to $2^{2Q+1}$.

When the input source is MSK, $C(z) = H(z)H(-1/z)$, which no all-pass filter but $\Phi(z) = \pm 1$ satisfies. Therefore, the channel can be identified up to a sign if the input source is MSK. □
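The root-flipping argument behind the count $2^{M-1}$ can be checked numerically. The sketch below builds every channel obtained by replacing some roots $b$ by $1/b^*$ and verifies that the circular covariance is unchanged; the example roots and the function names are hypothetical, and the input is assumed white with unit variance.

```python
import itertools
import numpy as np

def autocorr(h):
    """Circular covariance c(k) = sum_m h(m+k) conj(h(m)) for a white unit-variance input."""
    return np.correlate(h, h, mode="full")

def flipped_channels(roots):
    """All FIR channels obtained by replacing some roots b with 1/conj(b)."""
    channels = []
    for flips in itertools.product([False, True], repeat=len(roots)):
        h = np.array([1.0 + 0.0j])
        for b, flip in zip(roots, flips):
            # (1 - b z^-1) and (-conj(b) + z^-1) have equal magnitude on the unit circle;
            # the second factor has its root at 1/conj(b).
            factor = np.array([-np.conj(b), 1.0]) if flip else np.array([1.0, -b])
            h = np.convolve(h, factor)
        channels.append(h)
    return channels

roots = [0.5 + 0.3j, -0.4 + 0.7j]   # M = 3 taps, hence M - 1 = 2 roots
channels = flipped_channels(roots)
reference = autocorr(channels[0])
same_covariance = all(np.allclose(autocorr(h), reference) for h in channels)
```

For $M = 3$ this gives $2^{M-1} = 4$ distinct channels sharing one circular covariance, which is why circular statistics alone cannot single out the true channel.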

Corollary  3.4  Suppose  the  circular  and  non-circular 
moments  of  the  output  y(n)  are  known  and  that  the 
channel  H(z)  has  no  real  roots.  Then,  there  exists 
2m_1  C£m~_ 2  possible  solutions,  each  of  them  known 
up  to  a  constant  phase  indeterminacy.  They  can  be 
computed  directly  from  the  covariance  C(z).  For  each 
solution,  the  phase  indeterminacy  can  be  fixed  with  the 
non-circular  moments,  and  the  channel  estimate  is  the 
channel  that  best  matches  the  non-circular  moments. 
This  Corollary  gives  a  new  identification  method. 

4.  COMPUTER  RESULTS 

The first tests have been run on a random FIR channel ($M = 5$). At each run the channel is a realization of a Clarke filter in the typical urban mode and is excited by an MSK input. The performance is presented as a function of the SNR and of the length $N$ of the observation block, and averaged over 500 runs.

Figure 1 shows the average Bit Error Rate obtained at the output of a Viterbi algorithm that uses our channel estimate. The solid, dashed and dashdotted lines correspond to block lengths $N = 200$, $N = 500$ and $N = 1000$ respectively. This performance is compared to the average BER obtained with the true channel (dotted line).

For high SNRs and $N = 200$ or $N = 500$ the results show the effects of statistic estimation errors. These effects disappear for $N = 1000$, where the performance exhibits a loss of 2 dB compared to the true channel results.


Figure  1:  Average  Bit  Error  Rate  at  the  output  of  a 
Viterbi  algorithm  using  our  channel  estimate. 

A  second  test  has  been  run  on  a  random  FIR  chan¬ 
nel  of  length  M  =  3.  Figure  2  shows  the  average  Bit 
Error  Rate  obtained  at  the  output  of  a  Viterbi  algo¬ 
rithm  that  uses  our  channel  estimate  when  M  =  3 
(solid  line)  and  M  =  5  (dashdotted  line).  These  two 
performances  are  compared  to  the  one  obtained  when 
the  true  channel  is  used  (dotted  line).  Hence,  this  test 
illustrates  the  loss  of  performance  encountered  in  pres¬ 
ence  of  over-determination. 

5.  CONCLUDING  REMARKS 

In  this  paper,  we  presented  a  new  blind  identification 
method  based  on  the  non-circular  moments  of  the  ob¬ 
servation.  These  moments  yield  a  polynomial  system 
that  can  be  solved  by  computing  the  eigenvectors  of 
a multiplication matrix. Identifiability results are presented when a FIR channel is searched for, from both
circular  and  non-circular  moments.  Finally,  computer 
results  show  that  the  behaviour  of  our  algorithm  de¬ 
pends  on  the  estimation  of  the  moments. 
ABSTRACT

We  explore  the  utility  of  second-order  statistics  for  blind 
equalization  of  nonlinear  channels.  Although  any  SOS- 
based  method  can  only  identify  the  channel  to  within  a 
mixing  matrix  (at  best),  sufficient  conditions  are  given 
to  ensure  that  the  ambiguity  is  at  a  level  that  still  al¬ 
lows  equalization.  These  conditions  are  satisfied  by  a 
wider  class  of  inputs  than  those  satisfying  the  condi¬ 
tions  derived  in  previous  works. 

1.  INTRODUCTION 

In  recent  years  blind  equalization  of  single-input  mul¬ 
tiple-output  (SIMO)  linear  channels  has  received  con¬ 
siderable  attention,  motivated  by  the  fact  that  these 
channels  can  be  perfectly  equalized  if  the  subchannels 
are  coprime  and  the  equalizer  is  long  enough,  and  that 
the  equalizer  can  be  obtained  from  the  second-order 
statistics  (SOS)  of  the  received  signal  [6]. 

With  a  few  exceptions  [1,  5,  8],  almost  all  the  avail¬ 
able  literature  on  blind  equalization  is  devoted  to  the 
linear  channel  case.  However,  many  real  world  com¬ 
munication  systems,  such  as  digital  satellite  and  radio 
links,  high-density  magnetic  and  optical  storage  chan¬ 
nels,  etc.,  exhibit  a  considerable  degree  of  nonlinear¬ 
ity.  Hence  it  is  of  interest  to  address  the  issue  of  blind 
equalization  of  nonlinear  channels.  The  SIMO  channel 
model  that  we  consider  here  has  the  following  form: 

$$x_n = \sum_{j=0}^{L_0} h_{0j}\, a_{n-j} + \sum_{i=1}^{D} \sum_{j=0}^{L_i} h_{ij}\, z^{(i)}_{n-j} + \eta_n, \qquad (1)$$

where $\{a_n\}$ is the scalar, stationary input, the terms $z^{(i)}_n = f_i(a_n, a_{n-1}, \ldots)$ are known scalar-valued nonlinear causal functions of $\{a_n\}$, $h_{ij}$ are $K \times 1$ coefficient vectors, and $\eta_n$, $x_n$ are $K \times 1$ signal vectors representing an additive disturbance and the observed signal, respectively; the number of subchannels is $K$. The noise $\{\eta_n\}$ and the signal $\{a_n\}$ are assumed to be independent. This model accommodates, for example, polynomial approximations of nonlinear channels (Volterra models), though the 'basis functions' $\{z^{(i)}_n\}$ need not be monomials in principle.

We  seek  conditions  under  which  zero-forcing  (ZF) 
linear equalizers for the class of channels (1) can be designed using only the SOS of $\{x_n\}$. SOS-based methods are often preferred since they can perform their
tasks  with  relatively  short  data  records.  The  fact  that 
linear  finite  impulse  response  (FIR)  systems  can  per¬ 
form  ZF  equalization  of  nonlinear  SIMO  Volterra  chan¬ 
nels  under  certain  conditions  was  first  pointed  out  in 
[1],  together  with  a  blind,  deterministic  approach  for 
equalizer  design.  Although  the  method  is  simple,  it 
has  been  shown  in  [3]  that  the  conditions  in  [1]  are  in 
fact  conservative.  In  [3]  we  have  given  certain  condi¬ 
tions  on  the  input  statistics  that  suffice  to  determine  a 
linear  equalizer  for  (1).  In  this  paper  we  significantly 
expand  the  results  of  [3]  by  giving  new  conditions  that 
accommodate  a  wider  class  of  inputs. 

Another SOS-based approach is suggested in [5], inspired by the method of [9] for linear channels. However, this method requires that every nonlinear subchannel be linearizable by an FIR Volterra system of known order and memory, which is in general not possible, especially if each subchannel is modeled as an FIR Volterra system itself (a common practice).

In principle the model (1) could be seen as a linear multiple-input multiple-output (MIMO) system by treating the $\{z^{(i)}_n\}_{i=1}^{D}$ as additional inputs. Although SOS-based techniques exist for equalization within such a framework [6], these techniques usually assume that the different inputs are uncorrelated (which is no longer true in our setting), and they only resolve the inputs to within a mixing matrix. In the current context, as $z^{(i)}_n$ are functions of $\{a_n\}$, this would mean that only a memoryless nonlinear function of the input could be obtained. The results of this work show that under the right conditions the structure of the mixing matrix permits obtaining linear ZF equalizers. These conditions are on the statistical properties of the symbols $\{a_n\}$ and the remaining basis functions $\{z^{(i)}_n\}$. Therefore they can be checked a priori in order to determine whether a given channel structure can be equalized from SOS.
2.  PROBLEM  STATEMENT 

We denote transpose and conjugate transpose by $(\cdot)^T$,
$(\cdot)^H$ respectively. By collecting $N$ successive
observations into $X_n^T = [\, x_n^T \ \cdots \ x_{n-N+1}^T \,]$,
one can write

$$X_n = T S_n + V_n, \tag{2}$$

where $V_n^T = [\, \eta_n^T \ \cdots \ \eta_{n-N+1}^T \,]$ is the
noise vector, the signal regressor
$S_n^T = [\, (S_n^{(1)})^T \ \ (S_n^{(2)})^T \,]$ with

$$S_n^{(1)} = [\, a_n \ \ a_{n-1} \ \cdots \ a_{n-L_0-N+1} \,]^T, \tag{3}$$

$$S_n^{(2)} = [\, z_n^{(1)} \ \cdots \ z_{n-L_1-N+1}^{(1)} \ \big|\ \cdots \ \big|\ z_n^{(D)} \ \cdots \ z_{n-L_D-N+1}^{(D)} \,]^T, \tag{4}$$

and the channel matrix $T = [\, T_0 \ \ T_1 \ \cdots \ T_D \,]$,
with every $T_i$ block Toeplitz,

$$T_i = \begin{bmatrix} h_{i0} & \cdots & h_{iL_i} & & \\ & \ddots & & \ddots & \\ & & h_{i0} & \cdots & h_{iL_i} \end{bmatrix}, \qquad KN \times (N + L_i).$$

For convenience, let $k_1 = N + L_0$, which is the size of
$S_n^{(1)}$, the linear part of the regressor; and
$k_2 = L_1 + \cdots + L_D + DN$, which is the size of
$S_n^{(2)}$ (thus $S_n$ is $(k_1 + k_2) \times 1$).

From (2), one has

$$C_x(k) = \mathrm{cov}(X_n, X_{n-k}) = T C_s(k) T^H + C_v(k), \tag{5}$$

with $C_s(k) = \mathrm{cov}(S_n, S_{n-k})$, $C_v(k) = \mathrm{cov}(V_n, V_{n-k})$
the signal and noise covariance matrices. We adopt the
following standard assumptions:

A1: The channel matrix $T$ has full column rank.

A2: $\{\eta_n\}$ is zero-mean, white, with covariance $\sigma_\eta^2 I_K$.

A3: The covariance matrix $C_s(0)$ is positive definite.

A necessary condition for A1 to hold is $K > D + 1$, which
parallels the 'more outputs than inputs' condition in blind
identification of MIMO channels. Observe that A1 ensures the
existence of vectors $g_\delta$ such that $g_\delta^H T = e_\delta^H$,
where $e_\delta$ is the $\delta$-th unit vector (counting from
zero). Thus in the noiseless case, for $0 \le \delta \le k_1 - 1$
one has $g_\delta^H X_n = a_{n-\delta}$, so that these vectors
provide ZF linear equalizers. From these, minimum
mean-square-error equalizers can be obtained [4].

Under A2, $\sigma_\eta^2$ can be estimated as the smallest
eigenvalue of $C_x(0)$. Thus the effect of the noise can be
removed from $C_x(k)$. Henceforth we shall assume that
$C_x(k) = T C_s(k) T^H$. The problem under consideration can
be posed as follows:

Blind Equalizability Problem: Let $\hat T$ be a matrix of
the same size as $T$ such that

$$\hat T C_s(k) \hat T^H = T C_s(k) T^H, \qquad k = 0, 1, \ldots, \bar k. \tag{6}$$

We say that $\hat T$ is compatible with the second order
statistics of $X_n$. Determine conditions under which a ZF
equalizer $g_\delta$ for any compatible $\hat T$ is also a ZF
equalizer for $T$. That is,

$$g_\delta^H \hat T = e_\delta^H \ \Rightarrow\ g_\delta^H T = c\, e_\delta^H, \tag{7}$$

with $0 \le \delta \le k_1 - 1$ and $c \ne 0$.

This was solved in [7] for the particular case of linear
channels with white inputs, for which if $\hat T$ is
compatible with $\bar k = 1$, then $\hat T = e^{j\theta} T$ so
that (7) holds. It is our goal to extend this result to the
class of channels (1) and colored inputs.


3.  THE  AMBIGUITY  MATRIX


Observe that assumption A3 allows us to write
$C_s(0) = QQ^H$ where $Q$ is nonsingular (not necessarily
unique). Introduce the normalized channel and signal
covariance matrices respectively as

$$F = TQ, \qquad \bar C_s(k) = Q^{-1} C_s(k) Q^{-H}. \tag{8}$$

Using (8), the covariance matrices $C_x(k)$ become

$$C_x(k) = F \bar C_s(k) F^H, \quad \text{with } \bar C_s(0) = I. \tag{9}$$

Similarly, if $\hat T$ is compatible, let $\hat F = \hat T Q$,
so that $\hat F$ satisfies

$$\hat F \bar C_s(k) \hat F^H = F \bar C_s(k) F^H, \qquad 0 \le k \le \bar k. \tag{10}$$

For $k = 0$, (10) reads as $\hat F \hat F^H = F F^H$. Since $F$
has full column rank, this implies $\hat F = F \bar P$ for some
unitary matrix $\bar P$. Thus the corresponding (unnormalized)
compatible channel matrix must satisfy

$$\hat T = \hat F Q^{-1} = T (Q \bar P Q^{-1}), \tag{11}$$

which shows that any compatible channel matrix is related to
the true channel via a mixing matrix of the form
$P = Q \bar P Q^{-1}$. Observe that although $\bar P$ is
unitary, in general $P$ is not. Let us introduce the concept
of admissibility.
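
The relation (11) can be checked numerically. The following is a minimal sketch, assuming real-valued statistics and small hand-picked matrices (the values of `Q`, `P_bar` and `T` are illustrative and not taken from the paper). It shows that a channel mixed by $P = Q \bar P Q^{-1}$ reproduces the lag-0 output covariance even though $P$ itself is not unitary.

```python
import numpy as np

def mixing_matrix(Q, P_bar):
    """Return P = Q P_bar Q^{-1}, the mixing matrix of (11)."""
    return Q @ P_bar @ np.linalg.inv(Q)

# Illustrative (hypothetical) values: a lower-triangular square root Q,
# so that Cs(0) = Q Q^T is positive definite.
Q = np.array([[1.0, 0.0],
              [1.0, 1.0]])
Cs0 = Q @ Q.T

# A unitary (here: real orthogonal) matrix, a 90-degree rotation.
P_bar = np.array([[0.0, -1.0],
                  [1.0, 0.0]])

# A tall channel matrix with full column rank.
T = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])

P = mixing_matrix(Q, P_bar)
T_hat = T @ P  # candidate channel that is compatible at lag 0

lag0_true = T @ Cs0 @ T.T
lag0_hat = T_hat @ Cs0 @ T_hat.T
```

Here `lag0_hat` equals `lag0_true`, while `P @ P.T` differs from the identity, which is the sense in which the lag-0 statistics alone leave a non-unitary ambiguity.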
Definition 1 (Admissibility) A $(k_1 + k_2)$-square matrix
$\Gamma$ is said to be admissible if it is of the form

$$\Gamma = \begin{bmatrix} \Lambda & 0 \\ * & * \end{bmatrix}, \qquad \Lambda:\ k_1 \times k_1 \text{ diagonal invertible}, \tag{12}$$

with the asterisks indicating irrelevant values. Note that
if $\Gamma$ is admissible and invertible, so is $\Gamma^{-1}$;
and any function of an admissible matrix is admissible.

Observe that if $\hat T = T P$ is compatible with
$P = Q \bar P Q^{-1}$ admissible, then the condition (7) is
satisfied. Thus resolution of the channel matrix to within
this ambiguity suffices for equalization purposes. We now
ask, when is this resolution possible?

To answer this question we must explore the constraints that
the conditions (10) impose on the matrix $\bar P$.
Substituting $\hat F = F \bar P$ into (10) and using the fact
that $F$ has full column rank and $\bar P$ is unitary, these
constraints can be written as

$$\bar P \bar C_s(k) = \bar C_s(k) \bar P, \qquad 1 \le k \le \bar k. \tag{13}$$

That is, $\bar P$ must commute with the normalized source
covariance matrices $\bar C_s(1), \ldots, \bar C_s(\bar k)$.


4.  A  SIMPLIFIED  EQUALIZABILITY  TEST 

Determining the general form of all unitary matrices
$\bar P$ that satisfy (13) requires solving a linear set of
equations with quadratic constraints. Fortunately, this
problem can be replaced by one of solving a linear set of
equations with linear constraints. First recall that any
unitary matrix $\bar P$ can be written as $\bar P = e^{jW}$
where $W$ is a Hermitian matrix with eigenvalues in
$[0, 2\pi)$ [2]. Secondly, we have the following result [3]:

Theorem 1 Let $W$ be $(k_1 + k_2)$-square Hermitian and
$\bar P = e^{jW}$. Then $\bar P$ and $\bar C_s(k)$ commute if and
only if $W$ and $\bar C_s(k)$ commute.

Hence the problem can be broken into these three steps:

1. Select a square root $Q$ of $C_s(0)$.

2. Find all Hermitian matrices $W$ commuting with
$\bar C_s(k) = Q^{-1} C_s(k) Q^{-H}$ for $1 \le k \le \bar k$.

3. Check whether for these matrices $W$, $Q W Q^{-1}$ is
admissible. If so, the channel can be equalized
using second-order statistics.

The  usefulness  of  theorem  1  is  revealed  in  that  steps 
2  and  3  above  are  much  easier  to  solve  for  Hermitian 
matrices  than  for  unitary  matrices. 
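
The three steps above can be sketched numerically. The code below is only an illustration under stated assumptions, not the authors' procedure: it assumes real-valued statistics (so that $W$ is real symmetric), uses the Cholesky factor as the square root $Q$, finds the commuting matrices as the null space of a linear map, and checks only the structural part of (12) (zero upper-right block, diagonal upper-left block) on a basis of that null space. The function names are hypothetical.

```python
import numpy as np

def commuting_symmetric_basis(C_list, tol=1e-9):
    """Basis of real symmetric W with W C = C W for every C in C_list."""
    n = C_list[0].shape[0]
    # Basis E_ij of the space of symmetric n x n matrices.
    basis = []
    for i in range(n):
        for j in range(i, n):
            E = np.zeros((n, n))
            E[i, j] = E[j, i] = 1.0
            basis.append(E)
    # Column m holds the stacked commutators [E_m, C] for all C.
    A = np.column_stack([
        np.concatenate([(E @ C - C @ E).ravel() for C in C_list])
        for E in basis
    ])
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    null_coeffs = Vt[rank:]  # rows span the null space of A
    return [sum(c * E for c, E in zip(coef, basis)) for coef in null_coeffs]

def has_admissible_structure(G, k1, tol=1e-8):
    """Structural part of (12): upper-right block zero, upper-left diagonal."""
    top_left = G[:k1, :k1]
    off_diag = top_left - np.diag(np.diag(top_left))
    return bool(np.allclose(G[:k1, k1:], 0.0, atol=tol)
                and np.allclose(off_diag, 0.0, atol=tol))

def sos_equalizable(Cs0, Cs_lags, k1):
    """Steps 1-3 of the test for real-valued covariance matrices."""
    Q = np.linalg.cholesky(Cs0)                    # step 1
    Q_inv = np.linalg.inv(Q)
    C_bar = [Q_inv @ C @ Q_inv.T for C in Cs_lags]
    Ws = commuting_symmetric_basis(C_bar)          # step 2
    return all(has_admissible_structure(Q @ W @ Q_inv, k1) for W in Ws)  # step 3
```

Because the structural constraints are linear in $W$, checking them on a basis of the commuting matrices is enough for the whole subspace. As a usage example, with $C_s(0) = I$, $k_1 = 2$, and a lag-1 matrix made of a 2x2 shift block and a scalar block 0.5, the test succeeds, whereas an all-zero lag-1 matrix (which commutes with everything) makes it fail.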

As noted above, the matrix $Q$ such that $C_s(0) = QQ^H$ is
not unique. Although it is true that an adequate choice of
$Q$ can considerably simplify the test for SOS-based
equalizability, as discussed in section 5, it must be
pointed out that the result of the test is independent of
$Q$. This is because all square roots can be parameterized
as $Q = Q_0 U$, where $Q_0$ is a particular solution and $U$
is any unitary matrix. Consequently, a unitary $\bar P_0$
commutes with $Q_0^{-1} C_s(k) Q_0^{-H}$ if and only if
$\bar P = U^H \bar P_0 U$, which is unitary, commutes with
$Q^{-1} C_s(k) Q^{-H}$. In addition, one has

$$Q \bar P Q^{-1} = Q_0 \bar P_0 Q_0^{-1},$$

so that admissibility of $Q \bar P Q^{-1}$ is equivalent to that
of $Q_0 \bar P_0 Q_0^{-1}$. Thus equalizability does not depend
on the specific square root $Q$.

Our  goal  now  is  to  determine  sufficient  conditions  in 
order  to  ensure  success  of  the  SOS-based  equalizability 
test  a  priori. 


5.  MAIN  RESULTS 


It will be especially useful to consider square roots $Q$
which are block lower triangular (with block partition
corresponding to linear and nonlinear parts of $S_n$, as in
(12)), for the following reason: suppose that the Hermitian
matrices $W$ solving step 2 of the equalizability test are
block diagonal. Then the matrices $\bar P = e^{jW}$ are block
diagonal, and thus if $Q$ is block lower triangular, the
mixing matrices $P = Q \bar P Q^{-1}$ will be block lower
triangular as well. Having $P$ block lower triangular (i.e.
of the form (12) but with $\Lambda$ not necessarily diagonal)
is the first step towards admissibility: its significance is
that a linear ZF equalizer $g_\delta$ for the compatible matrix
$\hat T$ ($g_\delta^H \hat T = e_\delta^H$, $0 \le \delta \le k_1 - 1$),
although not a ZF equalizer for the true channel $T$, still
removes all the nonlinear ISI if $P$ is block lower
triangular, since $g_\delta^H T = e_\delta^H P^{-1}$.

Before presenting block lower triangular options for the
square root $Q$, we give the following result which ensures
the block diagonal property of the Hermitian matrices $W$
that commute with $\bar C_s(1)$.

Theorem 2 Assume that there exists a matrix $Q$ such that
$C_s(0) = QQ^H$ and

$$\bar C_s(1) = Q^{-1} C_s(1) Q^{-H} = \begin{bmatrix} C_{11} & 0 \\ C_{21} & C_{22} \end{bmatrix} \tag{14}$$

with $C_{ij}$ having size $k_i \times k_j$. Suppose that either
(i) $C_{11}$, $C_{22}$ do not share any eigenvalues; or (ii)
$C_{21} = 0$, and $C_{11}$, $C_{22}$ do not share any elementary
Jordan block in their Jordan decompositions. Then any
Hermitian $W$ commuting with $\bar C_s(1)$ must be of the form

$$W = \begin{bmatrix} W_{11} & 0 \\ 0 & W_{22} \end{bmatrix}, \tag{15}$$

with $W_{11}$ Hermitian of size $k_1 \times k_1$.
With this result in mind, we shall focus on block triangular
square roots $Q$ and look for conditions under which (14) is
satisfied. Let $A_{ij} = \mathrm{cov}(S_n^{(i)}, S_n^{(j)})$ (which
has size $k_i \times k_j$) so that

$$C_s(0) = \begin{bmatrix} A_{11} & A_{12} \\ A_{12}^H & A_{22} \end{bmatrix}.$$

Define the Schur complement
$A_0 = A_{22} - A_{12}^H A_{11}^{-1} A_{12}$, which is positive
definite. The following choice of $Q$ will prove particularly
useful:

$$Q = \begin{bmatrix} A_{11}^{1/2} & 0 \\ A_{12}^H A_{11}^{-H/2} & A_0^{1/2} \end{bmatrix}, \tag{16}$$

where $A_{11}^{1/2}$, $A_0^{1/2}$ are square roots of $A_{11}$ and
$A_0$ respectively, i.e. $A_{11} = A_{11}^{1/2} A_{11}^{H/2}$,
$A_0 = A_0^{1/2} A_0^{H/2}$. The following results give
sufficient conditions on the source statistics and the
channel nonlinearities in order to have $P$ admissible.

Theorem 3 Suppose that the symbol sequence $\{a_n\}$ is an
autoregressive (AR) process of order not exceeding $k_1$ with
independent, identically distributed (iid) innovations, i.e.
it is generated by means of all-pole filtering of an iid
process $\{w_n\}$ as follows:

$$a_n = w_n + \sum_{i=1}^{k_1} \gamma_i a_{n-i}. \tag{17}$$

Then with $Q$ as in (16), the corresponding matrix
$\bar C_s(1)$ is block triangular as in (14).

In addition, suppose that the diagonal blocks of
$\bar C_s(1)$ do not share any eigenvalue. Then for all
Hermitian matrices $W$ commuting with $\bar C_s(1)$,
$Q W Q^{-1}$ is admissible.

This result can be understood as follows. The autoregressive
condition on the symbols $\{a_n\}$ provides the desired block
triangular structure (14) for $\bar C_s(1)$. If the diagonal
blocks of $\bar C_s(1)$ do not share any eigenvalue, then one
can conclude from theorem 2 that the Hermitian matrices $W$
must be block diagonal. Theorem 3 tells us that in addition
to this, these $W$ are such that $P = Q e^{jW} Q^{-1}$ is
admissible. Thus for AR symbols, it suffices for
equalizability to check the eigenvalue condition on
$\bar C_s(1)$. Observe that iid symbol sequences constitute a
particular class of AR processes for which $\gamma_i = 0$ in
(17).

The next result provides similar conclusions under different
conditions:

Theorem 4 Suppose that the symbols $\{a_n\}$ are Gaussian, and
that the memory of the nonlinear part of the channel does
not exceed that of the linear part. Then with $Q$ as in (16),
the corresponding matrix $\bar C_s(1)$ is block diagonal, i.e.
as in (14) with $C_{21} = 0$.

In addition, suppose that the diagonal blocks of
$\bar C_s(1)$ do not share any elementary Jordan block in their
Jordan decompositions. Then for all Hermitian matrices $W$
commuting with $\bar C_s(1)$, $Q W Q^{-1}$ is admissible.

We must remark that having the memory of the nonlinear part
no larger than that of the linear part is not the same as
saying that $L_0 \ge L_i$, $i = 1, \ldots, D$ in (1). (This is
because the terms $z_n^{(i)}$ need not be memoryless.)
Instead, this memory requirement can be interpreted as having


with $f(\cdot)$ a memoryless mapping.

According to theorem 4, the coloring of the symbols need not
be restricted to AR models if the symbols are Gaussian.
Gaussianity also results in a less stringent eigenvalue
condition on $\bar C_s(1)$ for admissibility (since its
diagonal blocks may now share eigenvalues as long as their
respective Jordan blocks have different sizes). On the other
hand, the conditions of theorem 4 include the memory
limitation on the nonlinear part of the channel, while no
such constraint was present in theorem 3 for AR symbols.

Let us define $B_{ij} = \mathrm{cov}(S_n^{(i)}, S_{n-1}^{(j)})$,
$i, j = 1, 2$, and

$$B_0 = B_{22} - A_{12}^H A_{11}^{-1} B_{11} A_{11}^{-1} A_{12}. \tag{18}$$

It can be shown that under the conditions of theorem 4, the
diagonal blocks $C_{11}$ and $C_{22}$ of $\bar C_s(1)$ in (14) are
respectively similar to $B_{11} A_{11}^{-1}$ and $B_0 A_0^{-1}$.
Thus one must compare the elementary Jordan blocks in the
Jordan decompositions of these matrices. Observe that
$B_{11} A_{11}^{-1}$ is a companion matrix associated to the
forward prediction error filter of order $k_1$ for the process
$\{a_n\}$. (If the symbol sequence is white, then
$B_{11} A_{11}^{-1}$ reduces to the shift matrix with ones in the
first subdiagonal and zeros elsewhere, as in [3].) Therefore
all eigenvalues lie inside the unit circle, and there is only
one elementary Jordan block associated to each distinct
eigenvalue [2]. On the other hand, the Jordan structure of
$B_0 A_0^{-1}$ appears to be more intricate due to possible
interactions between different terms $z_n^{(i)}$.

Finally,  when  the  symbol  sequence  {an}  and  the 
nonlinear  terms  are  uncorrelated,  the  Jordan  block  con¬ 
dition  is  sufficient  for  equalizability,  regardless  of  the 
color  or  distribution  of  the  symbols: 

Theorem 5 Suppose that $\mathrm{cov}(a_n, z_{n-k}^{(i)}) = 0$ for
all $k$ and for $1 \le i \le D$. Then with $Q$ as in (16), the
matrix $\bar C_s(1)$ is block diagonal:
$\bar C_s(1) = \begin{bmatrix} C_{11} & 0 \\ 0 & C_{22} \end{bmatrix}$,
with $C_{11}$ of size $k_1 \times k_1$. If $C_{11}$ and $C_{22}$ do
not share any elementary Jordan block in their Jordan
decompositions, then for any Hermitian $W$ commuting with
$\bar C_s(1)$, the matrix $Q W Q^{-1}$ is admissible.

6.  EXAMPLES 

Suppose that the symbols $\{a_n\}$ are generated by means of
$a_n = w_n + w_{n-1}$, where $\{w_n\}$ is a real, zero-mean, iid
process, symmetrically distributed around the origin. Let
$D = 1$ and $z_n^{(1)} = z_n = a_n^2$. Then one has
$\mathrm{cov}(a_n, z_{n-k}) = 0$ for all $k$, so that we are in
the conditions of theorem 5. The matrices $C_{11}$ and $C_{22}$
for this case are similar to the companion matrices
associated to the forward prediction error filters of orders
$k_1$ and $k_2$ for the processes $\{a_n\}$ and $\{z_n\}$,
respectively. In view of theorem 5, if the transfer functions
of these prediction error filters are coprime, then SOS-based
equalizability is ensured. Both $\{a_n\}$ and $\{z_n\}$ are
first-order Moving Average processes with autocovariance
coefficients

$$\rho_1 = \frac{\mathrm{cov}(a_n, a_{n-1})}{\mathrm{cov}(a_n, a_n)} = \frac{1}{2}, \qquad \rho_2 = \frac{\mathrm{cov}(z_n, z_{n-1})}{\mathrm{cov}(z_n, z_n)} = \frac{1 - \alpha}{2(1 + \alpha)},$$

where $\alpha = E^2[w_n^2] / E[w_n^4]$. For example, for Gaussian
$\{w_n\}$, $\alpha = \frac{1}{3}$ and $\rho_2 = \frac{1}{4}$; and
for equiprobable $w_n = \pm 1$ (a BPSK signal), one has
$\alpha = 1$ and $\rho_2 = 0$, so that $\{z_n\}$ is white.
Moreover, note that since $\rho_1 = \frac{1}{2}$, $\lambda = 0$ is
never an eigenvalue of $C_{11}$. Thus we conclude that for BPSK
$\{w_n\}$ this channel is always SOS-equalizable, irrespective
of the kernel memories $L_0$ and $L_1$.
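
The value of $\rho_2$ in this example can be cross-checked by brute force. The sketch below is an illustration only: it assumes $a_n = w_n + w_{n-1}$, $z_n = a_n^2$, and a discrete symmetric distribution for $w_n$ so that expectations become finite sums. The function names are hypothetical. It compares the closed form in terms of $\alpha = E^2[w_n^2]/E[w_n^4]$ with an exact enumeration over $(w_n, w_{n-1}, w_{n-2})$.

```python
import itertools

def rho2_formula(m2, m4):
    """Closed form (1 - alpha) / (2 (1 + alpha)) with alpha = m2^2 / m4."""
    alpha = m2 ** 2 / m4
    return (1.0 - alpha) / (2.0 * (1.0 + alpha))

def rho2_enumerated(values, probs):
    """Exact lag-1 autocovariance coefficient of z_n = (w_n + w_{n-1})^2
    for an iid discrete w_n taking `values` with probabilities `probs`."""
    atoms = list(zip(values, probs))
    mean_z = mean_zz = mean_z2 = 0.0
    for (w0, p0), (w1, p1), (w2, p2) in itertools.product(atoms, repeat=3):
        p = p0 * p1 * p2
        z_now = (w0 + w1) ** 2    # z_n
        z_prev = (w1 + w2) ** 2   # z_{n-1}
        mean_z += p * z_now
        mean_zz += p * z_now * z_prev
        mean_z2 += p * z_now * z_now
    return (mean_zz - mean_z ** 2) / (mean_z2 - mean_z ** 2)

# BPSK: m2 = m4 = 1, so alpha = 1 and z_n is white.
rho2_bpsk = rho2_enumerated([-1.0, 1.0], [0.5, 0.5])
# Gaussian moments (m2 = 1, m4 = 3) give alpha = 1/3.
rho2_gauss = rho2_formula(1.0, 3.0)
```

For BPSK both routes give zero, and the Gaussian moments give one quarter, in agreement with the values quoted in the text.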

Now suppose that $\{w_n\}$ is Gaussian and that $z_n = a_n^3$.
The processes $\{a_n\}$ and $\{z_n\}$ are not uncorrelated any
more; hence theorem 5 does not apply. However, if
$L_1 \le L_0$, the conditions of theorem 4 are satisfied.
Suppose $L_1 = L_0$. The matrices $C_{11}$ and $C_{22}$ are now
similar to the companion matrices associated to the forward
prediction error filters of order $k_1 = k_2 = N + L_0$ for two
MA(1) processes with autocovariance coefficients
$\rho_1 = \frac{1}{2}$ and $\rho_2 = \frac{1}{8}$. The transfer
functions of these filters are always coprime, so that the
channel is SOS-equalizable.

7.  CONCLUSIONS 

The  problem  of  blindly  equalizing  a  SIMO  nonlinear 
FIR  channel  using  second-order  statistics  of  the  ob¬ 
served  signal  has  been  considered.  We  have  presented 
sufficient  conditions  on  the  statistics  of  the  symbol  se¬ 
quence  and  the  channel  structure  (which  can  be  checked 
a  priori)  allowing  the  design  of  linear  FIR  zero-forcing 


equalizers.  These  conditions  considerably  expand  pre¬ 
vious  results  by  accommodating  a  wider  class  of  inputs. 

Our  equalizability  conditions  do  not  describe  how 
the  linear  equalizers  can  be  found  from  the  output  SOS. 
Algorithm  development  exploiting  the  results  presented 
is  the  next  logical  step  and  is  currently  under  study. 
ABSTRACT

Super-exponential  algorithm  (SEA),  constant  modulus 
algorithm  (CMA)  and  inverse  filter  criteria  (IFC)  us¬ 
ing  higher-order  statistics  have  been  widely  used  for 
blind  equalization.  Chi,  Feng  and  Chen  have  reported 
that  SEA  and  IFC  are  equivalent  under  certain  condi¬ 
tions.  In  this  paper,  we  further  prove  that  SEA,  IFC 
and  CMA  are  equivalent  under  certain  conditions,  and 
their  convergence  speed  and  computational  load  can 
be  significantly  improved  as  the  given  data  are  prepro¬ 
cessed  by  the  well-known  lattice  linear  prediction  error 
(LPE)  filter  for  both  off-line  processing  and  adaptive 
processing.  Some  simulation  results  are  presented  to 
support  the  analytic  results  and  the  proposed  off-line 
and  adaptive  implementations. 


1.  INTRODUCTION

Blind equalization (deconvolution) is a signal processing
procedure to recover the desired independent identically
distributed (i.i.d.) non-Gaussian signal, denoted by $u[n]$,
that is transmitted through an unknown linear time-invariant
(LTI) channel, denoted by $h[n]$, with only measurements

$$x[n] = u[n] * h[n] + w[n] = \sum_{k=-\infty}^{\infty} h[k]\, u[n-k] + w[n] \tag{1}$$

where $w[n]$ is additive noise. The problem of blind
equalization arises in a variety of applications such as
digital communications, seismic deconvolution, speech
modeling and synthesis, ultrasonic nondestructive evaluation
and image restoration.

The FIR linear equalizer of order $L$, denoted by $v[n]$, has
been widely used to process $x[n]$ such that the equalizer
output

$$e[n] = x[n] * v[n] = \sum_{k=0}^{L} v[k]\, x[n-k] \tag{2}$$

$$\phantom{e[n]} = u[n] * g[n] + w[n] * v[n] \quad \text{(by (1))}$$

approximates $\alpha u[n - \tau]$ ($\alpha \ne 0$) where

$$g[n] = h[n] * v[n] \tag{3}$$

is the overall system after equalization. The amount of
intersymbol interference (ISI) defined as [1]

$$\mathrm{ISI}\{g[n]\} = \frac{\sum_n |g[n]|^2 - \max_n\{|g[n]|^2\}}{\max_n\{|g[n]|^2\}} \tag{4}$$

has been used as a performance index of the designed $v[n]$.
A smaller ISI implies better performance.

This work was supported by the National Science Council
under Grant NSC-89-2213-E-007-073.
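
As a small illustration of (3) and (4), the sketch below computes the overall response and its ISI for finite-length sequences. It assumes FIR `h` and `v` given as arrays; the function names are hypothetical and the numbers are made up for the example.

```python
import numpy as np

def overall_response(h, v):
    """g[n] = h[n] * v[n], the channel-equalizer cascade of (3)."""
    return np.convolve(h, v)

def isi(g):
    """ISI of (4): energy outside the largest tap, relative to that tap."""
    power = np.abs(np.asarray(g)) ** 2
    return (power.sum() - power.max()) / power.max()

# Hypothetical two-tap channel and a two-tap equalizer that partly inverts it.
h = np.array([1.0, 0.5])
v = np.array([1.0, -0.5])
g = overall_response(h, v)   # [1, 0, -0.25]
isi_before = isi(h)          # ISI of the channel alone
isi_after = isi(g)           # ISI after equalization
```

A pure delay with arbitrary nonzero gain, such as `[0, alpha, 0]`, gives an ISI of zero, which is why (4) is insensitive to the scale and delay ambiguity of blind equalization.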

A  number  of  blind  equalization  algorithms  using  higher- 
order  statistics  (cumulants  and  moments)  have  been 
reported  for  designing  v[n]  such  as  the  well-known  con¬ 
stant  modulus  algorithm  (CMA)  [2],  inverse  filter  cri¬ 
teria  (IFC)  [3]  and  super-exponential  algorithm  (SEA) 
[1].  Chi,  Feng  and  Chen  [4]  have  reported  the  equiv¬ 
alence  of  IFC  and  SEA  under  certain  conditions.  In 
this  paper,  we  further  prove  the  equivalence  of  IFC, 
SEA  and  CMA  under  certain  conditions,  thus  sharing 
some  properties  reported  in  [4-7]  under  these  condi¬ 
tions.  Furthermore,  efficient  implementations  of  these 
algorithms  with  preprocessing  by  linear  prediction  er¬ 
ror  (LPE)  filter  are  presented  including  off-line  process¬ 
ing  and  adaptive  processing. 

2.  BACKGROUND 

Let $\mathrm{cum}\{x_1, \ldots, x_p\}$ denote the $p$th-order joint
cumulant of random variables $x_1, \ldots, x_p$, and
$\mathrm{cum}\{e[n] : p, \ldots\} = \mathrm{cum}\{x_1 = e[n], \ldots, x_p = e[n], \ldots\}$.
For ease of later use, let us define the following notations:

$\mathbf{v} = (v[0], v[1], \ldots, v[L])^T$

$\mathbf{x}[n] = (x[n], x[n-1], \ldots, x[n-L])^T$

$f_k[n]$ : $k$th-order forward prediction error

$b_k[n]$ : $k$th-order backward prediction error

$\mathbf{f}_k[n] = (f_k[n], f_k[n-1], \ldots, f_k[n-L])^T$

$\mathbf{b}[n] = (b_0[n], b_1[n], \ldots, b_L[n])^T$

$C_{p,q}^{u} = \mathrm{cum}\{u[n] : p,\ u^*[n] : q\}$

$\mathrm{sgn}(a)$ : sign of real-valued $a$

2.1.  Lattice  LPE  Filter 

The $k$th-order lattice LPE filter with reflection
coefficients $\rho_1, \rho_2, \ldots, \rho_k$ simultaneously
provides the forward prediction error $f_k[n]$ and backward
prediction error $b_k[n]$, which can be expressed as follows:

$$f_k[n] = \sum_{i=0}^{k} a_k[i]\, x[n-i] \tag{5}$$

$$b_k[n] = \sum_{i=0}^{k} a_k^*[k-i]\, x[n-i] \tag{6}$$

where the superscript '*' denotes complex conjugation,
$a_k[0] = 1$ and $a_k[1], a_k[2], \ldots, a_k[k]$ can be obtained
from $\rho_1, \rho_2, \ldots, \rho_k$ through the computationally
efficient Levinson-Durbin recursion. Two facts regarding
$f_k[n]$ and $b_k[n]$ are as follows [8]:

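(F1) The $k$th-order LPE filter $a_k[n]$ is a whitening filter
as $k$ is sufficiently large, i.e.,

$$R_{f_k} = E\left[\mathbf{f}_k[n]\, \mathbf{f}_k^H[n]\right] = \sigma_f^2 I \tag{7}$$

for sufficiently large $k$, where $\sigma_f^2 = E[|f_k[n]|^2]$
and $I$ is the $(L+1) \times (L+1)$ identity matrix.

(F2) $\mathbf{x}[n]$ and $\mathbf{b}[n]$ are causally invertible and

$$R_b = E\left[\mathbf{b}[n]\, \mathbf{b}^H[n]\right] = \mathrm{diag}(P_0, P_1, \ldots, P_L) \tag{8}$$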
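
The Levinson-Durbin recursion mentioned above can be sketched as follows. This is a textbook form under stated assumptions, not code from the paper: it assumes a real-valued autocorrelation sequence `r[0..order]`, and the sign convention of the reflection coefficients (here chosen so that $a_m[m]$ equals the $m$th coefficient) varies between references. The returned vector matches the convention $a_k[0] = 1$ used in (5).

```python
import numpy as np

def levinson_durbin(r, order):
    """Prediction error filter from autocorrelation lags r[0..order].

    Returns (a, refl, powers): filter taps with a[0] = 1, reflection
    coefficients for orders 1..order, and prediction error powers for
    orders 0..order.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    refl = []
    powers = [err]
    for m in range(1, order + 1):
        # Correlation between the current forward error and x[n - m].
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        # Order update: a_m[i] = a_{m-1}[i] + k * a_{m-1}[m - i].
        a[1:m] = prev[1:m] + k * prev[m - 1:0:-1]
        a[m] = k
        err *= (1.0 - k * k)
        refl.append(k)
        powers.append(err)
    return a, np.array(refl), np.array(powers)

# First-order autoregressive autocorrelation r[k] = 0.5**k (illustrative).
a2, refl2, powers2 = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this autocorrelation the order-1 predictor already whitens the process, so the second reflection coefficient is zero and the error power stops decreasing after the first stage.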

2.2.  CMA 

The CMA [2] finds the optimal equalizer $v[n]$ by minimizing
the following cost function

$$J_{CM}(\mathbf{v}) = E\left[(\gamma - |e[n]|^2)^2\right] \tag{9}$$

where $\gamma = E[|u[n]|^4] / E[|u[n]|^2]$. However, one has to
resort to iterative optimization algorithms to search for
the optimum $\mathbf{v}$.
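
To make (9) concrete, the sketch below evaluates the dispersion constant $\gamma$ for a given constellation and a sample-average estimate of $J_{CM}$. It is an illustration only: expectations are replaced by averages over the supplied samples, and the function names are hypothetical.

```python
import numpy as np

def dispersion_constant(symbols):
    """gamma = E|u|^4 / E|u|^2 for equiprobable constellation points."""
    s = np.asarray(symbols)
    return np.mean(np.abs(s) ** 4) / np.mean(np.abs(s) ** 2)

def cma_cost(v, x, gamma):
    """Sample-average estimate of J_CM in (9) for equalizer taps v[0..L]."""
    v = np.asarray(v)
    x = np.asarray(x)
    L = len(v) - 1
    # e[n] = sum_k v[k] x[n-k]; keep only outputs with a full window.
    e = np.convolve(x, v)[L:len(x)]
    return np.mean((gamma - np.abs(e) ** 2) ** 2)

# Unit-variance 4-QAM has constant modulus, so gamma = 1 and an identity
# equalizer on a distortion-free signal attains zero cost.
qam4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
gamma_qam4 = dispersion_constant(qam4)
cost_clean = cma_cost([1.0], qam4, gamma_qam4)
```

For a multi-level constellation such as 4-PAM the modulus is not constant, so $\gamma$ is larger than the mean power and the minimum of (9) is positive even with perfect equalization.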

2.3.  SEA 

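Shalvi and Weinstein's SEA($p$, $q$) [1] is an iterative
algorithm that updates $\mathbf{v}$ by the following equations at
each iteration:

$$\mathbf{v} = R_x^{-1} \mathbf{d} \,/\, \| R_x^{-1} \mathbf{d} \| \tag{10}$$

where $R_x = E[\mathbf{x}[n]\, \mathbf{x}^H[n]]$ and

$$\mathbf{d} = \mathrm{cum}\{e[n] : p,\ e^*[n] : q - 1,\ \mathbf{x}^*[n]\}, \qquad p + q \ge 3 \tag{11}$$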

The  SEA  is  a  computationally  efficient  algorithm  with 
fast  convergence  speed  (in  terms  of  ISI)  but  no  guar¬ 
antee  of  convergence  for  finite  SNR  and  data. 
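
One iteration of (10) and (11) for $p = q = 2$ can be sketched as below. The sketch makes simplifying assumptions that are not in the paper: the data are real-valued and zero-mean, so the fourth-order cross-cumulant reduces to $E[e^3 x] - 3E[e^2]E[e\,x]$, and expectations are replaced by sample averages over the rows of a regressor matrix. The function name is hypothetical.

```python
import numpy as np

def sea22_step(X, v):
    """One SEA(2,2) update for real-valued, zero-mean data.

    X holds one regressor x[n]^T per row; v is the current equalizer.
    """
    X = np.asarray(X, dtype=float)
    v = np.asarray(v, dtype=float)
    e = X @ v                                   # equalizer output e[n]
    Rx = X.T @ X / len(X)                       # sample correlation matrix
    # Fourth-order cross-cumulant cum{e, e, e, x} for zero-mean real data.
    d = (X * (e ** 3)[:, None]).mean(axis=0) \
        - 3.0 * np.mean(e ** 2) * (X * e[:, None]).mean(axis=0)
    v_new = np.linalg.solve(Rx, d)              # R_x^{-1} d
    return v_new / np.linalg.norm(v_new)        # normalization of (10)

# All four sign patterns of two independent BPSK symbols (exact averages).
X_demo = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
v_next = sea22_step(X_demo, [0.8, 0.6])
```

In this toy case the update maps taps $(0.8, 0.6)$ to a vector proportional to their cubes (up to sign), so the dominant tap grows and the other shrinks, which illustrates the fast convergence attributed to the SEA.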


2.4.  IFC 

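The IFC($p$, $q$) [3] finds the optimum $\mathbf{v}$ by maximizing
the following criterion:

$$J_{p,q}(\mathbf{v}) = \frac{|C_{p,q}^{e}|}{|C_{1,1}^{e}|^{(p+q)/2}}, \qquad p + q \ge 3 \tag{12}$$

where $C_{p,q}^{e} = \mathrm{cum}\{e[n] : p,\ e^*[n] : q\}$, which is a
highly nonlinear function of $v[n]$ without a closed-form
solution for the optimum $\mathbf{v}$. Chi, Feng and Chen [4]
proposed a fast gradient type iterative algorithm as follows:

Algorithm 1:

At the $i$th iteration, $\mathbf{v}^{[i]}$ is obtained through the
following two steps.

(T1) Update $\mathbf{v}$ using (10) with $e[n] = e^{[i-1]}[n]$ used
in $\mathbf{d}$ (see (11)), and obtain the associated $e^{[i]}[n]$.

(T2) If $J_{p,q}(\mathbf{v}) > J_{p,q}(\mathbf{v}^{[i-1]})$, update
$\mathbf{v}^{[i]} = \mathbf{v}$, otherwise update $\mathbf{v}^{[i]}$ by

$$\mathbf{v}^{[i]} = \mathbf{v}^{[i-1]} + \eta\, \mathrm{sgn}(C_{p,q}^{u})\, \mathbf{v} \tag{13}$$

such that $J_{p,q}(\mathbf{v}^{[i]}) > J_{p,q}(\mathbf{v}^{[i-1]})$, and
obtain the associated $e^{[i]}[n]$.

Algorithm 1, which requires either real $x[n]$, or complex
$x[n]$ and $p = q$, shares the computational efficiency and
convergence speed of the SEA with guaranteed convergence.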


3.  EQUIVALENCE  OF  SEA(2,2),  IFC(2,2) 
AND  CMA 

Chi, Feng and Chen [4] have proven the following fact:

(F3) SEA($p$, $q$) and IFC($p$, $q$) are equivalent as $x[n]$ is
real and $p + q \ge 3$ or as $x[n]$ is complex and
$p = q \ge 2$.

As mentioned in (F2), $\mathbf{x}[n]$ and $\mathbf{b}[n]$ are causally
invertible. Therefore, deconvolution with $\mathbf{x}[n]$ is
equivalent to deconvolution with $\mathbf{b}[n]$. Let

$$e[n] = \mathbf{v}^T \mathbf{b}[n] \tag{14}$$

Replacing $\mathbf{x}[n]$ and $R_x$ in (10) with $\mathbf{b}[n]$ and
$R_b$, respectively, and replacing $e[n]$ in (11) with the one
given by (14) for $p = q = 2$ through some simplification
yields

$$\mathbf{v} = R_b^{-1} E\left[|e[n]|^2 e[n]\, \mathbf{b}^*[n]\right] \tag{15}$$

except for a scale factor. On the other hand, substituting
(14) into $J_{CM}(\mathbf{v})$ given by (9), one can easily show
that the optimum $\mathbf{v}$ associated with $J_{CM}(\mathbf{v})$ is
the same as the one given by (15) except for a scale factor.
Therefore, we have shown the following theorem:

Theorem 1. SEA($p$, $q$) with $p = q = 2$ and CMA are
equivalent.

By (F3) and Theorem 1, we have the following fact:

(F4) The CMA, IFC($p$, $q$) and SEA($p$, $q$) are equivalent as
$p = q = 2$. Therefore, they share some properties reported
in [4-7], such as the perfect equalization property and the
relation to the nonblind minimum mean square error (MMSE)
equalizer.

4.  LATTICE  IMPLEMENTATIONS 

Let  us  present  lattice  implementations  for  SEA  (p,q), 
IFC(p,  q)  and  CMA  only  for  the  case  of  p  =  q  =  2 
below. 

4.1.  Off-Line  Processing 

Feng and Chi have reported two off-line lattice SEA (LSEA)
[9] using $\mathbf{b}[n]$ and $\mathbf{f}_k[n]$, respectively. Next,
let us present two lattice implementations for IFC that are
modifications of Algorithm 1 with $\mathbf{x}[n]$ replaced by
$\mathbf{b}[n]$ and $\mathbf{f}_k[n]$, respectively.

LIFC-B Algorithm: At the $i$th iteration, $\mathbf{v}^{[i]}$ is
obtained through the following two steps.

(S1) Compute $\mathbf{v}$ by (15) where $e[n] = e^{[i-1]}[n]$ is
obtained by (14) at the $(i-1)$th iteration.

(S2) If $J_{2,2}(\mathbf{v}) > J_{2,2}(\mathbf{v}^{[i-1]})$, update
$\mathbf{v}^{[i]} = \mathbf{v}$, otherwise update $\mathbf{v}^{[i]}$
through a gradient-type optimization procedure with the
gradient

$$\nabla J_{2,2} \propto \mathrm{sgn}(C_{2,2}^{u})\, R_b\, (\mathbf{v} - \mathbf{v}^{[i-1]}). \tag{16}$$

LIFC-F Algorithm: Let

$$e[n] = \mathbf{v}^T \mathbf{f}_k[n] \tag{17}$$

where $k$ is sufficiently large such that (F1) applies to
$\mathbf{f}_k[n]$. At the $i$th iteration, $\mathbf{v}^{[i]}$ is obtained
through the same procedure as the previous LIFC-B algorithm
except that $\mathbf{b}[n]$ and $R_b$ are replaced by
$\mathbf{f}_k[n]$ and $R_{f_k}$, respectively, with $e[n]$ obtained
by (17) and $\nabla J_{2,2}$ obtained by

$$\nabla J_{2,2} \propto \mathrm{sgn}(C_{2,2}^{u})\, (\mathbf{v} - \mathbf{v}^{[i-1]}). \tag{18}$$

Two remarks regarding the proposed LIFC-B and LIFC-F
algorithms are as follows:

(R1) The proposed LIFC-B and LIFC-F algorithms are
computationally efficient (without need of matrix inversion)
with guaranteed convergence, and the latter converges faster
than the former since $f_k[n]$ approximates an amplitude
equalized signal by (F1).

(R2) Following the derivation of the LIFC-B and LIFC-F
algorithms (maximizing $J_{2,2}$), one can readily obtain two
lattice CMA algorithms (minimizing $J_{CM}$), using
$\mathbf{b}[n]$ and $\mathbf{f}_k[n]$, respectively, that also share
the implementation merits of the LIFC-B and LIFC-F
algorithms mentioned in (R1).

4.2.  Adaptive  Processing 

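Let $\mathbf{v}_n$ denote the estimate of $\mathbf{v}$ as $x[n]$ is
processed. An adaptive SEA reported in [1] is as follows:

$$\tilde{\mathbf{v}}_{n+1} = \mathbf{v}_n + \mu\, Q_{n+1}\, \mathbf{x}^*[n+1]\, e[n]\, (\gamma - |e[n]|^2) \tag{19}$$

$$\mathbf{v}_{n+1} = \tilde{\mathbf{v}}_{n+1} \,/\, \| \tilde{\mathbf{v}}_{n+1} \| \tag{20}$$

where $\mu$ is the step size parameter, and

$$e[n+1] = \mathbf{v}_{n+1}^T\, \mathbf{x}[n+1] \tag{21}$$

$$Q_{n+1}^{-1} = (1 - \mu)\, Q_n^{-1} + \mu\, \mathbf{x}[n+1]\, \mathbf{x}^H[n+1] \tag{22}$$

With $Q_{n+1}$ and $\mathbf{x}[n]$ in (19) replaced by $R_{f_k}^{-1}$
and $\mathbf{f}_k[n]$, one can obtain

$$\mathbf{v}_{n+1} = \mathbf{v}_n + \mu\, \{\gamma - |e[n]|^2\}\, e[n]\, \mathbf{f}_k^*[n+1] \tag{23}$$

$$e[n+1] = \mathbf{v}_{n+1}^T\, \mathbf{f}_k[n+1] \tag{24}$$

Lattice SE-IF-CM Algorithm: For each $x[n+1]$, two signal
processing steps are performed as follows:

(U1) Obtain $\mathbf{f}_k[n+1]$ by processing $x[n+1]$ with the
adaptive least squares lattice (LSL) LPE filter [8].

(U2) Update $\mathbf{v}_{n+1}$ and $e[n+1]$ using (23) and (24),
respectively.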
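
Step (U2) has the form of a stochastic-gradient constant modulus update. The sketch below is an assumption-laden illustration rather than the authors' implementation: it uses the common variant in which the error and the regressor come from the same sample, it takes the regressor sequence as given (raw $\mathbf{x}[n]$ or the prewhitened $\mathbf{f}_k[n]$ from (U1), whose computation is not shown), and the function name is hypothetical.

```python
import numpy as np

def cma_adapt(regressors, v0, mu, gamma):
    """Constant modulus stochastic-gradient update over a regressor sequence.

    regressors: iterable of length-(L+1) vectors, one per time instant
    v0: initial equalizer taps; mu: step size; gamma: dispersion constant
    Returns the final taps and the sequence of equalizer outputs.
    """
    v = np.array(v0, dtype=complex)
    outputs = []
    for r in regressors:
        r = np.asarray(r, dtype=complex)
        e = v @ r                                        # e = v^T r
        v = v + mu * (gamma - abs(e) ** 2) * e * np.conj(r)
        outputs.append(e)
    return v, np.array(outputs)

# Single-tap toy case: alternating 2-PAM input, gain starting at 0.5.
pam = [[1.0 if n % 2 == 0 else -1.0] for n in range(200)]
v_final, e_hist = cma_adapt(pam, v0=[0.5], mu=0.1, gamma=1.0)
```

In this toy case the update reduces to $v \leftarrow v + \mu (1 - v^2) v$, so the gain settles at unit modulus; with a properly equalized constant-modulus input the correction term vanishes and the taps stay put.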

Two remarks regarding the lattice SE-IF-CM algorithm are as
follows:

(R3) The lattice SE-IF-CM algorithm is exactly a lattice CMA
algorithm, since (U2) is the same as the adaptive CMA [2] and
an adaptive IFC algorithm [3] except that $\mathbf{x}[n]$ is
replaced with $\mathbf{f}_k[n]$.

(R4) The proposed lattice SE-IF-CM algorithm, which has a low
computational load (no matrix multiplication operations),
converges faster than the adaptive SEA given by (19) through
(22) and the adaptive CMA, since the adaptive LSL algorithm
in (U1) performs as a fast amplitude equalizer.
5.  SIMULATION  RESULTS

Two examples are presented to support our analytic results
and the lattice structure based algorithms.

Example 1: Off-line Processing

The source signal $u[n]$ was assumed to be a 4-QAM signal with
unity variance, and a real channel $h[n]$ was taken from [1]
as plotted in Figure 1(a). The equalizer $v[n]$ was assumed to
be a causal FIR filter of order $L = 50$. Thirty independent
runs for data length $N = 4096$ and SNR = 20 dB (complex white
Gaussian noise) were performed using CMA and SEA(2,2), each
with the initial condition $v[n] = \delta[n - L/2]$. The
averages of the thirty independent estimates of the equalizer
$v[n]$ obtained using CMA and SEA(2,2) are displayed in
Figures 1(b) and 1(c), respectively, where only the real
parts of the equalizer are shown since the imaginary parts
are almost zero. These results support Theorem 1.

Moreover, Algorithm 1, the LIFC-B and LIFC-F algorithms and a
gradient-based IFC algorithm were also employed to process
the same simulation data. Figure 2 shows the average of the
thirty $J_{2,2}$'s versus iteration number for the LIFC-F
($k = 50$) algorithm (dashed line), the LIFC-B algorithm
(dash-dotted line), Algorithm 1 (dotted line) and the
gradient-based IFC algorithm (solid line). Figure 2 shows
that the LIFC-F algorithm and Algorithm 1 converge faster
than the other two algorithms (see (R1)) and that the
gradient-based IFC algorithm converges more slowly than all
the other algorithms. These simulation results support the
efficacy of the proposed LIFC-B and LIFC-F algorithms.

Example  2:  Adaptive  Processing 

The source signal $u[n]$ was assumed to be a 2-PAM (+1, -1)
signal. The same channel $h[n]$ as shown in Figure 1(a) was
used, and SNR = 20 dB (real white Gaussian noise). Figure 3
shows some simulation results (average of thirty independent
ISI's versus iteration number) for $L = 24$ using the adaptive
SEA with $p = q = 2$ and $\mu = 0.0026$, the adaptive CMA with
$\mu = 0.00215$ and the proposed adaptive lattice SE-IF-CM
algorithm with $k = 24$ and $\mu = 0.002$. Note that the value
of the step size $\mu$ used by each adaptive algorithm was
chosen by trial and error such that its performance is "best"
in terms of convergence speed and ISI. One can see from
Figure 3 that the proposed adaptive lattice SE-IF-CM
algorithm (solid line) converges faster than the other two
adaptive algorithms, with ISI slightly smaller than those
associated with the other two adaptive algorithms. These
simulation results support the efficacy of the proposed
adaptive lattice SE-IF-CM algorithm (see (R4)).


6.  CONCLUSIONS 

We have shown the equivalence of the CMA, SEA(p, q)
and IFC(p, q) for p = q = 2 as presented in Theorem
1 and (F4), and therefore, any performance analyses
for one of them apply to the others. Furthermore, two
computationally efficient off-line processing algorithms,
the LIFC-F and LIFC-B algorithms for p = q = 2, were
presented; the former is preferable to both the
latter and Chi, Feng and Chen's Algorithm 1 due to
faster convergence (see (R1)). For adaptive processing,
a computationally efficient lattice SE-IF-CM algorithm
for p = q = 2 was presented that has computational
complexity similar to the adaptive CMA and
converges faster than both the adaptive SEA and the
adaptive CMA with similar resultant ISI (see (R3) and
(R4)). The efficacy of the proposed adaptive lattice
SE-IF-CM algorithm and the proposed analytic results
were supported by some simulation results. As a final
remark, for p ≠ q or p = q ≠ 2, lattice implementations
of the SEA and IFC can be similarly developed.
[Figure 1 panels: (a) impulse response of the channel; (b) real part of the equalizer; (c) real part of the equalizer]

Figure 1. Simulation results for N = 4096 and SNR = 20
dB. (a) The channel impulse response; (b) average of thirty


Figure 2. Average of thirty J_{2,2}'s associated with the
LIFC-F (k = 50) algorithm (dash line), the LIFC-B algorithm
(dash-dotted line), Algorithm 1 (dotted line) and the
gradient-based IFC algorithm (solid line).


Figure 3. Simulation results (ISI versus iteration number)
using the adaptive CMA (dash line) with μ = 0.00215,
the adaptive SEA (dotted line) with p = q = 2 and μ =
0.0026, and the proposed adaptive lattice SE-IF-CM
algorithm (solid line) with k = 24 and μ = 0.002.
ABSTRACT

Parameter  estimation  of  each  atom  in  the  adaptive 
Gaussian  basis  representation  (AGR)  is  a  key  problem 
determining  signal  decomposition  results,  which  requires 
an  effective  parameter  estimation  algorithm.  In  this 
paper,  an  efficient  algorithm  is  proposed  to  estimate  the 
time-frequency  atoms  in  AGR.  Compared  with  the 
algorithm  presented  in  AGR,  the  parameter  estimation  is 
more  adaptive,  and  the  estimation  accuracy  is  greatly 
improved  through  an  iterative  method.  Numerical 
simulations  confirm  the  results. 

1.  INTRODUCTION 

The adaptive Gaussian basis representation (AGR) [1],
presented by Qian and Chen, decomposes the
analyzed signal into a linear expansion of Gaussian
atoms, in which the parameters of the Gaussian atoms
can be adjusted to best match the signal, so that a
crossterm-free time-frequency distribution is obtained.
A similar decomposition result is also obtained by
S. Mallat [2] with waveforms. Although the
decomposition error is monotonically decreasing, the
results still rely heavily on the parameter estimation
algorithm. In this paper, a new efficient algorithm to
compute the Gaussian atoms in AGR is proposed, which
avoids determining the search intervals of the Gaussians'
time centers and widths in advance, and greatly improves
the estimation accuracy.

2.  ALGORITHM  DESCRIPTIONS 

2.1  Brief  introduction  of  AGR 

The goal of AGR is to represent a signal of interest
$s(t)$ by a class of localized time-frequency
atoms $h_k(t)$:

$$s(t) = \sum_{k=0}^{M} C_k h_k(t) \qquad (1)$$

where

$$h_k(t) = \left(\frac{\alpha_k}{\pi}\right)^{1/4} \exp\left\{-\frac{\alpha_k}{2}(t - t_k)^2 + j\omega_k (t - t_k)\right\} \qquad (2)$$

$h_k(t)$ is a normalized Gaussian, $\alpha_k$ determines the
width of the Gaussian function, and $(t_k, \omega_k)$ is its
time-frequency center. Estimation of $h_k(t)$ at the $k$th stage
is equivalent to:

$$|C_k|^2 = \max_{\alpha_k, t_k, \omega_k} |\langle s_k(t), h_k(t) \rangle|^2 \qquad (3)$$

and $s_k(t)$ is the remainder of the orthogonal projection
of $s_{k-1}(t)$ onto $h_{k-1}(t)$.
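The atom definition and the projection step can be illustrated numerically. The following is a minimal sketch, assuming a densely sampled time axis and a Riemann-sum inner product; the function names `gaussian_atom` and `inner`, and all parameter values, are illustrative and not taken from the paper.

```python
import numpy as np

def gaussian_atom(t, alpha, t_k, omega_k):
    """Unit-energy Gaussian atom with width alpha and centre (t_k, omega_k)."""
    return (alpha / np.pi) ** 0.25 * np.exp(
        -0.5 * alpha * (t - t_k) ** 2 + 1j * omega_k * (t - t_k)
    )

def inner(x, y, dt):
    """Riemann-sum approximation of <x, y> = integral of x(t) conj(y(t)) dt."""
    return np.sum(x * np.conj(y)) * dt

# Illustrative values only (hypothetical, chosen for the demonstration).
t = np.linspace(-50.0, 50.0, 20001)
dt = t[1] - t[0]
h = gaussian_atom(t, alpha=0.1, t_k=5.0, omega_k=1.5)

s = 2.0 * h                    # toy signal: a single atom with coefficient 2
c = inner(s, h, dt)            # projection coefficient C_k
residual = s - c * h           # remainder passed on to the next stage
```

Because the atom has unit energy, the coefficient is simply the inner product with the signal, and the remainder of a one-atom signal is numerically zero.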

2.2  Adaptive  algorithm  for  Gaussian  parameters 
estimation 

Assume the decomposition is at the $k$th stage of AGR, and the
remainder is $s_k(t)$. The procedure of the proposed
algorithm to estimate $\alpha_k$, $t_k$, $\omega_k$ and $C_k$ is
as follows. Let $n = 0$.

Step one: Estimate $\omega_k$ and the initial value $t_{kn}$ ($n = 0$) of
$t_k$ from $s_k(t)$ by use of the spectrogram:

$$(t_{k0}, \omega_k) = \arg\max_{t,\omega} \left| \int s_k(\tau)\, g^*(\tau - t)\, e^{-j\omega\tau}\, d\tau \right|^2 \qquad (4)$$

The spectrogram is used here for the following reasons:

1) Although the spectrogram has low time-frequency
resolution, it avoids crossterms, which is helpful in
extracting initial values. 2) The spectrogram of the input
signals displays parallel lines on the time-frequency
plane, thus peak searching on this plane can decide the
time-frequency center of each component, while it is not
the case for frequency modulated signals. 3) If the
weighting window $g(t)$ is long enough, the estimate of
$\omega_k$ will be accurate enough for subsequent use.

Hence, in the following procedure, we assume that $\omega_k$
is equal to the real value.
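Step one amounts to a two-dimensional peak search over a sliding-window Fourier transform. The following is a minimal discrete-time sketch under stated assumptions: a Hann window of odd length stands in for $g(t)$, the window slides one sample at a time, and the function name `spectrogram_peak` and all numerical values are illustrative rather than the authors' settings.

```python
import numpy as np

def spectrogram_peak(s, win_len=65):
    """Return (time index, angular frequency) of the spectrogram maximum.

    A symmetric Hann window of odd length win_len is slid over s one
    sample at a time (zero padding at the edges); each frame is Fourier
    transformed and the largest squared magnitude is located.
    """
    half = win_len // 2
    g = np.hanning(win_len)
    padded = np.concatenate([np.zeros(half), s, np.zeros(half)])
    # Frame i is centred on sample i of the original signal.
    frames = np.stack([padded[i:i + win_len] * g for i in range(len(s))])
    spec = np.abs(np.fft.fft(frames, axis=1)) ** 2
    i_t, i_f = np.unravel_index(np.argmax(spec), spec.shape)
    omega = 2.0 * np.pi * np.fft.fftfreq(win_len)[i_f]
    return int(i_t), float(omega)

# Toy remainder: one Gaussian component centred at sample 100
# (hypothetical values, with the frequency placed on an FFT bin).
n = np.arange(256)
omega0 = 2.0 * np.pi * 8 / 65
s = np.exp(-0.5 * 0.01 * (n - 100) ** 2 + 1j * omega0 * (n - 100))
t0_hat, omega_hat = spectrogram_peak(s)
```

The frequency resolution of this search is limited to the FFT bin spacing $2\pi/\text{win\_len}$, which is why a longer window gives a better frequency estimate at the cost of time resolution.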

Step two: Estimate the initial value ($n = 0$) of $\alpha_k$.

Since $s_k(t)$ and $h_k(t)$ are Gaussian-shaped, and the
FFTs of both are symmetrical about $\omega = \omega_k$, the
method of adaptively estimating the width of the
time-frequency kernel [3] is used here for choosing $\alpha_{kn}$.
That is:

$$\alpha_{kn} = \max\{\alpha_{k,i},\ i = 1, 2, \ldots, I \mid R_{k,i}(\alpha) < \eta\} \qquad (5)$$

where

$$R_{k,i}(\alpha) = \ldots \qquad (6)$$

$$g_{k,i}(t) = \exp\left\{-\frac{\alpha_{k,i}(t - t_{kn})^2}{2}\right\} e^{j\omega_k (t - t_{kn})} \qquad (7)$$

$G_{k,i}(\omega)$ and $S_k(\omega)$ are the normalized Fourier
transforms (FFT) of $g_{k,i}(t)$ and $s_k(t)$, respectively.
$\alpha_{k,i}$ ($i = 1, 2, \ldots, I$) are $I$ different values roughly chosen
for the $\alpha_k$ design, and $\eta$ is a threshold to control the range
of $\alpha_{kn}$.

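Step three: Accurately estimate $\alpha_k$ from $\alpha_{kn}$ ($n = 0$).
Let

$$s_k(t) = \exp\left\{-\frac{\alpha_0 (t - t_0)^2}{2}\right\} e^{j\omega_0 (t - t_0)}$$

and

$$h_k(t) = \exp\left\{-\frac{\alpha_{kn} (t - t_0)^2}{2}\right\} e^{j\omega_0 (t - t_0)}$$

respectively, and let $S_k(\omega)$ and $H_k(\omega)$ be the
normalized FFTs of $s_k(t)$ and $h_k(t)$. Then, with both spectra
normalized to unit peak magnitude, we have:

$$P_0 = \int |S_k(\omega)|^2\, d\omega = \sqrt{\pi \alpha_0} \qquad (8)$$

$$P_{kn} = \int |S_k(\omega) H_k(\omega)|\, d\omega = \sqrt{\frac{2\pi \alpha_0 \alpha_{kn}}{\alpha_0 + \alpha_{kn}}} \qquad (9)$$

Thus the following result can be obtained:

$P_{kn} > P_0$ when $\alpha_{kn} > \alpha_0$

$P_{kn} < P_0$ when $\alpha_{kn} < \alpha_0$

Therefore, starting from the initial value $\alpha_{k0}$ and
taking $P_0$ as a reference value, $\alpha_{kn}$ can be adjusted
adaptively according to the difference between $P_{kn}$ and
$P_0$ until $\alpha_{kn} = \alpha_0$:

$$\alpha_{k,n+1} = \alpha_{kn} + u(P_0 - P_{kn}) \quad (n = 0, 1, 2, 3, \ldots)$$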
where $u > 0$ is the convergence factor. In practical
applications, however, it is very difficult to choose $u$,
since Eqs. (8) and (9) do not hold exactly because of
the limited length and discrete sampling of the
analyzed signal. Consequently, we adopt the following
numerical algorithm:

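1. Because the initial value $\alpha_{kn}$ ($n = 0$) is always
much larger than the real value, namely, $P_{k0} > P_0$
is always true, we start from $n = 1$ and let
$\alpha_{kn} = \ldots$ until $P_{kn} < P_0$. Assume
$n = q$ at this time.

2. Adjust $\alpha_{kn}$ with the rule below, for
$n = q+1, q+2, \ldots$, until $|P_{kn} - P_0| < \varepsilon$:

$$\alpha_{kn} = \mathrm{mid}\{\alpha_{k,n-1},\ \min[\alpha_{k,i},\ i = 1, \ldots, n-2 \mid \alpha_{k,i} > \alpha_{k,n-1}]\}, \quad P_{kn} < P_0$$

$$\alpha_{kn} = \mathrm{mid}\{\alpha_{k,n-1},\ \max[\alpha_{k,i},\ i = 1, \ldots, n-2 \mid \alpha_{k,i} < \alpha_{k,n-1}]\}, \quad P_{kn} > P_0$$

where $\varepsilon$ is a given threshold controlling the estimation
precision of $\alpha_k$, and $\mathrm{mid}(x, y)$ means the mid value
between $x$ and $y$. Thus we can get a more accurate
estimate of $\alpha_k$.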
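The two-phase search above behaves like a bracketing bisection on the spectral overlap. The following is a minimal sketch under stated assumptions: spectra are unit-peak Gaussians sampled on a uniform frequency grid centred on the peak, integrals are Riemann sums, and the shrink rule of the first phase is taken to be halving, which is an assumption made for illustration and not the rule used in the paper. The name `refine_alpha` and all values are hypothetical.

```python
import numpy as np

def refine_alpha(S, omega, alpha_init, eps=1e-6, max_iter=200):
    """Estimate the Gaussian width alpha from a unit-peak magnitude spectrum S.

    S is sampled on the uniform grid omega, centred on the spectral peak.
    """
    d = omega[1] - omega[0]
    p0 = np.sum(S ** 2) * d                       # reference value P_0

    def p(alpha):
        H = np.exp(-omega ** 2 / (2.0 * alpha))   # unit-peak trial spectrum
        return np.sum(S * H) * d                  # overlap P_kn

    if p(alpha_init) < p0:
        raise ValueError("alpha_init must start above the true width")

    # Phase 1: shrink alpha until the overlap drops below P_0.
    # Halving is an assumed shrink rule, not taken from the paper.
    hi = alpha_init
    lo = alpha_init / 2.0
    while p(lo) >= p0:
        hi, lo = lo, lo / 2.0

    # Phase 2: move to the midpoint between the current value and the
    # nearest previously tried value on the other side of the target.
    alpha = lo
    for _ in range(max_iter):
        alpha = 0.5 * (lo + hi)
        pk = p(alpha)
        if abs(pk - p0) < eps:
            break
        if pk < p0:
            lo = alpha
        else:
            hi = alpha
    return alpha

omega = np.linspace(-20.0, 20.0, 20001)
alpha_true = 0.5                                   # illustrative value
S = np.exp(-omega ** 2 / (2.0 * alpha_true))
alpha_hat = refine_alpha(S, omega, alpha_init=6.0)
```

Keeping only the two bracketing values `lo` and `hi` is equivalent to searching the list of previously tried widths for the nearest one above or below the current estimate, since every tried value outside the bracket is farther away.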

Step four: Accurately estimate $t_k$ and $C_k$ with the
estimated $\alpha_k$ and $\omega_k$ according to Eq. (3):

$$|C_k|^2 = \max_{t_k} |\langle s_k(t), h_k(t; \alpha_k, \omega_k) \rangle|^2 \qquad (11)$$

3. IMPLEMENTATION CONSIDERATIONS

The following observations need to be considered in
practical implementations:

1. When multiple components exist in the analyzed
signal, a long integration range in Eqs. (8) and (9)
may destroy the relationship in Eq. (11). Hence, the
3 dB bandwidth around the spectrum peak of each
component is adopted as the integration range of
Eqs. (8) and (9).

2. When multiple components exist in the signal, large
errors will occur when estimating $C_k$ and $t_k$ with
Eq. (11). An iterative algorithm similar to the
"RELAX" method [4] can be used to improve the
accuracy. That is, each time a new component
is estimated, all previously computed components
are re-estimated.

4.  SIMULATIONS 

To demonstrate the effectiveness of the algorithm
presented above, we apply it to a synthetic signal
composed of four Gaussian components. The signal has
256 points, $\varepsilon = 0.001$, and the time center of each
component is represented by the sampling index.
A comparison of the estimated parameters and the true
values is given in Table 1, which shows the high accuracy of
the developed algorithm. The signal energy of the
residual is less than 0.03 of that contained in the original
signal.

Fig. 1(a) and (b) show the contour plots of the spectrogram
and the Wigner-Ville distribution (WVD) of the synthetic
signal above, respectively. Fig. 1(c) is the WVD of the
signal after decomposition. Comparing (b) and (c), it can
be seen that, by use of the signal decomposition method,
crossterms in the WVD can be eliminated effectively, and
the signal's time-frequency resolution is high.

5.  CONCLUSION 

Atom estimation in signal decomposition methods
determines the decomposition results and the description
of signals. The authors present an efficient numerical
algorithm to improve the parameter estimation accuracy
in AGR. Its advantages are: the procedure is simple; the
computation is not complex; estimation of all
parameters of the Gaussian atoms is more adaptive; and the
estimation precision of $\alpha_k$ and $t_k$ is improved greatly
via an iterative adaptive method.
Table 1: Comparison between the estimated and true values

Component        C_k       t_k     f_k       alpha_k
1           T    2         100     2.5       0.056
            E    2.0073    100     2.5       0.0562
2           T    2.0       66      0.77      0.07
            E    2.0027    66      0.7682    0.0699
3           T    1.0       80      1.6       0.033
            E    1.0093    80      1.6016    0.033
4           T    1.0       162     5.0       0.1
            E    1.0046    162     5.0       0.1003

T: Denotes the true value.
E: Denotes the estimated value.


Fig. 1 Contour plot of (a) Spectrogram (b) WVD (c) WVD of decomposed signal
ABSTRACT

This paper addresses the problem of estimating the parameters
of a deterministic signal in non stationary, white,
Gaussian noise. It is proposed to model the time varying
white Gaussian noise with an unknown deterministic variance
sequence that changes every sample. While making
relatively few assumptions on the non stationarity of the
noise, this type of modeling gives rise to several difficulties.
In the paper we identify the resulting difficulties and
discuss possible solutions.

1.  INTRODUCTION 

The problem of estimating the parameters of a deterministic
signal in additive noise arises in many disciplines.
Numerous applications exist for this model, ranging
from underwater acoustics to cellular communications.
In this paper we take interest in the case where the additive
noise is white, non stationary and Gaussian. First,
a novel approach for modeling the non stationarity is
examined. Then, for this modeling, specific estimation
algorithms are proposed and analyzed.

The observed time series at instant $n$, $y_n$, is modeled as
follows:

$$y_n = s_n(\theta) + v_n, \qquad (1)$$

where $s_n(\theta)$ is a deterministic signal which is known to
within a parameter vector $\theta$, and $v_n$ is a white, non stationary,
Gaussian noise sequence.

The  model  in  (1)  represents  many  important  applica¬ 
tions  for  which  there  exists  abundant  literature.  For  each 
application,  various  signals  are  used  and  different  assump¬ 
tions  are  made  on  the  noise  distribution.  The  present  paper 
relaxes  the  traditional  assumption  on  the  stationarity  of  the 
noise. 

Most of the works in the literature dealing with non stationary
processes are based on a specific, or a parametric
model for the non stationarity. The proposed model is then
justified for the problem of interest. In [6], for instance, a
parametric approach is proposed to characterize a non stationary
process using a time dependent ARMA process.
However, since mismodeling may cause significant errors,
it would sometimes be best to avoid making any assumptions
on the characteristics of the non stationarity, and to
model the non stationary process in a more general way. A
natural way is to model the noise variance sequence in (1) as
a sequence of positive, deterministic, unknown parameters.
However, this type of modeling causes various difficulties.

The main difficulty in such modeling arises since the
number of unknown parameters increases with the number
of samples. The family of problems which are characterized
by this property was first presented by Neyman and
Scott [7]. Their goal was to find "Consistent estimates based
on partially consistent observations". In their classic paper
they show that sometimes maximum likelihood (ML) estimation
fails to provide consistent estimates. They also give
examples in which ML estimates are consistent but not optimal
in the mean square error sense. We follow their definitions
to present the problem of interest.

In model (1) the parameters of interest are the signal parameters
while the unknown, time varying noise variances
are the nuisance parameters. The signal parameters, $\theta$, are
called structural parameters. As the number of observed
data samples approaches infinity, these parameters affect the
probability law of an infinite amount of data samples. The
noise parameters, $\sigma_1^2, \ldots, \sigma_N^2$, are called incidental parameters.
These parameters affect the probability law of a finite
amount of data samples even when the number of observed
data samples approaches infinity. For the model presented
in (1) it is shown in [3] that ML estimates for $\theta$ do not exist
in general and therefore, other solutions should be looked
for.

2.  OPTIMALITY  CRITERIA 

In estimation problems in which there exist only structural
parameters, under mild regularity conditions, the ML estimator
is the asymptotic, universal, optimal estimator in the
mean square error sense, i.e., regardless of the values of
the parameters to be estimated, ML estimation achieves
asymptotically the best performance possible described by
the Cramer Rao bound (CRB). However, in cases where
the estimation problem includes incidental parameters, the
CRB is not always attainable and therefore may serve only
as a lower bound. Thus, an estimator may be optimal without
attaining the CRB.

In the search for the optimal estimator (in the mean square
sense), a few options arise. One would very much like to find
a universal optimal estimator in the presented family of estimation
problems. In other words, to find an estimator for
the structural parameters which is optimal for any given sequence
of incidental parameters. However, this is in general
impossible. It is possible, however, to find an optimal estimator
in a restricted family of estimators. Such a criterion
of optimality is discussed in [1]. A different optimality criterion
is to obtain the optimal estimator when the unknown
incidental parameters are regarded as an i.i.d. sample from
an unknown, but fixed distribution belonging to a non parametric
family of distributions. Such an estimation problem
is said to be semi-parametric and an optimal estimator may
be derived for different problems using adaptive estimation
techniques (see [2]). Finally, another possible optimality criterion
is to obtain the optimal estimator in a minimax sense.
That is, to minimize the estimation loss provided the incidental
parameters attain the worst sequence of values (see
[4]).


3. THE CRAMER RAO BOUND

For the problem of estimating the parameters of a deterministic
signal in non stationary noise (1), the Fisher Information
matrix (FIM) is a block diagonal matrix:

$$I(\theta, \sigma_1^2, \ldots, \sigma_N^2) = \begin{pmatrix} I(\theta) & 0 \\ 0 & I(\sigma_1^2, \ldots, \sigma_N^2) \end{pmatrix} \qquad (2)$$

Therefore, the CRB for the structural parameters is

$$CRB(\theta) = I^{-1}(\theta) = \left( \sum_{n=1}^{N} \frac{1}{\sigma_n^2} \frac{\partial s_n(\theta)}{\partial \theta} \frac{\partial s_n(\theta)}{\partial \theta}^{T} \right)^{-1}. \qquad (3)$$

As shown in [3], for the problem of interest, ML estimates
do not exist. Therefore, the CRB does not describe the
asymptotic covariance of the ML estimates. However, it
may serve as a lower bound for any estimator.

4. GENERALIZED M-ESTIMATES

In the following we study estimating equations depending
on the parameters of interest only. We consider estimates
of the signal parameters, $\theta$, obtained by minimizing a cost
function consisting of a sum of the time varying functions,
$\psi_n(\cdot)$:

$$\hat{\theta} = \arg\min_{\theta} \sum_{n=1}^{N} \psi_n\left(y_n - s_n(\theta)\right). \qquad (4)$$

This is a time varying extension of the M estimates, first
presented by Huber [5]. If the sequence $\sigma_1^2, \ldots, \sigma_N^2$ were
known, the cost function could be chosen to be the likelihood
function. Other functions may also be considered. Differentiating
$\psi_n(\cdot)$ with respect to $\theta$, the estimates are obtained
as solutions of the estimating equations,

$$\sum_{n=1}^{N} \varphi_n\left(y_n - s_n(\theta)\right) \frac{\partial s_n(\theta)}{\partial \theta} = 0, \qquad (5)$$

where $\varphi_n(\cdot) = \psi_n'(\cdot)$. If the cost function is chosen to be
the log likelihood function, then the estimating function is
the associated score function. As suggested by Huber [5],
the choice of the nonlinear function $\varphi_n$ should be guided
not only by performance considerations but also by considerations
of robustness. Huber suggested robustness to deviations
from the assumed noise distribution. Instead, we
suggest the choice of estimating functions robust to the unknown
non-stationarity, i.e., the unknown sequence $\sigma_n^2$.

We now turn to compute the asymptotic performance of
the estimates as a function of the estimating functions $\varphi_n$.

4.1. Small error analysis

The small error behavior of the estimates is determined from
a first order expansion of the estimating equations (5) and is
derived in [3]. Under mild conditions, it can be shown that
for small errors:

$$(\hat{\theta} - \theta) \approx A^{-1} \sum_{n=1}^{N} \varphi_n(v_n) \frac{\partial s_n(\theta)}{\partial \theta} \qquad (6)$$

where $A$ is defined by

$$A = \sum_{n=1}^{N} E\varphi_n'(v_n) \frac{\partial s_n(\theta)}{\partial \theta} \frac{\partial s_n(\theta)}{\partial \theta^{T}}. \qquad (7)$$

Each term in the stochastic expansion (6) depends on a single
value of $v_n$ so that it is a sum of independent terms because
the noise sequence itself is modeled as independent.
It is then clear that if $E\varphi_n(v_n) = 0$ then the small error
bias is zero. Moreover, the covariance of the small estimation
error is then given by:

$$\mathrm{Cov}(\hat{\theta}) = A^{-1} \left( \sum_{n=1}^{N} E\varphi_n^2(v_n) \frac{\partial s_n(\theta)}{\partial \theta} \frac{\partial s_n(\theta)}{\partial \theta^{T}} \right) A^{-1}. \qquad (8)$$


4.2.  Example 

One example for a generalized M type estimator (for the
structural parameters) is the weighted least squares (WLS)
estimator. In this case the cost function is given by:

$$\sum_{n=1}^{N} \left( \frac{y_n - s_n(\theta)}{k_n} \right)^2 \qquad (9)$$

i.e., $\psi_n(x) = (x/k_n)^2$ where $k_n$ are predetermined weights.
The estimating functions are given by the following set of
nonlinear equations:

$$\sum_{n=1}^{N} \frac{2\left(y_n - s_n(\theta)\right)}{k_n^2} \frac{\partial s_n(\theta)}{\partial \theta} = 0 \qquad (10)$$

i.e., $\varphi_n(x) = 2x/k_n^2$, so $\varphi_n' = 2/k_n^2$, and the asymptotic
covariance of this M type estimator is given by:

$$\mathrm{Cov}(\hat{\theta}) = \left( \sum_{n=1}^{N} \frac{1}{k_n^2} \frac{\partial s_n(\theta)}{\partial \theta} \frac{\partial s_n(\theta)}{\partial \theta}^{T} \right)^{-1} \left( \sum_{n=1}^{N} \frac{\sigma_n^2}{k_n^4} \frac{\partial s_n(\theta)}{\partial \theta} \frac{\partial s_n(\theta)}{\partial \theta}^{T} \right) \left( \sum_{n=1}^{N} \frac{1}{k_n^2} \frac{\partial s_n(\theta)}{\partial \theta} \frac{\partial s_n(\theta)}{\partial \theta}^{T} \right)^{-1} \qquad (11)$$

As seen from equation (11), the performance of the WLS
estimator depends on the noise variance sequence
$\sigma_1^2, \ldots, \sigma_N^2$ and on the predetermined weights,
$k_1, \ldots, k_N$. If the variance sequence happens to be equal
to the weighting sequence squared, such that $\sigma_n^2 = k_n^2$, then

$$\mathrm{Cov}(\hat{\theta}) = \left( \sum_{n=1}^{N} \frac{1}{k_n^2} \frac{\partial s_n(\theta)}{\partial \theta} \frac{\partial s_n(\theta)}{\partial \theta}^{T} \right)^{-1}, \qquad (12)$$

which is exactly the CRB for this sequence. This result is
in agreement with the fact that the weighted least squares
estimator is the ML estimator when the noise is non stationary
with a known noise variance sequence. It shows
that, for a specific noise variance sequence, an M type estimator
may be optimal and attain the CRB. Actually, in this
problem we see that for any noise variance sequence, there
exists an M-type estimator that attains the CRB. This estimator,
however, cannot be constructed in advance since
it requires prior knowledge of $\sigma_n^2$. That is, an M-type estimator
is optimal for a specific sequence of the nuisance
parameters but is suboptimal for other sequences. No
M-type estimator is uniformly optimal over all noise
sequences. Therefore, as first suggested by Huber [5], the
choice of the estimating functions, $\varphi_n$, should be guided by
robustness considerations.
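The claim that the WLS covariance collapses to the CRB when the weights match the noise standard deviations can be checked numerically. The following is a minimal sketch, assuming the sandwich form of the WLS covariance with weights $1/k_n^2$ and using the gradient of a sinusoid $A\sin(\omega n + \phi)$ with respect to $(A, \phi)$ as the test signal; the function names `wls_covariance` and `crb` and the random seed are illustrative choices.

```python
import numpy as np

def wls_covariance(grads, sigma2, k):
    """Small-error covariance of WLS with weights k under variances sigma2.

    grads has shape (N, p); row n is the gradient of s_n with respect to theta.
    """
    A = (grads * (1.0 / k ** 2)[:, None]).T @ grads        # sum (1/k^2) g g^T
    B = (grads * (sigma2 / k ** 4)[:, None]).T @ grads     # sum (s^2/k^4) g g^T
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv

def crb(grads, sigma2):
    """Cramer-Rao bound for theta with known per-sample noise variances."""
    fisher = (grads * (1.0 / sigma2)[:, None]).T @ grads   # sum (1/s^2) g g^T
    return np.linalg.inv(fisher)

rng = np.random.default_rng(0)                 # arbitrary seed
N, A, omega, phi = 200, 6.0, 0.2 * np.pi, 0.0
n = np.arange(1, N + 1)
# Gradient of A sin(omega n + phi) with respect to (A, phi).
grads = np.column_stack([np.sin(omega * n + phi), A * np.cos(omega * n + phi)])
sigma2 = rng.chisquare(10, size=N)             # random positive variances

cov_opt = wls_covariance(grads, sigma2, np.sqrt(sigma2))   # k_n^2 = sigma_n^2
cov_ls = wls_covariance(grads, sigma2, np.ones(N))         # ordinary LS
bound = crb(grads, sigma2)
```

With matched weights the middle factor equals the outer factor, so two of the three terms cancel; with any other weights the difference between the covariance and the bound is positive semidefinite.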


5. SIMULATIONS - A SINUSOIDAL SIGNAL IN NOISE

A problem which appears in many applications is the estimation
of parameters of a sinusoidal signal in noise. Consider
the model:

$$y_n = A \sin(\omega n + \phi) + v_n \qquad (13)$$

where $A$, $\phi$ and $\omega$ are the amplitude, the phase and the frequency
of the signal, respectively. $v_n$ is a zero mean, white,
non stationary, Gaussian noise, i.e., $v_n \sim \mathcal{N}(0, \sigma_n^2)$. In the
following we describe simulation results for the M estimators
of section 4.2.

We have considered the case where the frequency, $\omega = 0.2\pi$,
is assumed known, and the amplitude, $A = 6$, and the
phase, $\phi = 0$, are to be estimated.

Computer simulations of three kinds of generalized M-type
estimators were carried out: all of them are weighted
least squares estimators, with some weighting sequence $k_n$,
$n = 1, 2, \ldots, N$. For the three cases, the choice of the
weighting sequence is:

• $k_n = \sigma_n$. This is the case where the weighting sequence
happens to match the sequence of the (unknown)
noise variances ($k_n^2 = \sigma_n^2$). Since theory shows that the
performance of this estimate reaches the CRB, it is referred
to as the "optimal" WLS.

• $k_n = 1$. This classical LS estimator is the optimal
estimator for the case of a stationary noise.

• Arbitrary, non constant $k_n$. This case is referred to as
the suboptimal WLS.

The sequence of the values of the unknown noise variance,
$\sigma_n^2$, $n = 1, 2, \ldots, N$, was generated randomly from a
Chi squared distribution with 10 degrees of freedom^1. The
weights of the suboptimal estimator were chosen as follows.
For the first 100 samples, the weights were chosen such that
$k_n = \sigma_n + 1$. For the rest of the samples, the weights were
chosen such that $k_n = |1 - \sigma_n|$. It is anticipated that by this
construction, the suboptimal estimator will be better than
the least squares estimator in the first 100 samples (since it
gives more weight to samples with small variance and less
weight to samples with high variance). In the next 100 samples,
the performance of the suboptimal WLS is expected
to be ruined, since its weights are inversely related to the
noise variance.

1 500 Monte Carlo runs have been carried out. The (log)
mean squared error of the estimates of the unknown amplitude
and phase, respectively, as a function of the number of
samples, $N$, is depicted in figures 1, 2.

The simulation results show:

1. The theoretical expression for the asymptotic performance
of the estimates shows good agreement with the
empirical results.

2. As expected, the performance of the optimal WLS is
always the best, and coincides with the CRB.

3. The performance of the (suboptimal) WLS
can be better or worse than that of the LS, depending
on the specific sequence of the values of the noise
variance.

^1 Our results, however, are not limited to the case where the sequence
of the incidental parameters indeed comes from a distribution.

Figure 1: The MSE of the different estimates for the amplitude
$A$ versus the number of samples.

Figure 2: The MSE of the different estimates for the phase
$\phi$ versus the number of samples.
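The simulation setup can be sketched in a few lines. With the frequency known, the sinusoid is linear in $a = A\cos\phi$ and $b = A\sin\phi$, so each WLS estimate reduces to a weighted linear least squares fit. The following is a minimal sketch under stated assumptions: the function name `wls_sinusoid`, the seed, and the choice of 500 repetitions are illustrative, and only the optimal WLS and the plain LS estimators are compared.

```python
import numpy as np

def wls_sinusoid(y, omega, k):
    """WLS estimate of (A, phi) in y_n = A sin(omega n + phi) + v_n, omega known."""
    n = np.arange(1, len(y) + 1)
    # A sin(omega n + phi) = a sin(omega n) + b cos(omega n),
    # with a = A cos(phi) and b = A sin(phi).
    X = np.column_stack([np.sin(omega * n), np.cos(omega * n)])
    # Dividing each row by k_n turns the weighted problem into ordinary LS.
    beta, *_ = np.linalg.lstsq(X / k[:, None], y / k, rcond=None)
    a, b = beta
    return np.hypot(a, b), np.arctan2(b, a)

rng = np.random.default_rng(1)                 # arbitrary seed
N, A, omega, phi = 200, 6.0, 0.2 * np.pi, 0.0
n = np.arange(1, N + 1)
sigma = np.sqrt(rng.chisquare(10, size=N))     # per-sample noise std deviations

errs = {"optimal": [], "ls": []}
for _ in range(500):                           # illustrative number of runs
    y = A * np.sin(omega * n + phi) + sigma * rng.standard_normal(N)
    errs["optimal"].append(wls_sinusoid(y, omega, sigma)[0] - A)
    errs["ls"].append(wls_sinusoid(y, omega, np.ones(N))[0] - A)

# Empirical mean squared error of the amplitude estimate for each estimator.
mse = {name: float(np.mean(np.square(e))) for name, e in errs.items()}
```

A suboptimal weighting can be tried by passing any other positive array as `k`; the estimator itself needs no knowledge of the true variances.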

6.  SUMMARY 

In this paper we present a different approach to modeling non
stationary processes. Instead of modeling the non stationarity
in a parametric way with a small number of parameters,
we propose to model it by an unknown parameter set whose
dimension increases with the number of observations. First,
we study general flaws of such modeling. Then, focus is set
on the problem of estimating the parameters of a deterministic
signal in white additive non stationary Gaussian noise.

For the problem of interest, we propose the use of generalized
M type estimates. The resulting estimates are analyzed
and their asymptotic performance is evaluated analytically.

The WLS estimator is used as a demonstrating example
for an M type estimator. This robust estimator, which
is consistent under relatively mild conditions, is analyzed.
Its asymptotic performance is evaluated analytically and is
compared with simulation results through the problem of
estimating the parameters of a sinusoid in non stationary,
white, Gaussian noise.
Abstract

AC power lines have been considered as a convenient
and low-cost medium for intra-building automation
systems. In this paper, we investigate the problem
of estimating the channel order and root mean squared
(RMS) delay spread associated with the power lines,
which are channel parameters that provide important
information for determining the data transmission rate
and designing appropriate equalization techniques for
power line communications (PLC). We start by showing
that the key to the RMS delay spread estimation
problem is the determination of the channel order, i.e.,
the effective duration of the channel impulse response.
We next discuss various ways to estimate the impulse
response length from a noise-corrupted channel estimate.
In particular, four different methods, namely a
signal energy estimation (SEE) technique, a generalized
Akaike information criterion (GAIC) based test, a
generalized likelihood ratio test (GLRT), and a modified
GLRT, are derived for determining the effective
length of a signal contaminated by noise. These methods
are compared with one another using both simulated
and experimentally measured power line data.
The experimental data was collected for power line
characterization in frequencies between 1 and 60 MHz.

1  Introduction 

AC power lines have been considered as a convenient
carrier for communications in home and building automation
systems [1]. However, power lines present a
hostile environment for data transmission due to variable
attenuation and impedance, impedance modulation,
impulse noise, and continuous-wave jamming [1].
Recent applications of spread spectrum and forward
error correction techniques to power line communications
(PLC) have been quite successful in removing or
alleviating the noise impediment [1]. Extensive
characterizations of power lines have also been reported
[2]. However, these studies are mainly focused on
frequencies up to 500 kHz. For video transmission and
other similar applications using PLC, the PLC channel
needs to be characterized in the frequency range
of 1 to 60 MHz.

*This work was supported in part by the National Science
Foundation Grant MIP-9457388, the Office of Naval Research
Grant N00014-96-0817, and the Swedish Foundation for Strategic
Research through a Senior Individual Grant.

In this paper, we are interested in the channel order
and root mean squared (RMS) delay spread of power
lines, which are important parameters that affect the
data transmission rate over the channel. In practice,
the PLC channel impulse response may be measured
using some channel sounding techniques. The measured
channel impulse response can be considered as
the true impulse response contaminated by measurement
noise which does not vanish with time. With this
observation, one needs to determine from the
noise-contaminated channel estimate the effective duration
of the impulse response, outside which the measurements
are primarily attributed to the noise. In this
paper, we present a number of methods to solve the
above effective signal length estimation problem. The
performance of the various methods is compared using
both simulated and experimentally measured data.

2  Problem  Formulation 

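The impulse response of an AC power line, $s(n)$,
is assumed to have a finite duration $M$ [3] [4]. Given
$s(n)$ and the (effective) signal length $M$, the associated
mean delay and RMS delay spread normalized with
respect to the sampling interval can be determined as

$$\bar{\tau} = \frac{\sum_{n=1}^{M} n\, s^2(n)}{\sum_{n=1}^{M} s^2(n)}, \qquad (1)$$

and, respectively,

$$\sigma_{\mathrm{RMS}} = \sqrt{\frac{\sum_{n=1}^{M} (n - \bar{\tau})^2 s^2(n)}{\sum_{n=1}^{M} s^2(n)}}. \qquad (2)$$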


In practice, $s(n)$ can be estimated by measuring the frequency response of the power line channel using some channel sounding technique and applying the inverse Fourier transform to the estimated frequency response [3]. The resulting signal $x(n)$ can be considered as an estimate of $s(n)$ contaminated by noise:

$$x(n) = s(n) + e(n), \quad n = 1, 2, \ldots, N, \tag{3}$$

where $e(n)$ denotes the estimation error, which is modeled as zero-mean white Gaussian noise with unknown variance $\sigma_e^2$ and is assumed to be independent of $s(n)$ [3], [4], and $N$ is chosen such that $N \gg M$.

The problem of interest here is to estimate the channel order $M$ and the RMS delay spread $\sigma_{\mathrm{RMS}}$ from the measurements $\{x(n)\}_{n=1}^{N}$.

Supposing first that $M$ is known, we can estimate $\{s(n)\}_{n=1}^{M}$ by the maximum likelihood (ML) technique. Specifically, the negative log-likelihood function of $\{x(n)\}_{n=1}^{N}$ is (see, e.g., [5])

$$V_M = \frac{N}{2} \ln \sigma_e^2 + \frac{1}{2\sigma_e^2} \left\{ \sum_{n=1}^{M} [x(n) - s(n)]^2 + \sum_{n=M+1}^{N} x^2(n) \right\} + \text{constant}. \tag{4}$$

Then the ML estimates of $\sigma_e^2$ and $s(n)$ are obtained by minimizing (4) with respect to the unknown parameters, which gives $\hat{\sigma}_e^2 = \frac{1}{N} \sum_{n=M+1}^{N} x^2(n)$ and $\hat{s}(n) = x(n)$, $n = 1, \ldots, M$, respectively. Thus,

$$\min_{\{s(n)\},\, \sigma_e^2} V_M = \frac{N}{2} \ln \hat{\sigma}_e^2 + \text{constant}. \tag{5}$$

If $M$ is known, we can replace $s(n)$ in (1) and (2) by $\hat{s}(n)$ to obtain an estimate of the RMS delay spread. The remaining question is how to estimate the signal length $M$, which is discussed next.
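As a rough illustration of the known-$M$ case, the following Python sketch computes the mean delay and RMS delay spread from the first $M$ measured samples, using $\hat{s}(n) = x(n)$. It assumes the usual power-weighted definitions of the two quantities; the function name `delay_statistics` and the toy data are hypothetical and not part of the original work.

```python
import math


def delay_statistics(x, m):
    """Mean delay and RMS delay spread, in units of the sampling interval.

    x : measured impulse response samples x(1), ..., x(N)
    m : assumed signal length M; with M known, s_hat(n) = x(n) for n <= M
    """
    if not 1 <= m <= len(x):
        raise ValueError("m must satisfy 1 <= m <= len(x)")
    power = [v * v for v in x[:m]]          # s_hat^2(n), n = 1..M
    total = sum(power)
    if total == 0.0:
        raise ValueError("no energy in the first m samples")
    # Power-weighted mean of the (1-based) time index n.
    mean_delay = sum(n * p for n, p in enumerate(power, start=1)) / total
    # Power-weighted spread of n about the mean delay.
    second_central = sum((n - mean_delay) ** 2 * p
                         for n, p in enumerate(power, start=1)) / total
    return mean_delay, math.sqrt(second_central)


# Toy data: a flat 4-sample response followed by two small noise-only samples.
mean_delay, rms_spread = delay_statistics([1.0, 1.0, 1.0, 1.0, 0.05, -0.03], m=4)
```

Multiplying both results by the sampling interval converts them from samples to seconds.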

3  Signal  Length  Estimation 

3.1  SEE  Based  Test 

Choose a sufficiently large $L$ such that $M < L < N$. Let $E_{LN}$ denote the total average energy of $\{x(n)\}_{n=L}^{N}$, i.e., $E_{LN} = \sum_{n=L}^{N} E[x^2(n)] = (N - L + 1)\sigma_e^2$, where $E[\cdot]$ denotes the expectation. The total noise average energy $E_e$ is $E_e = \sum_{n=1}^{N} E[e^2(n)] = \frac{N}{N-L+1} E_{LN}$. Let $E_x$ denote the total average energy of $x(n)$, i.e., $E_x = \sum_{n=1}^{N} E[x^2(n)]$. The total deterministic signal energy is obtained as $E_s = \sum_{n=1}^{M} s^2(n) = E_x - E_e$. In practice, $E_x$ and $E_{LN}$ can be estimated as $\hat{E}_x = \sum_{n=1}^{N} x^2(n)$ and $\hat{E}_{LN} = \sum_{n=L}^{N} x^2(n)$, respectively. It follows that an estimate of $E_s$ is $\hat{E}_s = \hat{E}_x - \frac{N}{N-L+1} \hat{E}_{LN}$. The proposed SEE test consists of the following steps:

Step 1: Calculate $\hat{E}_s$.

Step 2: Set $\hat{M} = 1$ and $E'_s = 0$.

Step 3: Compute $E'_s = E'_s + x^2(\hat{M}) - \frac{\hat{E}_{LN}}{N-L+1}$. Here, the updated $E'_s$ is the estimated total deterministic signal energy up to time index $\hat{M}$.

Step 4: If $E'_s \ge \kappa \hat{E}_s$ or $\hat{M} = L$, then the signal length estimate $\hat{M}_{\mathrm{SEE}}$ is equal to $\hat{M}$ and the test stops; otherwise, set $\hat{M} = \hat{M} + 1$ and go to Step 3. Here, $\kappa$ is a parameter of user choice, typically $0.9 \le \kappa \le 0.99$.

The SEE test is a method based on intuitive calculations of the signal and noise energies. It is simple but has a somewhat limited capability.
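The four SEE steps can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `see_signal_length` is hypothetical, the per-sample noise term subtracted in Step 3 is taken to be the tail energy divided by $N - L + 1$ (an assumption), and the test signal is a deterministic toy example chosen so the result can be checked by hand.

```python
def see_signal_length(x, L, kappa=0.96):
    """Return the SEE estimate of the signal length (1-based sample index).

    x     : measured samples x(1), ..., x(N)
    L     : index chosen so that the true length M satisfies M < L < N
    kappa : fraction of the estimated signal energy that must be captured
    """
    N = len(x)
    if not 1 <= L < N:
        raise ValueError("need 1 <= L < N")
    energy_total = sum(v * v for v in x)              # estimate of E_x
    energy_tail = sum(v * v for v in x[L - 1:])       # estimate of E_LN, n = L..N
    noise_per_sample = energy_tail / (N - L + 1)      # estimate of the noise variance
    energy_signal = energy_total - N * noise_per_sample   # Step 1: estimate of E_s
    running = 0.0                                     # Step 2: E'_s = 0
    for m in range(1, L + 1):
        # Step 3: accumulate noise-corrected signal energy up to index m.
        running += x[m - 1] ** 2 - noise_per_sample
        # Step 4: stop once a fraction kappa of the signal energy is captured.
        if running >= kappa * energy_signal:
            return m
    return L  # Step 4 also stops when the index reaches L


# Toy data: 10 unit samples followed by 40 low-level alternating samples.
x = [1.0] * 10 + [0.1 * (-1) ** n for n in range(40)]
m_see = see_signal_length(x, L=30)
```

For this toy signal the accumulated energy first exceeds 96% of the estimated signal energy at the tenth sample.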

3.2  GAIC  Based  Test 

We describe here how to adopt the generalized Akaike information criterion (GAIC), originally used for model structure selection in system identification [5], to determine the effective signal length $M$. The GAIC cost function has the form [5]:

$$\mathrm{GAIC}_{\hat{M}} = V_{\hat{M}} + \gamma \ln(\ln N)(\hat{M} + 1), \tag{6}$$

where $V_{\hat{M}}$ is defined in (5), $\hat{M}$ is assumed to be the signal length [$(\hat{M} + 1)$ is thus the total number of unknown parameters for the data model in (3)], and $\gamma$ is a parameter of user choice. The proposed GAIC based test determines $\hat{M}_{\mathrm{GAIC}}$ by the following steps:

Step 1: Choose a sufficiently large $L$ so that $M < L < N$.

Step 2: Calculate the cost function $\mathrm{GAIC}_{\hat{M}}$ for $\hat{M} = 1, 2, \ldots, L$.

Step 3: The GAIC estimate of $M$ is obtained as

$$\hat{M}_{\mathrm{GAIC}} = \arg\min_{\hat{M}} \mathrm{GAIC}_{\hat{M}}, \quad \hat{M} = 1, 2, \ldots, L. \tag{7}$$
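A minimal Python sketch of the GAIC search is given below. It assumes that the minimized negative log-likelihood is $(N/2)\ln\hat{\sigma}_e^2$ with $\hat{\sigma}_e^2$ equal to $1/N$ times the energy of the samples beyond the candidate length, drops the additive constant, and requires nonzero noise so that the logarithm is defined. The name `gaic_signal_length` and the toy data are hypothetical.

```python
import math


def gaic_signal_length(x, L, gamma=2.0):
    """Return the candidate length in 1..L that minimizes the GAIC cost."""
    N = len(x)
    if not 1 <= L < N:
        raise ValueError("need 1 <= L < N")
    # suffix[m] = sum of x^2(n) for n = m+1..N (1-based n), built from the end
    # to avoid cancellation errors.
    suffix = [0.0] * (N + 1)
    for m in range(N - 1, -1, -1):
        suffix[m] = suffix[m + 1] + x[m] ** 2
    penalty = gamma * math.log(math.log(N))
    best_m, best_cost = 1, math.inf
    for m in range(1, L + 1):
        sigma2_hat = suffix[m] / N                    # noise variance estimate for length m
        cost = 0.5 * N * math.log(sigma2_hat) + penalty * (m + 1)
        if cost < best_cost:
            best_m, best_cost = m, cost
    return best_m


# Toy data: 10 unit samples followed by 40 low-level alternating samples.
x = [1.0] * 10 + [0.1 * (-1) ** n for n in range(40)]
m_gaic = gaic_signal_length(x, L=30)
```

The likelihood term drops sharply until the candidate length covers the whole pulse, after which the penalty term dominates, so the minimum falls at the pulse boundary.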

Remark 1: As one may have noticed, using either the SEE or GAIC based test for determining $M$ involves user parameters, viz., $\kappa$ in SEE and $\gamma$ in GAIC, which may affect the accuracy of the signal length estimate, but whose choice is not easy. Specifically, choosing these parameters to achieve a certain probability of detection (or missing) is not really possible. It would be desirable to derive methods that can control the risk of making a wrong decision. Such methods should be of greater interest in real applications.

3.3  GLRT 

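The generalized likelihood ratio for testing $M = \hat{M}$ against $M = \hat{M} + K$ (for some $K \ge 1$) is given by [5]

$$\Lambda = N \ln \left[ \frac{\sum_{n=\hat{M}+1}^{N} x^2(n)}{\sum_{n=\hat{M}+K+1}^{N} x^2(n)} \right]. \tag{8}$$

For $N \gg 1$ and under the hypothesis $H_0: \hat{M} \ge M$, it can be shown that $\Lambda$ is $\chi^2$ distributed with $K$ degrees of freedom, denoted by $\Lambda \sim \chi^2(K)$. To see this, we rewrite $\Lambda$ as follows:

$$\Lambda = N \ln \left[ 1 + \frac{\sum_{n=\hat{M}+1}^{\hat{M}+K} x^2(n)}{\sum_{n=\hat{M}+K+1}^{N} x^2(n)} \right] \overset{N \gg \hat{M}+K}{\approx} N\, \frac{\sum_{n=\hat{M}+1}^{\hat{M}+K} x^2(n)}{\sum_{n=\hat{M}+K+1}^{N} x^2(n)}. \tag{9}$$

Let $\hat{\sigma}_e^2 = \frac{1}{N} \sum_{n=\hat{M}+K+1}^{N} x^2(n)$. Here, $\hat{\sigma}_e^2$ is an estimate of $\sigma_e^2$. Note that for $N \gg 1$, $\hat{\sigma}_e^2 \approx \sigma_e^2$. In view of this observation, we have (under $H_0$)

$$\Lambda \approx \frac{1}{\hat{\sigma}_e^2} \sum_{n=\hat{M}+1}^{\hat{M}+K} x^2(n) \sim \chi^2(K). \tag{10}$$

Figure 1: Test signals used in the simulated examples (panels: rectangular pulse; sinc-like pulse having a raised cosine spectrum).

Figure 2: RMSE of $\hat{\sigma}_{\mathrm{RMS}}$ for the rectangular pulse versus $\sigma_e^2$ when $N = 450$, $L = 200$, and $K = 4$.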

The GLRT for determining $\hat{M}_{\mathrm{GLRT}}$ is summarized below:

Step 1: Choose a threshold $\lambda$ from a table of the $\chi^2$ distribution such that $\Pr\{y \le \lambda \mid y \sim \chi^2(K)\} = \alpha$, where $0.9 \le \alpha \le 0.99$ (see the discussions below).

Step 2: Set $\hat{M} = 1$.

Step 3: Calculate $\Lambda$ according to (8).

Step 4: If $\Lambda \le \lambda$ at $\hat{M}$ and also $\Lambda \le \lambda$ is true in more than $90\alpha\%$ of the cases corresponding to $\hat{M}+1, \hat{M}+2, \ldots, L-K$, then $\hat{M}_{\mathrm{GLRT}} = \hat{M}$ and stop; otherwise, set $\hat{M} = \hat{M} + 1$ and go to Step 3.

A brief explanation of Step 4 is as follows. It should be noted that $K$ is a small integer, typically $K < 10$ (see Section 3.4). However, a small $K$ may be a bad choice for signals that are small over some intervals within the signal duration. When $\hat{M}$ happens to be in one of those intervals and $K$ is too small to include any significant signal energy in the numerator of (9), it is very likely that the inequality $\Lambda \le \lambda$ will be true. Hence, to find the real signal boundary one has to check the inequality $\Lambda \le \lambda$ not only at $\hat{M}$ but at the remaining data samples as well. We should keep in mind that even if the boundary sample has been hit, $\Lambda \le \lambda$ may not be true for all of the remaining data samples due to the random nature of the noise. Nevertheless, the inequality should hold true for the majority (e.g., $90\alpha\%$) of the remaining data samples beyond the signal boundary.

Observe that the risk of rejecting $H_0$ when $H_0$ holds (the probability of false alarm) equals $1 - \alpha$. In general, the risk of accepting $H_0$ when it is not true cannot be determined for the statistics introduced previously unless one restricts considerably the class of alternative hypotheses against which $H_0$ is tested. Thus, in applications the value of $\alpha$ or, equivalently, the test threshold $\lambda$ is chosen by considering only the probability of false alarm. Doing so, we should keep in mind that as $\alpha$ increases, the probability of false alarm decreases, but the other type of risk increases. Typically, $\alpha$ is chosen between 0.9 and 0.99 [5].

Remark 2: It should be noted that the above GLRT is a valid test only when $N \to \infty$. Additionally, $\hat{\sigma}_e^2$ is a poor estimate of $\sigma_e^2$ if $N$ is not large enough, particularly so if $\hat{M} + K + 1 < M$. It would be of interest to modify the GLRT such that the above problems are avoided. Such a modified GLRT indeed exists, as discussed next.
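The GLRT steps can be sketched in Python as below. This is only an illustration under stated assumptions: the threshold is supplied by the caller (13.277 is the tabulated 0.99 quantile of the chi-square distribution with 4 degrees of freedom), an empty set of remaining cases is treated as passing, and the value $L$ is returned when no candidate passes, which mirrors the outlier behaviour reported for GLRT in the numerical results. The function name `glrt_signal_length` and the toy data are hypothetical.

```python
import math


def glrt_signal_length(x, L, K=4, threshold=13.277, alpha=0.99):
    """Return the GLRT estimate of the signal length (1-based sample index)."""
    N = len(x)
    if not (K >= 1 and K < L < N):
        raise ValueError("need 1 <= K < L < N")
    # suffix[m] = sum of x^2(n) for n = m+1..N (1-based n).
    suffix = [0.0] * (N + 1)
    for m in range(N - 1, -1, -1):
        suffix[m] = suffix[m + 1] + x[m] ** 2
    candidates = range(1, L - K + 1)
    # Likelihood ratio statistic for each candidate length (requires nonzero noise).
    accept = {m: N * math.log(suffix[m] / suffix[m + K]) <= threshold
              for m in candidates}
    for m in candidates:
        if not accept[m]:
            continue
        rest = [accept[j] for j in range(m + 1, L - K + 1)]
        # Step 4: the inequality must also hold in more than 90*alpha percent
        # of the remaining cases (an empty remainder is treated as passing).
        if not rest or sum(rest) > 0.9 * alpha * len(rest):
            return m
    return L  # assumed fallback when no candidate passes


# Toy data: 10 unit samples followed by 40 low-level alternating samples.
x = [1.0] * 10 + [0.1 * (-1) ** n for n in range(40)]
m_glrt = glrt_signal_length(x, L=30)
```

Precomputing the suffix energies keeps the whole search linear in $N$ apart from the Step 4 scan over remaining candidates.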

3.4  Modified  GLRT 

As mentioned before, $\hat{\sigma}_e^2$ is not a good estimate of $\sigma_e^2$. A better estimate is $\tilde{\sigma}_e^2 = \frac{1}{N-L} \sum_{n=L+1}^{N} x^2(n)$, where $M < L < N$. We now replace the $\hat{\sigma}_e^2$ in (10) by the above $\tilde{\sigma}_e^2$ and define

$$\tilde{\Lambda} \triangleq \frac{N-L}{K} \cdot \frac{\sum_{n=\hat{M}+1}^{\hat{M}+K} x^2(n)}{\sum_{n=L+1}^{N} x^2(n)} \triangleq \frac{N-L}{K}\, \rho_1.$$

Under the hypothesis $H_0$ and if $\hat{M} + K \le L$, $\tilde{\Lambda}$ is $F$ distributed with $K$ and $N - L$ degrees of freedom [5], written as $\tilde{\Lambda} \sim F(K, N - L)$. Observe that this holds in finite samples, whereas most other tests, including the original GLRT, require $N \to \infty$.

Figure 3: RMSE of $\hat{\sigma}_{\mathrm{RMS}}$ for the sinc-like pulse versus $\sigma_e^2$ when $N = 450$, $L = 200$, and $K = 4$.
The choice of $K$ should be made carefully. For $\hat{M} \ge M$, this is perhaps not very important since any $K \ge 1$ will lead to a similar performance. For $\hat{M} < M$, however, the choice becomes more critical. To reduce the risk of underestimating $M$, a small $K$ is recommended. To see this, let us assume that $K$ is very large such that $K \gg M$. Then underestimating $M$ by 1 or 2 will not increase $\rho_1$ too much (particularly so if $M$ is small), and hence the risk of underestimating $M$ may be large. As a result, a smaller $K$ should be used in this case. However, as mentioned in Section 3.3, a small $K$ is a bad choice for signals that are small over certain intervals within the signal duration, and therefore a strategy similar to the one adopted there may be used. In particular, the modified GLRT determines $\hat{M}_{\mathrm{mGLRT}}$ in the following steps:

Step 1: Choose $K < 10$ and a threshold $\delta$ from a table of the $F$ distribution so that $\Pr\{z \le \delta \mid z \sim F(K, N - L)\} = \alpha$, where $\alpha$ is between 0.9 and 0.99.

Step 2: Calculate $\tilde{\Lambda}$ for $\hat{M} = 1, 2, \ldots, L - K$.

Step 3: The signal length estimate $\hat{M}_{\mathrm{mGLRT}}$ is the smallest $\hat{M}$ at which $\tilde{\Lambda} \le \delta$ is true and for which the inequality is also true in more than $90\alpha\%$ of the cases corresponding to $\hat{M} + 1, \hat{M} + 2, \ldots, L - K$.
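A minimal Python sketch of the modified GLRT follows. The threshold must be looked up by the caller from an F table for the chosen $K$, $N - L$ and $\alpha$; the value 4.43 used in the example is the tabulated 0.99 quantile of F(4, 20) and applies only to the toy dimensions shown. The function name `mglrt_signal_length`, the fallback value $L$ when no candidate passes, and the treatment of an empty remainder as passing are assumptions made for illustration.

```python
def mglrt_signal_length(x, L, threshold, K=4, alpha=0.99):
    """Return the modified-GLRT estimate of the signal length (1-based index)."""
    N = len(x)
    if not (K >= 1 and K < L < N):
        raise ValueError("need 1 <= K < L < N")
    # Noise energy from the samples n = L+1..N, assumed to be signal free.
    noise_energy = sum(v * v for v in x[L:])
    if noise_energy == 0.0:
        raise ValueError("the tail samples carry no energy")
    scale = (N - L) / K
    candidates = range(1, L - K + 1)
    accept = {}
    for m in candidates:
        block_energy = sum(v * v for v in x[m:m + K])   # n = m+1..m+K
        statistic = scale * block_energy / noise_energy
        accept[m] = statistic <= threshold
    for m in candidates:
        if not accept[m]:
            continue
        rest = [accept[j] for j in range(m + 1, L - K + 1)]
        # The inequality must also hold in more than 90*alpha percent of the
        # remaining cases (an empty remainder is treated as passing).
        if not rest or sum(rest) > 0.9 * alpha * len(rest):
            return m
    return L  # assumed fallback when no candidate passes


# Toy data: 10 unit samples followed by 40 low-level alternating samples.
x = [1.0] * 10 + [0.1 * (-1) ** n for n in range(40)]
m_mglrt = mglrt_signal_length(x, L=30, threshold=4.43)
```

Because the noise energy is computed once from a fixed tail, the statistic at each candidate depends only on a block of $K$ samples, which is what makes the finite-sample F distribution applicable.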

4  Numerical  Results 

In the following, we use $\kappa = 0.96$ for the SEE test, $\gamma = 2$ for GAIC, and $K = 4$ and $\alpha = 0.99$ for GLRT and the modified GLRT (referred to as mGLRT henceforth).


4.1  Simulated  Example 

The simulated data consist of a pulse having a certain shape corrupted by zero-mean white Gaussian noise with variance $\sigma_e^2$. We consider both a rectangular pulse ($M = 40$) and a sinc-like pulse having a raised cosine spectrum. The roll-off factor for the latter is 1. The sinc-like pulse is shifted and truncated to have a duration of 80 samples. Figure 1 shows a realization of the test data corresponding to the two different pulses when $\sigma_e^2 = 0.05$, where dash-dot lines represent noise-free signals and solid lines denote noise-contaminated signals. The results shown below are obtained using 200 Monte Carlo trials. For each individual trial, a total of $N = 450$ samples are used and $L = 200$. In this example, we investigate the effect of the noise variance on the performance of the proposed tests. The root mean squared error (RMSE) of $\hat{\sigma}_{\mathrm{RMS}}$ is shown in Figure 2 for the rectangular pulse and Figure 3 for the sinc-like pulse. The results show that in general mGLRT and GAIC perform better than the other two methods. The poor performance of GLRT is due to a number of outliers, i.e., signal length estimates equal to $L$, produced by GLRT.

4.2  Experimental  Example 

We  now  consider  an  experimental  example.  We  first 
briefly  describe  the  PLC  channel  sounding  system  used 
to  obtain  the  measurement  data.  For  more  details  of 
the  system  and  measurement  process,  we  refer  the  in¬ 
terested  readers  to  [3].  Figure  4  shows  a  block  diagram 
that  uses  impulse  channel  sounding  to  measure  the  im¬ 
pulse  response  of  the  AC  power  line  channel.  The  cou¬ 
pler  box  plugging  into  the  AC  wall  outlet  (the  top  path 
in  Figure  4)  behaves  like  a  highpass  filter,  with  the  3 
dB  cutoff  at  1  MHz.  The  probing  signal  passes  through 
the  coupler  and  the  AC  power  line  network  and  exits 
through  a  similar  coupler  plugged  in  a  different  outlet. 
A  direct  coupler  to  coupler  connection  is  used  to  cali¬ 
brate  the  test  setup  (the  bottom  path  in  Figure  4).  A 
low-noise  amplifier  (LNA)  with  at  least  54  dB  gain  is 
used  in  front  of  the  digital  storage  oscilloscope  (DSO) 
to  reduce  the  noise  figure  and  increase  the  sensitivity 
of  the  system.  The  LNA  has  a  built-in  lowpass  fil¬ 
ter  with  the  3  dB  cutoff  frequency  at  70  MHz  in  the 
front  stage.  Additionally,  a  high-precision  adjustable 
(0-80  dB)  attenuator  is  placed  after  the  receiving  cou¬ 
pler,  making  it  possible  to  center  the  dynamic  range  of 
the  LNA/DSO  combination  for  the  signal  level  of  each 
outlet  pair.  This  allows  the  system  to  capture  noise 
spikes  and  temporal  noise  fluctuations.  The  DSO  has 
a  bandwidth  of  500  MHz,  implying  a  high  resolution, 
and  the  capability  for  long  time  captures. 

Figure 4: Power line channel measurement system.

The probing impulse used is a specially truncated sine waveform, with a duration of 17 ns and flat frequency characteristics from 0.85 to 63.6 MHz. The
highpass  characteristics  of  the  couplers  and  the  low- 
pass  filter  in  the  LNA  limit  the  receiving  sensitivity  of 
the  system  to  the  1  to  60  MHz  frequency  band.  The 
sampling  frequency  is  1  GHz  and  the  total  number  of 
data  samples  is  N  =  20000.  The  measurements  were 
performed  at  two  residential  houses  by  averaging  over 
100  to  1000  scope  sweeps  depending  on  the  noise  sit¬ 
uation.  Figure  5  shows  the  impulse  response  of  a  spe¬ 
cific  power  line  channel  corresponding  to  the  frequency 
band  1  —  60  MHz.  For  channel  order  and  RMS  delay 
spread  estimation,  we  choose  L  =  N/ 2.  The  effective 
signal  length  estimates  obtained  by  the  four  tests  un¬ 
der  study  are  also  shown  in  the  figure.  We  notice  that 
GLRT  fails  again  since  the  GLRT  estimate  is  an  out¬ 
lier,  with  a  value  equal  to  L.  It  is  also  seen  that  SEE 
obviously  underestimates  the  effective  signal  length. 
On  the  other  hand,  the  estimates  given  by  GAIC  and 
mGLRT  appear  to  be  more  accurate.  After  obtain¬ 
ing  the  effective  signal  length  estimate,  we  can  use  (1) 
and  (2)  to  calculate  the  mean  delay  and  RMS  delay 
spread. Specifically, the RMS delay spread estimates for the 1 ~ 60 MHz frequency band obtained by SEE, GAIC, and mGLRT are 0.19, 0.27, and 0.28 μs, respectively. With no equalization, the maximum transmission rate is inversely proportional to the RMS delay spread, i.e., Maximum Transmission Rate $\propto 1/\sigma_{\mathrm{RMS}}$.

It follows that the maximum data transmission rate is approximately 2.63 Mbps. The above calculation is somewhat optimistic since other factors, such as attenuation and noise characteristics of the PLC channel, which are important in determining the transmission rate, were not taken into account. Additionally, the impulse responses were obtained using one specific set of measurements. It is our experience that the RMS delay spread could vary significantly depending on the loads and environment of the power line networks.
ABSTRACT

Many  signal  processing  applications  require  estimating  and 
tracking  a  quantity  that  is  inherently  nonstationary.  Such 
quantities  may  be  matrices  (e.g.,  a  covariance  matrix  or  an 
image),  vectors  (e.g.,  a  weight  vector  or  an  eigenvector),  or 
scalars.  This  paper  considers  the  use  of  Taylor  series  expansions 
to  enhance  tracking.  The  potential  benefits  of  this  approach 
include:  (1)  a  reduction  in  computational  burden,  (2)  a 

reduction  in  required  memory  size  and/or  communication 
bandwidth  (via  an  implicit  compression  of  the  quantity  of 
interest),  (3)  interpolation  through  “gaps”  in  the  available  data, 
and  (4)  increased  fidelity  due  to  the  explicit  incorporation  of 
“nonstationarity”  into  the  model.  Sensor  array  processing 
examples  are  used  to  illustrate  the  approach. 

1.  INTRODUCTION 

Many  signal  processing  applications  require  estimating  and 
tracking  a  quantity  that  is  inherently  nonstationary.  Typically, 
sensor  data  is  collected  and  used  to  form  an  initial  estimate  of  the 
quantity.  Later,  more  sensor  data  is  collected  and  a  new  estimate 
is  formed.  The  process  is  repeated,  thereby  “tracking”  the 
quantity  of  interest. 

There  are  several  problems  with  this  “estimation  -  then  -  re¬ 
estimation”  approach.  First,  the  estimation  procedure  is  often 
computationally  intensive.  Computing  estimates  may  take  a  very 
long  time  and  may  be  difficult  to  implement  in  real-time.  Once 
an  initial  estimate  is  found,  it  is  thus  desirable  to  have  an 
efficient  method  for  tracking  the  quantity  (e.g.,  via  recursive  least 
squares  approaches  or  “fast”  subspace  tracking). 

Second,  the  estimation  procedure  is  typically  formulated  on 
the  assumption  that  the  sensor  data  is  stationary  within  the 
“training”  interval  (i.e.,  the  interval  used  for  creating  the 
estimate).  Often  this  is  only  approximately  true,  and  thus  may 
degrade  performance. 

Third,  there  may  be  gaps  in  the  available  data  that  make 
tracking  difficult  or  impossible. 

Finally,  the  tracked  quantity  must  often  be  stored. 
Sometimes,  it  must  also  be  transmitted  over  a  wireless  link  (e.g., 
for remote processing). In this case, frequent re-estimation can drive up the storage requirements and/or link bandwidth. Compression of the quantity is thus a desirable alternative; however, this is typically only done after the estimates are formed (not as an integral part of the tracking algorithm).

As an illustration of a representative application, consider sensor array processing (e.g., as applied to radar, sonar, or wireless communications). Typically, sensor array snapshots are used to estimate a covariance matrix, $\mathbf{R}$, its inverse, $\mathbf{R}^{-1}$, its principal subspace, $\mathbf{V}_s$, an image, a spectrum, or other quantities. Such quantities are used in adaptive beamforming, adaptive Doppler filtering, STAP, MUSIC, ESPRIT, MVDR, and other super-resolution methods.

When the covariance matrix is desired, its maximum likelihood estimate (presuming stationary interference) is typically used:

$$\hat{\mathbf{R}} = \frac{1}{K} \sum_{k=1}^{K} \mathbf{x}(k)\, \mathbf{x}^H(k) \tag{1}$$

where $\mathbf{x}(k)$ is the $k$th array snapshot. When principal components are needed, the eigenvalue decomposition of $\hat{\mathbf{R}}$ is first calculated,

$$\hat{\mathbf{R}} = \mathbf{V} \mathbf{D} \mathbf{V}^H. \tag{2}$$

The principal subspace, $\mathbf{V}_s$, can then be found by extracting the columns of $\mathbf{V}$ associated with the $s$ largest eigenvalues (i.e., the $s$ diagonal elements of $\mathbf{D}$ above the noise floor).

In nonstationary environments, such quantities are functions of time, i.e., $\mathbf{R}(t)$, $\mathbf{R}^{-1}(t)$, and $\mathbf{V}_s(t)$, respectively. Estimates of these quantities are typically made via the standard estimators (i.e., (1) and (2)) applied to a training interval consisting of nearby snapshots, e.g. (for odd $K$):

$$\hat{\mathbf{R}}(t) = \frac{1}{K} \sum_{k=t-\frac{K-1}{2}}^{t+\frac{K-1}{2}} \mathbf{x}(k)\, \mathbf{x}^H(k) \tag{3}$$

$$\hat{\mathbf{R}}(t) = \mathbf{V}(t)\, \mathbf{D}(t)\, \mathbf{V}^H(t) \tag{4}$$

As  t  varies,  tracking  is  achieved  by  recalculating  these 
estimators.  Some  efficient  techniques  exist  for  updating  these 
quantities  as  long  as  there  is  a  large  degree  of  overlap  between 
successive  training  intervals.  In  practice,  this  may  not  be  the 
case.  Furthermore,  the  data  within  the  training  interval  may  be 
nonstationary. 
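For concreteness, the sliding-window sample covariance estimator can be sketched in plain Python as below. This is an illustration only: the function name `windowed_covariance`, the 0-based list indexing of snapshots, and the tiny two-element example are assumptions, and a practical implementation would normally use a numerical array library.

```python
def windowed_covariance(snapshots, center, K):
    """Average x(k) x(k)^H over the K snapshots centred on index `center`.

    snapshots : list of equal-length lists of complex numbers (one per time index)
    center    : 0-based index of the window centre
    K         : odd window length
    """
    if K % 2 == 0:
        raise ValueError("K must be odd")
    half = (K - 1) // 2
    lo, hi = center - half, center + half
    if lo < 0 or hi >= len(snapshots):
        raise ValueError("training interval falls outside the available data")
    n = len(snapshots[0])
    acc = [[0j] * n for _ in range(n)]
    for k in range(lo, hi + 1):
        x = snapshots[k]
        for i in range(n):
            for j in range(n):
                acc[i][j] += x[i] * x[j].conjugate()   # outer product x x^H
    return [[value / K for value in row] for row in acc]


# Three snapshots from a hypothetical 2-element array.
snaps = [[1 + 0j, 1j], [1 + 0j, -1j], [1 + 0j, 1j]]
R_hat = windowed_covariance(snaps, center=1, K=3)
```

The explicit bounds check reflects the remark that the estimate is unavailable near the ends of the record, where the training interval would fall outside the data.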

In  Section  2,  we  consider  nonstationary  models  for  the 
estimated  quantities.  We  investigate  ways  that  Taylor  series 
expansions  may  be  employed  in  the  tracking  of  these  estimated 
quantities.  In  Section  3,  we  consider  nonstationary  models  for 
the  underlying  quantities.  We  investigate  ways  that  Taylor  series 
expansions may be used to estimate these underlying quantities. Our approach explicitly models the nonstationarity, and can result in better performance than standard estimators. Section 4 contains a summary.

2.  TRACKING  ESTIMATED  QUANTITIES 

As an introduction, let us consider how Taylor series expansions can be applied to the problem of tracking estimated quantities (note: the more interesting problem of tracking underlying quantities is considered in the next section). Suppose $\hat{\mathbf{A}}(t)$ is an estimate of a quantity of interest. Without loss of generality, let the dimensions of $\hat{\mathbf{A}}(t)$ be $N \times P$. The Taylor series expansion of $\hat{\mathbf{A}}(t)$ about $t = 0$ is:

$$\hat{\mathbf{A}}(t) = \hat{\mathbf{A}}(0) + t\, \dot{\hat{\mathbf{A}}}(0) + \ldots + \frac{t^n}{n!}\, \hat{\mathbf{A}}^{(n)}(0) + \text{H.O.T.} \tag{5}$$

where $\dot{\hat{\mathbf{A}}}(0)$ denotes the first derivative, $\hat{\mathbf{A}}^{(n)}(0)$ denotes the $n$th derivative, and "H.O.T." denotes higher order terms of the series.

Suppose estimates of this quantity are made at $t_1, t_2, \ldots, t_E$. A system of $n$th order Taylor series approximation equations can then be written:

$$(\mathbf{T} \otimes \mathbf{I}_N) \begin{bmatrix} \hat{\mathbf{A}}(0) \\ \dot{\hat{\mathbf{A}}}(0) \\ \vdots \\ \hat{\mathbf{A}}^{(n)}(0) \end{bmatrix} = \begin{bmatrix} \hat{\mathbf{A}}(t_1) \\ \hat{\mathbf{A}}(t_2) \\ \vdots \\ \hat{\mathbf{A}}(t_E) \end{bmatrix} \tag{6}$$

where $\otimes$ denotes the Kronecker product, and

$$\mathbf{T} = \begin{bmatrix} 1 & t_1 & \cdots & \frac{t_1^n}{n!} \\ 1 & t_2 & \cdots & \frac{t_2^n}{n!} \\ \vdots & \vdots & & \vdots \\ 1 & t_E & \cdots & \frac{t_E^n}{n!} \end{bmatrix}. \tag{7}$$

Solving this system of equations will provide estimates of $\hat{\mathbf{A}}(0), \dot{\hat{\mathbf{A}}}(0), \ldots, \hat{\mathbf{A}}^{(n)}(0)$. Observe that a solution is given by:

$$\begin{bmatrix} \hat{\mathbf{A}}(0) \\ \dot{\hat{\mathbf{A}}}(0) \\ \vdots \\ \hat{\mathbf{A}}^{(n)}(0) \end{bmatrix} = \left( (\mathbf{T}^H \mathbf{T})^{-1} \mathbf{T}^H \otimes \mathbf{I}_N \right) \begin{bmatrix} \hat{\mathbf{A}}(t_1) \\ \hat{\mathbf{A}}(t_2) \\ \vdots \\ \hat{\mathbf{A}}(t_E) \end{bmatrix} \tag{8}$$

In the event that the system of (6) is overdetermined or underdetermined, (8) provides the least squares solution. Also note that the pseudo-inverse in (8) can often be pre-computed, or written in closed form.†

† In some cases, (8) can then be re-factored so as to combine the two terms on the right directly, rather than separately forming the $\hat{\mathbf{A}}(t_i)$'s and $(\mathbf{T}^H \mathbf{T})^{-1} \mathbf{T}^H$.

Examples:

Next, let us illustrate the usefulness of (8) for tracking estimated quantities. Suppose we have a uniform linear array of 20 half-wavelength spaced isotropic elements. Two nonstationary signal sources are present; their initial angles are 6.418° and -18.998° relative to array broadside (this is a separation of 10 beamwidths, where a "beamwidth" is defined here as the angle between the peak and 3 dB points of the beam pattern). They each have an SNR of 30 dB. A total of 99 samples are collected (i.e., $t$ varies from -49 to 49); each source moves 5 beamwidths during this time (thus, the first source passes through 0° at $t = 0$).

Suppose we wish to track the spectrum (a.k.a. periodogram) vs. time,

$$P(\theta, t) = \mathbf{v}(\theta)^H\, \mathbf{R}(t)\, \mathbf{v}(\theta) \tag{9}$$

where $\mathbf{v}(\theta)$ is the array response vector for a source at angle $\theta$. Figure 1a shows the true instantaneous periodogram calculated from (9). In practice, the true instantaneous covariance matrix, $\mathbf{R}(t)$, would be unknown and must be estimated. Thus, we would actually compute:

$$\hat{P}(\theta, t) = \mathbf{v}(\theta)^H\, \hat{\mathbf{R}}(t)\, \mathbf{v}(\theta) \tag{10}$$

where $\hat{\mathbf{R}}(t)$ is an estimate of the covariance matrix as in (3). Since the sources are moving, the sample averaging in (3) will introduce an error. Thus, we would like to use a small $K$ (but if $K$ is too small, the spectrum will be noisy and sources may be incorrectly estimated). Figure 1b shows the estimated periodogram calculated from (10) and (3) with $K = 1$. As expected, this estimate is very noisy and its peaks are frequently in the wrong places. To address this, Figure 1c shows the periodogram when $K = 21$. Here, the covariance matrix and periodogram are re-estimated at each time, $t$ (except near the ends, where the desired training interval falls outside the available data).

Finally, Figure 1d shows the estimated periodogram that results when Taylor-series approximations are used. Here, five sample covariances were formed using (3) (with $K = 21$ and $t$ = 11, 30, 49, 69 and 87, respectively). Then, we used (8) and (5) (with $\hat{\mathbf{A}}(t_i)$ replaced by $\hat{\mathbf{R}}(t_i)$, $E = 5$, and $n = 4$) to synthesize all of the $\hat{\mathbf{R}}(t)$ values used in (10).
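The fit-then-synthesize procedure can be sketched in Python for a single real-valued entry of the tracked quantity. Because of the Kronecker structure, each entry of a matrix-valued quantity can be fitted independently with the same time matrix, so the scalar case shows the essential computation. This is a hedged sketch and not the authors' code: the names `fit_taylor` and `eval_taylor`, the small Gaussian elimination solver for the normal equations, and the quadratic toy data are all illustrative assumptions.

```python
import math


def solve_linear(a, b):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(aug[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (aug[i][n] - rest) / aug[i][i]
    return sol


def fit_taylor(times, values, order):
    """Least squares estimates of the value and first `order` derivatives at t = 0."""
    T = [[t ** j / math.factorial(j) for j in range(order + 1)] for t in times]
    m = order + 1
    gram = [[sum(row[i] * row[j] for row in T) for j in range(m)] for i in range(m)]
    rhs = [sum(row[i] * v for row, v in zip(T, values)) for i in range(m)]
    return solve_linear(gram, rhs)      # solves (T^T T) c = T^T a


def eval_taylor(coeffs, t):
    """Synthesize the quantity at time t from the fitted Taylor terms."""
    return sum(c * t ** j / math.factorial(j) for j, c in enumerate(coeffs))


# Toy data: samples of f(t) = 1 + 2 t + 3 t^2 at five times.
times = [-2, -1, 0, 1, 2]
values = [1 + 2 * t + 3 * t * t for t in times]
coeffs = fit_taylor(times, values, order=2)
interpolated = eval_taylor(coeffs, 0.5)
```

Only the fitted terms need to be stored or transmitted; any intermediate time can then be synthesized from them.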

Observe that the Taylor-series method performed very similarly to the "estimate - then - re-estimate" approach shown in Figure 1c (even though we only formed 5 sample covariances from the data!). In fact, the measured root-mean-squared error (RMSE) between the peaks* of Figure 1c and Figure 1a (excluding the invalid $t$'s near both ends) was 0.21°. By comparison, the measured RMSE between the peaks of Figure 1d and Figure 1a was 0.18° within this same time interval. (Outside this interval, the Taylor series approach also appears to provide a reasonable extrapolation!)


* The search for peaks was conducted on a 0.1° grid.

Figure 1. Periodograms. (a) $P(\theta,t)$. (b) $\hat{P}(\theta,t)$ using (3) and $K = 1$. (c) $\hat{P}(\theta,t)$ using (3) and $K = 21$. (d) $\hat{P}(\theta,t)$ using Taylor-series approximations.

It is perhaps worth noting that our Taylor-based estimates may be used directly in their "series form." For example, the periodogram above may be written:

$$\hat{P}(\theta,t) = \mathbf{v}(\theta)^H\, \hat{\mathbf{R}}(0)\, \mathbf{v}(\theta) + \ldots + \frac{t^n}{n!}\, \mathbf{v}(\theta)^H\, \hat{\mathbf{R}}^{(n)}(0)\, \mathbf{v}(\theta). \tag{11}$$

This quantity is efficiently updated at multiple $t$'s by calculating the quadratic terms only once, and then forming different weighted combinations at each different $t$. Moreover, storage and/or transmission (over a wireless link) of the result is simplified by maintaining the series form. Of course, the same ideas apply to other quantities as well (e.g., the computation of adaptive beamformer weights via Taylor series expansion of $\hat{\mathbf{R}}^{-1}(t)$).

‡ Of course, we could just expand $\hat{P}(\theta,t)$ directly, instead of $\hat{\mathbf{R}}(t)$.

As a second illustration, consider Figure 2. Here, we illustrate the effect of a gap in the sensor data. Snapshots at $t = -10, \ldots, 0$ are removed. Sample covariance estimates (with $K = 21$) are formed at $t = -39, \ldots, -21$ and $t = 11, \ldots, 39$. Then, we solve for the Taylor series terms and use them to synthesize covariance estimates at all $t$ (including the gap region). Comparing Figure 2 and Figure 1c, the performance is evidently good.

Figure 2. Taylor Series approximation to periodogram, with gap in available data.

As a final illustration, we applied Taylor series to the modeling of the estimated MVDR spectrum,

$$\hat{M}(\theta,t) = \frac{1}{\mathbf{v}(\theta)^H\, \hat{\mathbf{R}}(t)^{-1}\, \mathbf{v}(\theta)}. \tag{12}$$

Figure 3a shows the spectrum using the usual sample covariance estimates (with $K = 21$, and $\hat{M}(\theta,t)$ recomputed at each $t$). Figure 3b shows the spectrum resulting from Taylor expansion of $\hat{M}(\theta,t)$. Here, we used $E = 40$ (i.e., we skipped every other $t$) and $n = 7$. Clearly, the method provides an adequate interpolation at the missing $t$'s, though the sidelobes are noticeably higher.

Figure 3. MVDR Spectrum. (a) Estimated every $t$ via (3) with $K = 21$. (b) Estimated by Taylor series with many small gaps between measurements.

3. TRACKING UNDERLYING QUANTITIES

In Section 2, we used Taylor series expansions to model estimated quantities, e.g., $\hat{\mathbf{A}}(t)$. Our model was then used to interpolate, extrapolate and smooth our estimates (while also potentially reducing computation, data storage and/or communication bandwidth).

To summarize Section 2, our basic procedure was to:

(1) Initially estimate $\hat{\mathbf{A}}(t_i)$ at several $t_i$,

(2) Solve for the terms in the Taylor series expansion of $\hat{\mathbf{A}}(t)$,

(3) Use these terms to interpolate, smooth, etc.

Looking back at Section 2, we may now ask the question: "Does this procedure make logical sense?" Conceptually, the procedure correctly regards $\hat{\mathbf{A}}(t)$ as being "nonstationary" during step 2, the computation of the Taylor series expansion terms. However, step 1 leads to a bit of a dilemma. In the scenarios that interest us most (and probably many other scenarios as well), the available procedures for initially estimating $\hat{\mathbf{A}}(t_i)$ are implicitly based upon an assumption that the training data is stationary. Consider, for example, the sample covariance estimate, $\hat{\mathbf{R}}(t)$, in (3). In the sensor array processing community, this estimator is very widely used. It has the property of being a maximum likelihood estimator when the observations, $\mathbf{x}(t)$, are stationary. However, it is also frequently used in situations where the data is not stationary. Could we do better?

3.1 Tracking R(t) or P(θ,t)


To  answer  this  question,  let  us  begin  by  expanding  the 
“underlying”  (i.e.,  true  instantaneous)  covariance  matrix: 


R(f  )=  R(0)+  fR(0)+ . . .  ■ + — R(0)+  H.O.T. 


(13) 


Next,  use  this  to  expand  the  expected  value  of  the  sample 
covariance  estimator  in  (3), 


A  k-\ 

-R(0)*,R(0)+...  +  i'f  +  H.O.T. 


K  ,  Ha  n! 

2 


(14) 

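From (14), we can create a system of $n$-th order Taylor series approximation equations:

$$(T \otimes I)\begin{bmatrix} R(0) \\ \dot{R}(0) \\ \vdots \\ R^{(n)}(0) \end{bmatrix} = \begin{bmatrix} E\{\hat{R}(t_1)\} \\ E\{\hat{R}(t_2)\} \\ \vdots \\ E\{\hat{R}(t_E)\} \end{bmatrix} \qquad (15)$$

where

$$T = \begin{bmatrix} 1 & t_1 & \cdots & \dfrac{1}{K}\displaystyle\sum_{k=-\frac{K-1}{2}}^{\frac{K-1}{2}} \dfrac{(t_1+k)^n}{n!} \\ 1 & t_2 & \cdots & \dfrac{1}{K}\displaystyle\sum_{k=-\frac{K-1}{2}}^{\frac{K-1}{2}} \dfrac{(t_2+k)^n}{n!} \\ \vdots & \vdots & & \vdots \\ 1 & t_E & \cdots & \dfrac{1}{K}\displaystyle\sum_{k=-\frac{K-1}{2}}^{\frac{K-1}{2}} \dfrac{(t_E+k)^n}{n!} \end{bmatrix} \qquad (16)$$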


The  least  squares  solution  to  (15)  is  then  given  by: 


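$$\begin{bmatrix} R(0) \\ \dot{R}(0) \\ \vdots \\ R^{(n)}(0) \end{bmatrix} = \left[(T^H T)^{-1} T^H \otimes I\right] \begin{bmatrix} E\{\hat{R}(t_1)\} \\ E\{\hat{R}(t_2)\} \\ \vdots \\ E\{\hat{R}(t_E)\} \end{bmatrix} \qquad (17)$$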


Hence  if  we  could  solve  equation  (17),  we  could  then  use  (13) 
(with  terms  above  order  n  truncated)  to  estimate  the  underlying 
function  R(t).  In  practice,  the  expected  values  on  the  right  side 
of  (17)  are  unknown.  Instead,  we  will  replace  these  expectations 
by  single  realizations.  That  is,  we’ll  solve: 


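$$\begin{bmatrix} \hat{R}(0) \\ \hat{\dot{R}}(0) \\ \vdots \\ \hat{R}^{(n)}(0) \end{bmatrix} = \left[(T^H T)^{-1} T^H \otimes I\right] \begin{bmatrix} \hat{R}(t_1) \\ \hat{R}(t_2) \\ \vdots \\ \hat{R}(t_E) \end{bmatrix} \qquad (18)$$

After solving for $\hat{R}(0)$, $\hat{\dot{R}}(0)$, ..., $\hat{R}^{(n)}(0)$ in (18), we'll use them in (13) to synthesize interpolated, smoothed, or extrapolated values for $R(t)$.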
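The fit in (18) and the synthesis in (13) can be sketched in NumPy as follows. This is only an illustrative sketch under stated assumptions: the averaging window of K snapshots is centered with K odd, the function names (`taylor_design_matrix`, `fit_taylor_terms`, `synthesize`) are hypothetical and not from the paper, and the Kronecker-structured system is solved entry by entry (each matrix estimate is flattened into a row) rather than by forming $T \otimes I$ explicitly.

```python
import numpy as np
from math import factorial


def taylor_design_matrix(times, K, n):
    """Build T: entry (i, m) is the window average of (t_i + k)^m / m!.

    Assumes an odd window length K centred on each estimation time.
    """
    half = (K - 1) // 2
    offsets = np.arange(-half, half + 1)
    T = np.empty((len(times), n + 1))
    for m in range(n + 1):
        T[:, m] = [np.mean((t + offsets) ** m) / factorial(m) for t in times]
    return T


def fit_taylor_terms(R_hats, times, K, n):
    """Least-squares estimate of R(0), R'(0), ..., R^(n)(0) from sample covariances."""
    R_hats = np.asarray(R_hats, dtype=complex)
    E, N, _ = R_hats.shape
    T = taylor_design_matrix(times, K, n).astype(complex)
    # Solving T @ C = R_flat is the same least-squares problem as (T kron I) r = r_hat.
    coeffs, *_ = np.linalg.lstsq(T, R_hats.reshape(E, N * N), rcond=None)
    return coeffs.reshape(n + 1, N, N)


def synthesize(terms, t):
    """Evaluate the truncated Taylor series at time t."""
    return sum(term * t ** m / factorial(m) for m, term in enumerate(terms))
```

Calling `synthesize(fit_taylor_terms(R_hats, times, K, n), t)` then gives an interpolated or extrapolated covariance at any `t`. For large `n` and wide time ranges the columns of `T` differ greatly in scale, so rescaling the time axis may be advisable in practice.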

Example: 

Let  us  next  illustrate  the  benefit  of  tracking  the  underlying 
R(t)  via  (18)  and  (13).  Assume  the  same  array  scenario  as  in 
Section  2.  A  single  source  is  present  and  moving  in  a  nonlinear 
fashion.  Its  true  angle  is: 

(19) 

where  £  =  (2/99)°.  As  earlier,  a  total  of  99  snapshots  are 
available. 

Figure  4a  shows  the  true  instantaneous  periodogram 
calculated  from  (9).  In  contrast,  Figure  4b  shows  the  estimated 
periodogram  calculated  from  (10)  and  (3)  with  K  =  21.  Observe 
how  the  sample  averaging  now  causes  large  errors.  We  wish  to 
compensate  these  errors  in  the  neighborhood  of  t  =  0.  Figure  4c 
attempts  this  by  using  the  method  of  (18)  and  (13).  Here,  K  = 
21, E = 79, n = 9. The RMSE’s were evaluated for the methods
of  Figure  4b  and  Figure  4c,  and  averaged  over  the  region  t  =  -10 
...  10.  A  total  of  100  independent  trials  were  performed  and  the 
average  RMSE  was  found  to  be  0.6416°  for  the  method  of  Figure 
4b,  and  0.4701°  for  the  method  of  Figure  4c.  Hence,  the 
covariance  estimation  method  of  (18),  (13)  has  resulted  in  a  27% 
reduction  in  the  angle  estimation  error.  Incidentally,  Figure  4d 
shows  the  method  when  K  =  1  (in  this  limit,  equations  (8)  and 
(18)  are  the  same). 

3.2 Tracking $V_s(t)$

As a final case study, let’s consider $V_s(t)$. Using the method of Section 2, we can create Taylor series approximations to $V_s(t)$.** However, we would prefer to track the underlying $V_s(t)$ and thus compensate for the averaging used in forming the estimate $\hat{R}(t)$. How do we do this?

Let us begin by expanding $V_s(t)$ in a Taylor series:

$$V_s(t) = V_s(0) + t\,\dot{V}_s(0) + \dots + \frac{t^n}{n!}\,V_s^{(n)}(0) + \text{H.O.T.} \qquad (20)$$


**  Note:  the  basis  vectors  synthesized  in  this  manner  will  not  generally 
be  orthonormal. 
Figure 4. Periodograms. (a) True instantaneous $P(\theta,t)$. (b) $\hat{P}(\theta,t)$ using (3) and K = 21. (c) Taylor-series approximation that compensates for nonstationarity near t = 0. (d) Taylor-series approximation with K = 1, n = 14. {Note: 30 dB color scale similar to Fig. 3}


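Column $i$ holds the $i$-th unit-normalized basis vector:

$$v_{s,i}(t) = v_{s,i}(0) + t\,\dot{v}_{s,i}(0) + \dots + \frac{t^n}{n!}\,v_{s,i}^{(n)}(0) + \text{H.O.T.} \qquad (21)$$

By definition, $v_{s,i}(t)$ maximizes

$$v_{s,i}^H(t)\,E\{x(t)x^H(t)\}\,v_{s,i}(t) \qquad (22)$$

over the set of all vectors orthogonal to $v_{s,u}(t)$, $u \in \{1,\dots,i-1\}$. Thus, we wish to approximately maximize

$$\tilde{v}_{s,i}^H\,E\{x(t)x^H(t)\}\,\tilde{v}_{s,i} \qquad (23)$$

where $\tilde{v}_{s,i} = v_{s,i}(0) + \dots + \frac{t^n}{n!}\,v_{s,i}^{(n)}(0)$. Equation (23) can be rewritten as:

$$c_i^H\,E\{y(t)y^H(t)\}\,c_i \qquad (24)$$

where $c_i^H = \left[\,v_{s,i}^H(0) \;\; t\,\dot{v}_{s,i}^H(0) \;\; \cdots \;\; \frac{t^n}{n!}\,v_{s,i}^{(n)H}(0)\,\right]$ and $y^H(t) = \left[\,x^H(t) \;\; x^H(t) \;\; \cdots \;\; x^H(t)\,\right]$. This, in turn, is equal to:

$$\zeta_i^H\,E\{e(t)e^H(t)\}\,\zeta_i \qquad (25)$$

where $\zeta_i^H = \left[\,v_{s,i}^H(0) \;\; \dot{v}_{s,i}^H(0) \;\; \cdots \;\; v_{s,i}^{(n)H}(0)\,\right]$ and $e^H(t) = \left[\,x^H(t) \;\; t\,x^H(t) \;\; \cdots \;\; \frac{t^n}{n!}\,x^H(t)\,\right]$. Substituting the extended sample covariance estimate [1] for the expectation, we have:

$$\zeta_i^H\,\hat{R}_E(t)\,\zeta_i \qquad (26)$$

$$\hat{R}_E(t) = \frac{1}{K}\sum_{k=-\frac{K-1}{2}}^{\frac{K-1}{2}} e(t+k)\,e^H(t+k) \qquad (27)$$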

2 

This  suggests  the  following  estimation  and  tracking 
procedure.  First,  find  the  principal  components  of  the  extended 
sample covariance matrix $\hat{R}_E(t)$. Next partition them into $v_{s,i}(0)$, $\dot{v}_{s,i}(0)$, ... and $v_{s,i}^{(n)}(0)$. Finally use (21) (truncated beyond the $n$-th term) to estimate $v_{s,i}(t)$.
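The estimation procedure just described can be sketched as follows. This is a rough illustration, not the authors' implementation: it assumes the K snapshots in the window are numbered by a local time running from -(K-1)/2 to (K-1)/2 with K odd, and the names `extended_snapshots` and `taylor_basis_terms` are hypothetical.

```python
import numpy as np
from math import factorial


def extended_snapshots(X, n):
    """Stack x, t*x, ..., (t^n / n!)*x for snapshots indexed by local time t.

    X has shape (N, K): N sensors, K snapshots (K assumed odd).
    """
    N, K = X.shape
    half = (K - 1) // 2
    local_t = np.arange(-half, half + 1)
    return np.vstack([X * (local_t ** m / factorial(m)) for m in range(n + 1)])


def taylor_basis_terms(X, n):
    """Principal component of the extended sample covariance, split into n+1 blocks.

    Row m of the result approximates the m-th Taylor term of the principal
    basis vector at local time 0.
    """
    N, K = X.shape
    E = extended_snapshots(X, n)
    R_E = E @ E.conj().T / K                 # extended sample covariance
    eigvals, eigvecs = np.linalg.eigh(R_E)   # eigenvalues in ascending order
    principal = eigvecs[:, -1]
    return principal.reshape(n + 1, N)
```

The first row of the returned array, after normalization, would serve as the subspace estimate at the window center, as in the example that follows.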

Example: 

Using  the  signal  model  from  Figure  4,  we  calculated  the 
sample  covariance  matrix  (i.e.,  equation  (3)  with  K  =  21)  and  its 
principal  eigenvector  at  each  t.  Then,  at  each  t,  we  computed  the 
angle  between  this  eigenvector  and  the  principal  eigenvector  of 
$R(t)$. This is shown as the solid curve in Figure 5.

Next, we examined the procedure of (26) and (21) (with K = 21 and n = 20). At each t, an estimate was formed by numbering the snapshots (surrounding the t-th snapshot) from -10 ... 10. Then the procedure of (26) and (21) was used to estimate $v_{s,1}(0)$. This, in turn, was used directly as our estimate for the subspace at time t. As above, the angle between our subspace and the principal eigenvector of the true covariance matrix was computed at each t. Figure 5 compares the two methods. Observe that the Taylor-based method provided a superior estimate of the signal subspace in this highly nonstationary signal environment.


Figure  5.  Angle  between  true  and  tracked  subspaces 

4.  SUMMARY 

This paper has proposed a methodology that we have called “Taylor series adaptive processing.” In this methodology, quantities of interest are expanded in terms of their Taylor series, and the terms of the series are calculated from the data. This provides a means to efficiently track, compress, and/or interpolate quantities (e.g., adaptive statistics), as well as to obtain improved estimates in nonstationary environments.


[1]  S.  D.  Hayward,  “Adaptive  Beamforming  for  Rapidly  Moving 
Arrays,”  1996  CIE  Int.  Conf.  on  Radar,  pp.  480-483. 
Adaptive Bayesian Signal Processing
- A Sequential Monte Carlo Paradigm *

Xiaodong Wang †, Rong Chen ‡ and Jun S. Liu §


Abstract 

We provide a general framework for using Monte Carlo methods in dynamic systems and discuss its wide applications in adaptive signal processing. All of these methods are partial combinations of three ingredients: importance sampling and resampling, rejection sampling and Markov chain iterations. Examples from target tracking and digital communication applications are provided to demonstrate the effectiveness of these novel statistical signal processing techniques.


* This work was supported in part by the Interdisciplinary Research Initiatives Program, Texas A&M University. R. Chen was supported in part by the U.S. National Science Foundation (NSF) under grant DMS-9626113 and grant DMS-9982846. X. Wang was supported in part by the NSF grant CAREER CCR-9875314. J.S. Liu was supported in part by the NSF grant DMS-9803649.

† Department of Electrical Engineering, Texas A&M University, College Station, TX 77843.

‡ Department of Statistics, Texas A&M University, College Station, TX 77843.

§ Department of Statistics, Stanford University, Stanford, CA 94305.


1. BACKGROUND ON SEQUENTIAL MONTE CARLO METHODS

In this section, the general framework of sequential Monte Carlo methods for updating a dynamic system is described. Of particular interest is the mixture Kalman filtering technique for on-line estimation of conditional linear dynamic models, which is especially useful for adaptive Bayesian signal processing in time-varying, nonlinear and non-Gaussian environments.

1.1. Sequential Monte Carlo Filtering

Consider the following dynamic system modeled in a state-space form as

$$\begin{aligned} \text{state equation} \quad & x_t = f_t(x_{t-1}, u_t) \\ \text{observation equation} \quad & y_t = g_t(x_t, v_t) \end{aligned} \qquad (1)$$

where $x_t$, $y_t$, $u_t$ and $v_t$ are respectively the state variable, the observation, the state noise and the observation noise at time t. Denote $X_t = (x_0, x_1, \dots, x_t)$ and $Y_t = (y_0, y_1, \dots, y_t)$. Suppose that at time t, we are interested in making some inference about the state variable $x_t$ based on $Y_t$, which is essentially computing $E\{h(x_t)\,|\,Y_t\} = \int h(x_t)\,p(x_t\,|\,Y_t)\,dx_t$, for some function $h(\cdot)$. In most cases an exact evaluation of this expectation is analytically intractable, due to the complexity of such a dynamic system. Monte Carlo methods provide a viable alternative to facilitate such computations. Specifically, if we can draw m random samples $\{x_t^{(j)}\}_{j=1}^m$ from the distribution $p(x_t\,|\,Y_t)$, then we can approximate $E\{h(x_t)\,|\,Y_t\}$ by

$$E\{h(x_t)\,|\,Y_t\} \cong \frac{1}{m}\sum_{j=1}^{m} h(x_t^{(j)}) \qquad (2)$$

Very often directly sampling from $p(x_t\,|\,Y_t)$ is infeasible, but drawing from some trial distribution $q(x_t\,|\,Y_t)$ is easy. Suppose that a set of samples $\{x_t^{(j)}\}_{j=1}^m$ is generated instead from a trial distribution $q(x_t\,|\,Y_t)$. To utilize these samples for making inferences about the target distribution $p(x_t\,|\,Y_t)$, the samples have to be weighted by the following weights

$$w_t^{(j)} = \frac{p(x_t^{(j)}\,|\,Y_t)}{q(x_t^{(j)}\,|\,Y_t)}, \qquad j = 1, 2, \dots, m. \qquad (3)$$

Then the inference $E\{h(x_t)\,|\,Y_t\}$ can be approximated by

$$E_p\{h(x_t)\,|\,Y_t\} \cong \frac{1}{m}\sum_{j=1}^{m} h(x_t^{(j)})\,w_t^{(j)}. \qquad (4)$$

The pair $(x_t^{(j)}, w_t^{(j)})$, $j = 1, \dots, m$, is called a properly weighted sample with respect to the distribution $p(x_t\,|\,Y_t)$.
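As a small illustration of (3) and (4), the sketch below estimates $E\{x^2\}$ under a standard normal target using samples drawn from a wider normal trial distribution. The target, the trial distribution, and all function names are assumptions chosen only for this example; they do not come from the paper.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_estimate(h, target_pdf, trial_pdf, trial_draw, m, rng):
    """Average of h(x) * w(x) over m draws from the trial distribution, as in (4)."""
    total = 0.0
    for _ in range(m):
        x = trial_draw(rng)
        w = target_pdf(x) / trial_pdf(x)   # importance weight, as in (3)
        total += h(x) * w
    return total / m


rng = random.Random(0)
estimate = importance_estimate(
    h=lambda x: x * x,
    target_pdf=lambda x: normal_pdf(x, 0.0, 1.0),
    trial_pdf=lambda x: normal_pdf(x, 0.0, 2.0),
    trial_draw=lambda r: r.gauss(0.0, 2.0),
    m=20000,
    rng=rng,
)
print(estimate)  # should be close to the true second moment, 1
```

A trial distribution with heavier tails than the target, as here, keeps the weights bounded and the estimator variance moderate.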

0-7803-5988-7/00/$10.00 © 2000 IEEE

To implement Monte Carlo for such a system, random samples drawn from $p(x_t\,|\,Y_t)$ are needed at any time t. A sequential Monte Carlo filter (MCF) for updating the dynamic system (1) involves generating properly weighted samples $\{(x_t^{(j)}, w_t^{(j)})\}_{j=1}^m$ for $p(x_t\,|\,Y_t)$ at time t, based on the properly weighted samples $\{(x_{t-1}^{(j)}, w_{t-1}^{(j)})\}_{j=1}^m$ for $p(x_{t-1}\,|\,Y_{t-1})$ at time (t - 1), according to the following algorithm [Liu and Chen (1998)].

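FOR j = 1, ..., m DO

1. Draw a sample $x_t^{(j)}$ from a trial distribution $q(x_t\,|\,X_{t-1}^{(j)}, Y_t)$;

2. Compute the importance weight $w_t^{(j)} = w_{t-1}^{(j)} \cdot p(X_t^{(j)}\,|\,Y_t) \,/\, \left[\,p(X_{t-1}^{(j)}\,|\,Y_{t-1})\; q(x_t^{(j)}\,|\,X_{t-1}^{(j)}, Y_t)\,\right]$.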

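The samples and weights at time t are used to approximate the inference $E\{h(x_t)\,|\,Y_t\}$ using (4). It can be shown that the weighted samples generated by this algorithm are unbiased, i.e.,

$$E\{h(x_t^{(j)})\,w_t^{(j)}\} = E\{h(x_t)\,|\,Y_t\}. \qquad (5)$$

Hence by the law of large numbers,

$$\frac{1}{m}\sum_{j=1}^{m} h(x_t^{(j)})\,w_t^{(j)} \;\longrightarrow\; E\{h(x_t)\,|\,Y_t\} \quad \text{as } m \to \infty. \qquad (6)$$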


There are many important issues regarding the design and implementation of a sequential MCF, such as the choice of the trial distribution $q(\cdot)$, and the use of resampling. Specifically, the most efficient choice of the trial distribution $q(x_t\,|\,X_{t-1}^{(j)}, Y_t)$ for the state space model (1) has the following form

$$q(x_t\,|\,X_{t-1}^{(j)}, Y_t) = p(x_t\,|\,X_{t-1}^{(j)}, Y_t) \propto p(y_t\,|\,x_t)\,p(x_t\,|\,x_{t-1}^{(j)}). \qquad (7)$$

For this trial distribution, the importance weight is updated according to

$$w_t^{(j)} = w_{t-1}^{(j)} \cdot \frac{p(X_t^{(j)}\,|\,Y_t)}{p(X_{t-1}^{(j)}\,|\,Y_{t-1})\; p(x_t^{(j)}\,|\,X_{t-1}^{(j)}, Y_t)} \propto w_{t-1}^{(j)} \cdot p(y_t\,|\,x_{t-1}^{(j)}). \qquad (8)$$
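For a concrete feel for (7) and (8), consider a scalar linear Gaussian model, for which the optimal trial distribution and the predictive likelihood are available in closed form. The model $x_t = a x_{t-1} + u_t$, $y_t = x_t + v_t$ with noise variances `q_var` and `r_var`, and the function names, are assumptions made only for this sketch. Unlike (4), the weights are normalized to sum to one here.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def sis_step(particles, weights, y, a, q_var, r_var, rng):
    """One sequential importance sampling step with the trial distribution (7).

    Assumed model: x_t = a*x_{t-1} + u_t, y_t = x_t + v_t,
    with u_t ~ N(0, q_var) and v_t ~ N(0, r_var).
    """
    post_var = 1.0 / (1.0 / q_var + 1.0 / r_var)
    new_particles, new_weights = [], []
    for x_prev, w_prev in zip(particles, weights):
        pred = a * x_prev
        # p(x_t | x_{t-1}, y_t) is Gaussian for this model.
        post_mean = post_var * (pred / q_var + y / r_var)
        new_particles.append(rng.gauss(post_mean, math.sqrt(post_var)))
        # Weight update (8): multiply by p(y_t | x_{t-1}).
        new_weights.append(w_prev * normal_pdf(y, pred, q_var + r_var))
    total = sum(new_weights)
    return new_particles, [w / total for w in new_weights]


def weighted_mean(particles, weights):
    """Posterior mean estimate from normalized weights."""
    return sum(x * w for x, w in zip(particles, weights))
```

Because no resampling is performed, the weights degenerate over long runs; the sketch is meant for a handful of steps only.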

1.2.  The  Mixture  Kalman  Filter 

Many dynamic system models belong to the class of conditional dynamic linear models (CDLM) of the form

$$\begin{aligned} x_t &= F_{\lambda_t} x_{t-1} + G_{\lambda_t} u_t, \\ y_t &= H_{\lambda_t} x_t + K_{\lambda_t} v_t, \end{aligned} \qquad (9)$$

where $u_t \sim N(0, I)$, $v_t \sim N(0, I)$ (here $I$ denotes an identity matrix), and $\lambda_t$ is an indicator random variable. The matrices $F_{\lambda_t}$, $G_{\lambda_t}$, $H_{\lambda_t}$ and $K_{\lambda_t}$ are known constant matrices given $\lambda_t$. It is apparent that in a CDLM, for a given trajectory of the indicator $\lambda_t$, the system is linear and Gaussian, for which the Kalman filter provides the complete statistical characterization of the system dynamics. Recently a novel sequential Monte Carlo method, the mixture Kalman filter (MKF), was proposed for on-line filtering and prediction of CDLM, which exploits the conditional Gaussian property and utilizes a marginalization operation to improve the algorithmic efficiency. The MKF samples in the indicator space and uses a mixture of Gaussian distributions to represent the target distribution. Compared with other sequential MCF methods, substantial performance gains can be achieved by the MKF.

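Denote $Y_t = (y_0, y_1, \dots, y_t)$ and $\Lambda_t = (\lambda_0, \lambda_1, \dots, \lambda_t)$. Specifically, the MKF uses the properly weighted discrete samples $\{(\Lambda_t^{(j)}, w_t^{(j)})\}_{j=1}^m$ to represent $p(\Lambda_t\,|\,Y_t)$, and then it uses a random mixture of Gaussian distributions $\sum_{j=1}^m w_t^{(j)}\,N(\mu_t^{(j)}, \Sigma_t^{(j)})$ to represent the target distribution $p(x_t\,|\,Y_t)$, where $\kappa_t^{(j)} = [\mu_t^{(j)}, \Sigma_t^{(j)}]$ is obtained by implementing a Kalman filter for the given sample trajectory $\Lambda_t^{(j)}$. The MKF for updating the CDLM involves generating random samples $\{(\Lambda_t^{(j)}, \kappa_t^{(j)}, w_t^{(j)})\}_{j=1}^m$ at time t, based on the samples $\{(\Lambda_{t-1}^{(j)}, \kappa_{t-1}^{(j)}, w_{t-1}^{(j)})\}_{j=1}^m$ at time (t - 1), according to the following algorithm [Chen and Liu (2000)]:

FOR j = 1, ..., m DO

1. Draw a sample $\lambda_t^{(j)}$ from a trial distribution $q(\lambda_t\,|\,\Lambda_{t-1}^{(j)}, \kappa_{t-1}^{(j)}, Y_t)$;

2. Run a one-step Kalman filter based on $\lambda_t^{(j)}$, $\kappa_{t-1}^{(j)}$ and $y_t$;

3. Compute the weight $w_t^{(j)} = w_{t-1}^{(j)} \cdot p(\Lambda_{t-1}^{(j)}, \lambda_t^{(j)}\,|\,Y_t) \,/\, \left[\,p(\Lambda_{t-1}^{(j)}\,|\,Y_{t-1})\; q(\lambda_t^{(j)}\,|\,\Lambda_{t-1}^{(j)}, \kappa_{t-1}^{(j)}, Y_t)\,\right]$.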

The  MKF  can  be  extended  to  handle  the  so-called 
partial  CDLM,  where  the  state  variable  has  a  linear 
component  and  a  nonlinear  component. 

2.  EXAMPLES 
2.1.  Target  tracking 

Designing a sophisticated target tracking algorithm is an important task for both civilian and military surveillance systems, particularly when a radar, sonar, or optical sensor is operated in the presence of clutter or when innovations are non-Gaussian (Bar-Shalom and Fortmann, 1988). We show an example of target tracking with maneuvering.

This situation can be modeled as follows:

$$\begin{aligned} x_t &= H x_{t-1} + F u_t + W w_t \\ y_t &= G x_t + V v_t \end{aligned}$$

where $u_t$ is the maneuvering acceleration. Here we consider an example of Bar-Shalom and Fortmann (1988) in which a two-dimensional target’s position is sampled every T = 10 s. The target moves in a plane with constant course and speed until k = 40 when it starts a slow 90° turn which is completed in 20 sampling periods. A second, fast, 90° turn starts at k = 61 and is completed in 5 sampling times.


The slow turn is the result of acceleration inputs $u_t^x = u_t^y = 0.075$ (40 < t < 60), and the fast turn is from $u_t^x = -u_t^y = -0.3$ (61 < t < 65). Other $u_t$’s are zero (i.e. no maneuvering).

To apply the MKF to this application, we need to specify the prior structure of $u_t$. First, we assume that maneuvering can be classified into several categories, indicated by an indicator. In particular, we assume a three-level model: $I_t = 0$ indicates no maneuvering ($u_t = 0$), and $I_t = 1$ and $2$ indicate slow and fast maneuvering, respectively ($u_t \sim N(0, \sigma_i^2)$, $\sigma_1^2 < \sigma_2^2$). In this study we used $\sigma_1^2 = 1$ and $\sigma_2^2 = 36$. We also specify transition probabilities $P(I_t = j \mid I_{t-1} = i) = p_{ij}$ for the maneuvering status. Specifically, we assume $p_{ii} = 0.8$ and $p_{ij} = 0.1$ for $i \neq j$ (i.e. it is more likely to stay in a particular maneuvering state than to change the maneuvering state). Second, there are different ways of modeling the serial correlation of the $u_t$. Here we assume a multi-level white noise model, as in Bar-Shalom and Fortmann (1988), where the $u_t$ are assumed independent, given the indicator. This is the easiest but not a very realistic model. Other possible models are currently under investigation.
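The prior just described (a three-level Markov indicator with a multi-level white noise input) can be simulated with a few lines of code. This is a sketch under simplifying assumptions: a single acceleration component is drawn per step rather than the two-dimensional $u_t$ of the tracking model, and the names `TRANSITION`, `VARIANCE` and `simulate_maneuver` are hypothetical.

```python
import random

# p_ii = 0.8 and p_ij = 0.1 for i != j, as stated in the text.
TRANSITION = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
# Variances of the slow (level 1) and fast (level 2) maneuvering inputs.
VARIANCE = {1: 1.0, 2: 36.0}


def simulate_maneuver(n_steps, rng, start=0):
    """Draw an indicator path I_t and one acceleration component u_t per step."""
    levels, inputs = [], []
    level = start
    for _ in range(n_steps):
        level = rng.choices([0, 1, 2], weights=TRANSITION[level])[0]
        if level == 0:
            u = 0.0                       # no maneuvering
        else:
            u = rng.gauss(0.0, VARIANCE[level] ** 0.5)
        levels.append(level)
        inputs.append(u)
    return levels, inputs


levels, inputs = simulate_maneuver(100, random.Random(0))
```

Given the indicator, successive inputs are independent, which is the multi-level white noise assumption of the text.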

In Figure 1 we present the root mean square errors of the MKF estimates of the target position for 50 simulated runs. Comparing our result with that of Bar-Shalom and Fortmann (1988, pp 143) who used the traditional detection-and-switching method, we see a clear advantage of the proposed MKF.

Figure 1. The root MSE’s of the x-position and x-direction velocity of 50 runs of the MKF for a simulated two-dimensional target moving system with maneuvering.

2.2. Digital Signal Extraction in Fading Channels

Many mobile communication channels can be modeled as Rayleigh flat-fading channels, which have the following form:

$$\text{State Equations:} \quad \begin{cases} x_t = F x_{t-1} + W w_t \\ \alpha_t = G x_t \\ s_t \sim p(\,\cdot \mid s_{t-1}) \end{cases}$$

$$\text{Observation Equation:} \quad y_t = \alpha_t s_t + V v_t$$

where $s_t$ are the input digital signals (symbols), $y_t$ are the received complex signals, and $\alpha_t$ are the unobserved (changing) fading coefficients. Both $w_t$ and $v_t$ are complex Gaussian with identity covariance matrices. This system is clearly a partial CDLM (PCDLM). Given the input signals $s_t$, the system is linear in $x_t$ and $y_t$. Consider binary input signals $s_t \in \{1, -1\}$. The fading coefficient takes complex values, with independent real and imaginary parts following the same state equation. Both the real and the imaginary parts of $\alpha_t$ follow an ARMA(3,3) process

$$\alpha_t - 2.9372\,\alpha_{t-1} + 2.8763\,\alpha_{t-2} - 0.9391\,\alpha_{t-3} = 0.0376\,e_t + 0.1127\,e_{t-1} + 0.1127\,e_{t-2} + 0.0376\,e_{t-3}$$

where $e_t \sim N(0, 0.01^2)$. In the communication literature, this is called a (lowpass) Butterworth filter of order 3 with cutoff frequency 0.01. It is normalized to have a stationary variance 1.

We are interested in estimating the differential code $d_t = s_t s_{t-1}$. Figure 2 shows the bit error rate for different signal-to-noise ratios (SNR), using the MKF, the differential detection $\hat{d}_t = \mathrm{sign}(\mathrm{real}(y_t y_{t-1}^*))$ and a lower bound. The lower bound is obtained using the true fading coefficients $\alpha_t$ and $\hat{d}_t = \mathrm{sign}(\mathrm{real}(\alpha_t^* y_t y_{t-1}^* \alpha_{t-1}))$. The Monte Carlo sample size m was 100 for the MKF. We also include the result of a delayed estimation, in which $s_t$ is estimated using the samples $s_t^{(j)}$ generated by the MKF, and the weight at time t + 1 (Liu and Chen 1998). This delayed estimation is able to utilize the substantial information contained in the future information $y_{t+1}$, and hence is more accurate due to the strong memory in the fading channel. We can see that the simple differential detection works very well in low SNR cases and no significant improvement can be expected. However, it has an apparent bit error rate floor for high SNR cases. The MKF managed to break that floor, by using the structure of the fading coefficients. The details of this treatment can be found in Chen, Wang and Liu (2000).
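The baseline differential detector $\hat{d}_t = \mathrm{sign}(\mathrm{real}(y_t y_{t-1}^*))$ is simple enough to state in code. The sketch below uses a noiseless, constant fading coefficient purely as an assumed toy input; the function name is hypothetical.

```python
def differential_detect(y):
    """Return sign(real(y_t * conj(y_{t-1}))) for t = 1, ..., len(y) - 1."""
    return [
        1 if (y[t] * y[t - 1].conjugate()).real >= 0 else -1
        for t in range(1, len(y))
    ]


# Toy input: constant channel coefficient, no noise (assumed for illustration).
alpha = 0.6 + 0.8j
symbols = [1, -1, -1, 1, 1]
received = [alpha * s for s in symbols]
detected = differential_detect(received)
print(detected)  # [-1, 1, -1, 1], i.e. s_t * s_{t-1}
```

The detector needs no channel knowledge, but it relies on the fading coefficient changing little between consecutive symbols, which is why it shows an error floor at high SNR.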


Figure 2. The bit error rate of extracting differential binary signals from a fading channel using differential detection, the MKF and the delayed MKF. A lower bound that assumes the exact knowledge of the fading coefficients is also shown.


2.3. Blind Deconvolution

Consider the following system in digital communication

$$y_t = \sum_{i=1}^{q} \theta_i s_{t-i} + \varepsilon_t,$$

where $s_t$ is a discrete process taking values on a known set $S$. In a blind deconvolution problem, $s_t$ is to be estimated from the observed signals $\{y_1, \dots, y_t\}$, without knowing the channel coefficients $\theta_i$. This system can be formulated as a PCDLM. Let $\theta_t = [\theta_{t1}, \dots, \theta_{tq}]$ and $x_t = [s_t, \dots, s_{t-q}]'$. We can define

$$\text{State Equation:} \quad \begin{cases} \theta_t = \theta_{t-1} \\ x_t = H x_{t-1} + W s_t \end{cases}$$

$$\text{Observation Equation:} \quad y_t = \theta_t x_t + \varepsilon_t$$

where $H$ is a $q \times q$ matrix with lower off-diagonal elements being one and all other elements being zero and $W = (1, 0, \dots, 0)'$. In this case, the unknown system coefficients are part of the state variable, and are linear conditional on the digital signal $x_t$. Liu and Chen (1995) studied this problem with a procedure which is essentially an extended MKF. This PCDLM formulation can be easily extended to deal with a blind deconvolution problem with time-varying system coefficients. In Figure 3 we plot the channel estimates as a function of time for a static 4-tap complex ISI channel. It is seen that the channel can be tracked quickly.


Figure 3. Convergence of the blind equalizer based on sequential imputation.
ABSTRACT

We present a new algorithm for the estimation of probability density functions (pdfs). This finds a large number of applications in the context of statistical signal processing problems, such as detection, estimation, filtering or pattern recognition and classification. Our approach relies on the QQ-plot technique. The estimates of the first and second order statistics of the observed random data are used together with a suboptimal piece-wise linear approximation of the QQ-plot, yielding a new class of pdf estimators. We describe the algorithm and test it in comparison with other techniques, showing that our approach provides better results.

1. INTRODUCTION

In most signal processing problems related to communications and radar/sonar systems, such as detection, estimation, filtering or pattern recognition and classification, the observed signals are corrupted by noise. In those problems, the measurement signals can often be expressed as the sum of the information bearing signal and the measurement noise. Usually, the noise statistics or its distribution are assumed known. In many situations, due to a diversity of well established modeling issues, the measurement noise is assumed Gaussian. Based on this, optimum processing techniques are derived and the properties of the resulting algorithms can be easily studied. However, if the actual probability density function (pdf) of the noise is not Gaussian, or if its statistics are unknown, the performance of those algorithms will degrade significantly, even if the “distance” between the actual noise distribution and the Gaussian one is small. To overcome noise modeling mismatches, and in order to recover the overall performance of the optimum processors, efficient estimators of the noise pdf must be used. In this paper, we will introduce a new approach to the pdf estimation problem, which is based on the QQ-plot technique [3].

2. THE QQ-PLOT TECHNIQUE


The probability density function (pdf) of a random process can be estimated from a sequence of samples. Such estimates are required to determine the conditional rate of failure in reliability theory or the decision functions in unsupervised pattern classification problems, and in adaptive filtering problems [1,2,5]. One way of finding the appropriate solution is to look for functional approximations as

$$p(x) \approx \hat{p}(x) = \sum_{i=1}^{n} c_i \varphi_i(x) = c^T \varphi(x) \qquad (1)$$

where $c$ is an n-vector of unknown coefficients, $(\cdot)^T$ denotes the transpose operator, and $\varphi_i(x)$ is a set of known functions [2]. The problem is to find the vector $c$ that minimizes a suitable function of the error $\varepsilon(x) = p(x) - c^T \varphi(x)$.

The proposed method is related to the well known QQ-plot statistical technique, which is described here. Let us consider the samples $\{x_i\}$, $i = 1, \dots, I$, of a random variable $X$ with pdf $p(x)$. By ranking the samples $\{x_i\}$ we obtain the nondecreasing sequence $\{y_i\}$, $i = 1, \dots, I$, where $y_i \le y_j$ for any $(i, j)$: $i < j$. The probability that some observation $y$ will have rank $i$ in the ordered sequence $\{y_i\}$ is given by [2,4]

$$P(i/y) = \binom{I-1}{i-1} P^{\,i-1}(y)\,[1 - P(y)]^{I-i} \qquad (2)$$

where $P(\cdot)$ is the distribution function corresponding to the given pdf $p(\cdot)$. Then, the conditional expectation $m_{i/y}$ and the conditional variance $\sigma_{i/y}^2$ of the random variable $i$ given the sample $y$ are, respectively, given by

$$\begin{aligned} m_{i/y} &= E\{i/y\} = 1 + (I-1)P(y) \\ \sigma_{i/y}^2 &= E\{i^2/y\} - E^2\{i/y\} = (I-1)P(y)[1 - P(y)] \end{aligned} \qquad (3)$$

It should be noted that $\sigma_{i/y}^2$ is small, meaning that the rank $i$ of the random variable $y$ is in the vicinity of $m_{i/y}$, i.e., $i \approx m_{i/y}$. Then, it follows from (3) that the relation between $i$ and $P(y)$ is approximately linear. A plot of the ordered samples $y_i$ versus the quantity $P^{-1}(p_i)$, where $P^{-1}(\cdot)$ is the inverse of $P(\cdot)$ and $p_i = (i-1)/(I-1)$, is the QQ-plot [3]. By taking $m_{i/y} \approx i$ in (3), one obtains the QQ-plot relations

$$y_i \approx P^{-1}(p_i) \approx P^{-1}(r_i); \quad r_i = (i - 0.5)/I; \quad i = 1, 2, \dots, I \qquad (4)$$

Thus, if the QQ-plot (4) is fairly linear, then it indicates that the samples have the same distribution $P(\cdot)$, even in the tails. Relation (4) can, therefore, be expressed in the linear regression form

$$y_i = m + \sigma P_0^{-1}(r_i) = m + \sigma \rho_i \qquad (5)$$

where $P_0(\cdot)$ is an arbitrary standard distribution, i.e., with zero mean and unit variance, which generates the random variables $\rho_i = (y_i - m)/\sigma$, with $m = E\{y\}$ and $\sigma^2 = E\{(y - m)^2\}$. Starting from (5), one can estimate the unknown parameters $m$ and $\sigma$ using the least-squares algorithm [4],
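A least-squares fit of the regression form (5) can be sketched as below. The closed-form slope and intercept are those of ordinary least squares and are an assumption here, since the paper's own estimator expression is not reproduced in this text; the standard normal is taken as $P_0$, and the name `qq_fit` is hypothetical.

```python
from statistics import NormalDist


def qq_fit(samples, inv_cdf=NormalDist().inv_cdf):
    """Fit y_i = m + sigma * P0^{-1}(r_i) by ordinary least squares.

    r_i = (i - 0.5) / I as in (4); P0 defaults to the standard normal.
    """
    y = sorted(samples)
    n = len(y)
    q = [inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    q_mean = sum(q) / n
    y_mean = sum(y) / n
    sxx = sum((qi - q_mean) ** 2 for qi in q)
    sxy = sum((qi - q_mean) * (yi - y_mean) for qi, yi in zip(q, y))
    sigma = sxy / sxx            # slope of the QQ-plot
    m = y_mean - sigma * q_mean  # intercept of the QQ-plot
    return m, sigma
```

For data that really follow a shifted and scaled version of $P_0$, the QQ-plot is close to a straight line and the fitted slope and intercept approximate $\sigma$ and $m$.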
Fig. 1. Standard Gaussian random variable samples

In order to illustrate the technique, we generated 1000 samples of the standard Gaussian random variable ($m = 0$, $\sigma = 1$), see Fig. 1. The corresponding QQ-plot is presented in Fig. 2. By applying relation (6), we obtained the statistics estimates $\hat{m} = 0.0057$ and $\hat{\sigma} = 1.0154$. The properties of these estimators can be found in [4].


Fig. 2. Corresponding QQ-plot


3.  PROPOSED  ALGORITHM 

In this section, the idea of obtaining the pdf estimate $\hat{p}$ in (1) using the QQ-plot method is elaborated. To do this, we have to segregate n linear parts of the given QQ-plot, resulting in the partition of the data set into n mutually exclusive subsets, each with its own cardinal number. Then, the term $\varphi_i(\cdot)$ in (1) can be interpreted as an approximation of the conditional pdf, given the data points from the i-th subset, while the coefficients $c_i$ represent the a priori probability of the observations to take values in the i-th subset. Furthermore, the linear segments so obtained are defined by (5), and are uniquely determined by the first and second order moments $m_i$ and $\sigma_i$ of $\varphi_i(\cdot)$, whose respective estimates $\hat{m}_i$ and $\hat{\sigma}_i$ are given in (6). The key step in the proposed procedure is to generate an optimal piece-wise linear approximation of the QQ-plot. Of course, this is a nonclassical optimization problem, which is tractable only by numerical methods. Therefore, we propose a heuristic procedure that yields a suboptimal solution.

Let $A$ denote the set of all piece-wise linear functions $a(\cdot)$ with a number of linear segments $n_a$, whose domain is the interval of real numbers $[P_0^{-1}(r_1), P_0^{-1}(r_I)]$. The goal is to find a function $a^* \in A$ that satisfies:

$$a^* = \arg\min_{a \in A} J(a) = \arg\min_{a \in A} \{\mathrm{dist}(a, qq) + \gamma n_a\}; \qquad (7)$$

where $\mathrm{dist}(a, qq)$ denotes the approximation error between $a(\cdot)$ and the QQ-plot, $y_i = qq(P_0^{-1}(r_i))$, $i = 1, 2, \dots, I$, is the set of nondecreasing measurements, and $\gamma$ is a properly defined parameter. The influence of this parameter on the final solution is great. Namely, for $\gamma = 0$ the final solution $a^*$ would be a piece-wise linear function with $(I - 1)$ linear segments that would approximate the QQ-plot ideally. On the other hand, each of the segments would contain only two terminal measurements, and this represents a small amount of data for proper statistic estimation. On the contrary, a too large $\gamma$ will cause a solution $a^*$ with a small number of segments, resulting in a rather bad approximation of the QQ-plot. Thus, parameter $\gamma$ has to be chosen as a compromise between those two opposite requirements. A
suitable  choice  is  ye  (0.01,0.05) .  Finally,  we  achieved 

convergence  towards  the  optimal  solution  a  (•)  through 
the  functional  sequence  |a;(-)j  obtained  on  the  following 


determined  as  the  ratio  between  the  number  of  data  points 
belonging  to  the  i-th  subset  and  the  total  number  of  data 
points,  while  the  parameters  m(  and  a,  are  estimated 
according  to  (6).  So,  the  pdf  approximation  p,  can  be 
found  in  a  kernel  form 


P(*)  =  2>,— Po(— — )  • 
m  a,.  a, 


(8) 


The  choice  of  p0(-)  depends  on  the  nature  of  the  data.  In 
the  case  of  symmetric  and  unimodal  pdf  of  the  samples 
sequence,  a  logical  choice  for  Po(-)  should  be  the  normal, 
uniform  or  Laplace  pdfs.  It  should  be  noted  that  for  the 
normal  /?„(•)>  the  proposed  algorithm  represents  a 
Gaussian  mixture  type  estimator  [5],  while  for  the 
uniform  p0(-) ,  it  can  be  viewed  as  a  kind  of  histogram 
method  with  variable  cells  [1], 


4.  AN  EXAMPLE 


manner: 

step  i)  in  the  initial  step  the  first  member  at  (•)  consists  of 
only  one  linear  segment  (n=l),  determined  by  the  first 
(Po'fa),  y,)  and  the  last  point  (P0~l(r, ),  y, )  of  the  QQ  plot; 

step  ii)  the  member  aJ+1(),  7=1,2,...,  is  formed  from 
cij(')  using  the  rule:  to  each  of  the  linear  segments  of 
ay0.  say  compute 

d{aJi,qq)=—%(aji(P0~'{rp))-qq(pArp)i?=li  - 

mji  p= i 

i  =  1  ,  where  mjt  denotes  the  number  of  the  QQ-plot 

points  covered  by  the  i-th  linear  segment  in  the  y'-th 
iteration.  Denote  the  maximal  value  with  l *  =  max  I, . 

i=\,...,k  j 

Now,  divide  the  segment  with  corresponding  /*  -measure 
into  two  linear  subsegments  so  that  these  subsegments 
contain  the  same  number  of  data  points.  So  obtained 
piece-wise  linear  approximation,  that  contains 
kj  + 1  =  kJ+l  segments,  forms  the  member  a;+i0 1 

step  iii)  if  dist(aj,qq)+yna.  >  dist(aj+],qq)+yna.^  ,  go  to 
step  ii),  else  finish  the  algorithm  and  stop  the  procedure 
with  a  =  aj . 

Having finished the construction of the piece-wise linear approximation $a$ of the QQ-plot with $n$ linear subsegments, the coefficients $c_i$, $i = 1, \ldots, n$, can be


In  order  to  illustrate  the  performance  of  the  proposed 
estimator,  we  select  as  an  example  the  Gaussian  mixture 
pdf 

$$p(x) = 0.3\, N(x/{-5}, 1) + 0.3\, N(x/{-1.5}, 1.5) + 0.4\, N(x/1.5, 1), \qquad (9)$$


where  N(-/a,b)  denotes  the  normal  pdf  with  mean  a  and 
variance  b.  The  performance  of  the  algorithm  is  shown  in 
Figs.  3,  4. 
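For readers who want to reproduce the test case, the mixture above can be sketched in Python as follows. This is only an illustrative sketch: the names `COMPONENTS`, `mixture_pdf` and `sample_mixture`, as well as the seeding scheme, are assumptions of this example, and the second parameter of each component is treated as a variance, as stated in the text.

```python
import math
import random

# (weight, mean, variance) of each normal component of the example pdf
COMPONENTS = [(0.3, -5.0, 1.0), (0.3, -1.5, 1.5), (0.4, 1.5, 1.0)]


def normal_pdf(x, mean, var):
    """Normal pdf with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x):
    """Evaluate the three-component Gaussian mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in COMPONENTS)


def sample_mixture(n, seed=0):
    """Draw n samples: pick a component by weight, then draw from it."""
    rng = random.Random(seed)
    weights = [w for w, _, _ in COMPONENTS]
    samples = []
    for _ in range(n):
        _, mean, var = rng.choices(COMPONENTS, weights=weights)[0]
        samples.append(rng.gauss(mean, math.sqrt(var)))
    return samples
```

The mixture mean is $0.3(-5) + 0.3(-1.5) + 0.4(1.5) = -1.35$, so the sample mean of a few thousand draws should lie close to that value.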


Fig.  3.a:  Histogram  method  with  variable  cells 


To compare our parametric-nonparametric approach with both a nonparametric solution and a different parametric-nonparametric solution, we have also implemented the histogram method with variable cells [1] and the adaptive




Gaussian mixture method [5]. The results were obtained using $l = 5000$ samples from the pdf (9). Estimation results (Fig. 5) are compared in terms of the integral estimation error (CEE):

$$CEE(l) = \int_{-\infty}^{\infty} \left( p(x) - p_l(x) \right)^2 dx \qquad (10)$$

where $l$ denotes the number of data samples used for
estimation.  The  numerical  complexity  of  the  estimators  is 
analyzed  and  expressed  through  the  number  of  floating 
point  operations.  The  corresponding  results  are  plotted  in 
Fig.  6. 
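The error measure (10) can be approximated numerically. The sketch below uses a plain trapezoidal rule on a finite interval; the function name, the integration limits and the number of steps are assumptions of this example, and the interval must be chosen wide enough to contain the mass of both pdfs.

```python
import math


def integral_squared_error(p, p_hat, lo, hi, steps=20000):
    """Trapezoidal approximation of the integral of (p(x) - p_hat(x))^2 on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at both end points
        total += weight * (p(x) - p_hat(x)) ** 2
    return total * dx
```

For two unit-variance normal pdfs whose means differ by one, the integral has the closed form $(1 - e^{-1/4})/\sqrt{\pi}$, which gives a convenient check of the numerical routine.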


Fig.  4.a:  Adaptive  Gaussian  mixture  method 

The achieved results show clearly that our algorithm outperforms both the histogram with variable cells and the Gaussian mixture methods. In fact, the estimated pdf is a rather good approximation of the actual pdf, being better than, or at least comparable to, those obtained with the other two methods. Moreover, the choice of the normal pdf $p_0(\cdot)$ in (8) represents a good compromise between accuracy and numerical complexity of the proposed estimation procedure.


Fig. 4.b: QQ-plot based estimation with normal pdf $p_0$


Fig. 5: Comparison of different algorithms in terms of integral estimation error (A1: Histogram with variable cells, A2: Adaptive Gaussian mixture, A3: QQ-plot based estimator with uniform $p_0$, A4: QQ-plot based estimator with normal $p_0$)


Fig. 6: Comparison of different algorithms in terms of number of floating point operations (A1: Histogram with variable cells, A2: Adaptive Gaussian mixture, A3: QQ-plot based estimator with uniform $p_0$, A4: QQ-plot based estimator with normal $p_0$)
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5.  CONCLUSION 

Inspired  by  the  QQ-plot  technique,  we  developed  an 
algorithm  to  estimate  a  pdf  from  a  sequence  of  samples. 
The highlights of the presented algorithm are the following: 1) it is a mixed parametric-nonparametric approach; its nonparametric character
implies  that  we  can  make  a  simple  assumption  about  the 
data,  i.e.,  concerning  the  function  p0(-)  under  which  the 
QQ-plot  is  constructed;  2)  the  final  pdf  estimate  result  is 
in  the  form  of  an  analytical  function  rather  than  a 
numerical  function;  3)  it  is  computationally  simpler  and 
performs  better,  or  comparably,  with  respect  to  similar 
parametric-nonparametric  methods,  like  the  adaptive 
Gaussian  mixture  algorithm;  4)  it  performs  better  than 
recursive  parametric  methods,  like  the  histogram  method 
with  variable  cells,  at  the  expense  of  a  modest  additional 
computational  effort;  and  5)  in  contrast  to  classical 
methods,  which  aim  at  getting  the  actual  pdf,  it  only 
approximates  this  pdf,  just  as  parametric  methods  do.  The 
drawback  of  the  proposed  approach  is  its  nonrecursivity, 
which  means  that  all  samples  have  to  be  stored  during  the 
computation  of  the  pdf  estimate.  Nevertheless,  it  presents 
a  good  compromise  between  estimation  accuracy  and 
numerical  complexity,  and  offers  a  good  alternative  to  the 
other methods known from the literature. Also, there is a diversity of possible applications. Whenever it is necessary to estimate probability density functions of signals, patterns or clutter, as is the case in digital communications, pattern recognition or radar/sonar systems, our method can be used. Also, the estimation
algorithm  can  be  implemented  relying  on  sliding  window 
techniques,  yielding  therefore  the  possibility  of  operation 
on  slowly  time  varying  scenarios.  In  these  situations,  the 
length  of  the  sliding  window  has  to  be  a  compromise 
between  the  desired  estimation  accuracy  and  the  changing 
rate  of  the  statistics/distributions  of  interest. 
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ABSTRACT 

In this paper, a method to design random variable (rv) generators with the same probability density function (pdf) as a given rv record is presented. The resulting rv generator is a nonlinear system that, when driven by a uniformly distributed rv, provides an output rv with the desired pdf. The analytical description of the desired pdf is not needed; in fact, only a data record of the desired rv is used. Inversion of nonlinear systems and nonlinear system adaptive design are used in this work.


1.  INTRODUCTION 

For  simulation  purposes,  generators  of  random  variables  (rv) 
with  a  given  probability  distribution  function  (pdf)  are  often 
needed.  For  instance,  in  the  model  for  a  time-space  radio  channel 
of  [1],  it  is  shown  that  the  wave  azimuth  distribution  almost 
matches  a  Gaussian  pdf,  whereas  their  delay  distribution 
approximately fits an exponential pdf. Nevertheless, a measured pdf cannot always be fit by an analytic pdf with an acceptable confidence level over the entire range. For instance, in this radio channel model, the tails of the measured azimuth distribution are not well fit by a normal pdf.

In  this  paper,  a  method  to  design  rv  generators  with  the  same  pdf 
distribution  as  a  given  rv  record  is  presented.  As  shown  below, 
the  resulting  rv  generator  is  a  nonlinear  system  (NLS)  that,  when 
driven  by  a  uniformly  distributed  rv,  provides  an  output  rv  with 
the  desired  pdf  distribution.  It  is  important  to  point  out  that  the 
analytical  description  of  the  desired  pdf  is  not  needed;  in  fact, 
only  a  data  record  of  the  desired  rv  is  used.  As  will  be  shown, 
NLS  inversion  and  NLS  adaptive  design  are  involved  in  the 
design. 

The paper is organized as follows. In Section 2, the “whitening” of an rv is presented. That is, we describe a procedure to obtain a uniformly distributed rv from a given rv record with another pdf. In Section 3, the rv generator design problem leads to an NLS inversion problem, which is solved adaptively. Section 4 presents simulations, and the paper ends with concluding remarks in Section 5.


2.  PDF  WHITENING 

In  [2],  a  parallelism  between  the  role  that  a  pdf  function  plays  in 
nonlinear  signal  processing  and  the  role  that  the  power  spectrum 
density  function  plays  in  linear  signal  processing  is  presented. 


This  relationship  allows  us  to  solve  the  problem  of  pdf  whitening 
(that  is,  to  obtain  a  uniformly  distributed  rv  from  another  rv) 
similarly  to  whitening  the  power  spectral  density  of  a  stochastic 
process. 

The pdf whitening problem involves the design of an NLS, denoted by $g[\cdot]$, that provides a uniformly distributed rv output, denoted hereafter by $u(n)$ with $n$ the discrete time index, whenever it is driven by a data record $x(n)$ of a given distribution. Thus, we have,

u(n)  =  g[x(n)].  (1) 

It is well known [3] that such a system is

$$u(n) = g[x(n)] = 2U_0 \left[ F_x(x(n)) - \tfrac{1}{2} \right] \qquad (2)$$

with $F_x(x)$ the input distribution function and $U_0$ the output range, i.e. $u \in [-U_0, U_0]$. As (2) is monotonically increasing, the relation between the input pdf, $p_x(x)$, and the output one, $p_u(u)$, is

$$p_u(u) = p_x(x) \Big/ \frac{dg(x)}{dx} \qquad (3)$$

and it can be stated in the following integral form:

$$\int_{-\infty}^{u} p_u(\alpha)\, d\alpha = \int_{-\infty}^{x} p_x(\lambda)\, d\lambda. \qquad (4)$$
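A direct, data-driven reading of the whitening map (2) replaces $F_x$ by the empirical distribution function of the record. The Python sketch below illustrates that idea only; it is not the Fourier-series system developed in this paper, and the names `make_whitener` and `U0` are assumptions of the example.

```python
import bisect


def make_whitener(record, U0=1.0):
    """Return g(x) = 2*U0*(F(x) - 1/2), with F the empirical CDF of the record."""
    data = sorted(record)
    n = len(data)

    def g(x):
        # Fraction of record samples that are less than or equal to x
        F = bisect.bisect_right(data, x) / n
        return 2.0 * U0 * (F - 0.5)

    return g
```

Applied to the samples of the record itself, `g` returns values spread evenly over $[-U_0, U_0]$, which is the uniform output that (2) is designed to produce.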


Assuming that the input range is finite¹, i.e. $x \in [-X_0, X_0]$, and stating the input pdf $p_x(x)$ in terms of its Fourier series, expression (4) leads to

$$\frac{u + U_0}{2U_0} = \frac{1}{2X_0} \sum_{k=-\infty}^{+\infty} \Psi_x\!\left(jk\frac{\pi}{X_0}\right) \int_{-X_0}^{x} e^{-jk\frac{\pi}{X_0}\lambda}\, d\lambda \qquad (5)$$


with $\Psi_x(j\omega)$ the characteristic function of the rv $x$. Due to the Fourier series periodicity, (5) is valid only for $u$ and $x$ values within their respective ranges. It is straightforward to see that (5) leads to the following pdf whitening system


1  If  it  were  not  finite,  a  truncation  of  the  input  range  would  be 
assumed  with  a  certain  overflow  probability 
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3.  RV  GENERATOR 
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For  practical  purposes,  the  infinite  summation  in  (6)  is  truncated 
to  |  k  |  <K  and  the  characteristic  function  can  be  estimated  by  the 
sample  estimator.  Assuming  N  samples  of  x,  the  characteristic 
function  estimate  could  be, 


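$$\hat{\Psi}_x\!\left(jk\frac{\pi}{X_0}\right) = \frac{1}{N} \sum_{n=1}^{N} e^{jk\frac{\pi}{X_0}x(n)} \qquad (7)$$

leading to the approximate pdf whitening system, $\hat{g}[\cdot]$,

$$\hat{g}[x(n)] = \frac{U_0}{X_0} \left[ x(n) + \sum_{\substack{k=-K \\ k \neq 0}}^{+K} \hat{\Psi}_x\!\left(jk\frac{\pi}{X_0}\right) \frac{e^{-jk\frac{\pi}{X_0}x(n)} - (-1)^k}{-jk\frac{\pi}{X_0}} \right]. \qquad (8)$$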

It is important to remark that, unlike a simpler whitening system consisting, for instance, of the direct estimation of (2), this pdf whitening system allows a recursive computation of $\hat{\Psi}_x(jk\pi/X_0)$ and enables the system to whiten nonstationary rvs. Additionally, although outside the scope of this paper, it is worth pointing out that the previous pdf whitening system can be generalized to an arbitrary number of rvs (see [2] for details).

In order to show the performance of the presented pdf whitening system, 2000 samples of a normally distributed rv $x \sim N(0,1)$ are considered. Let us assume that $|x| < X_0 = 3$, i.e. an overflow probability of $10^{-3}$ is allowed. From the estimated values of the characteristic function $\hat{\Psi}_x(jk\pi/X_0)$ for $|k| \le K = 10$ (7) and considering $U_0 = 5$, the pdf whitening system (8) is obtained. Figure 1 shows the normalized histogram of an $x$ data record of 8000 samples and the resulting “whitened” $u$ samples obtained from (8). As seen, the output rv histogram has a flat shape.


[Figure 1 panels: (a) input pdf estimate of $x$; (b) output pdf estimate of $u$.]

Figure 1. Histograms (8000 samples) of $x$ (a) and $u$ (b).


From  here  on,  we  focus  on  the  problem  of  designing  a  NLS 
system  whose  output  has  a  given  pdf  function  when  it  is  driven 
by  a  uniformly  distributed  rv.  Hereafter,  such  a  system  will  be 
referred  to  as  an  rv  generator. 

In  the  previous  section,  we  showed  that  a  data  record  of  a  rv  x 
with  a  given  pdf  provides  us  with  a  NLS  system  able  to  generate 
a  uniformly  distributed  rv  when  that  NLS  system  is  driven  by  x. 
Consequently, as shown in Figure 2, the design of an rv generator for $x$ becomes a nonlinear inverse system design problem for the pdf whitening NLS in (8).


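[Figure 2 block diagram: $u(n)$ drives the RV GENERATOR block, followed by the PDF WHITENING block, whose output is again labeled $u(n)$.]

Figure 2. NLS inversion to design the rv generator.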


According to (2), the ideal rv generator function $f[u]$ is

$$f[u] = g^{-1}[u] = F_x^{-1}\!\left( \frac{u + U_0}{2U_0} \right). \qquad (9)$$
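When $F_x$ is known in closed form, the ideal generator (9) is the classical inverse-transform method. As a hedged illustration, the Python sketch below works this out for a Laplacian pdf with parameter $\alpha$ (the case used later in the simulations); the function names and default arguments are assumptions of this example.

```python
import math


def laplace_whitener(x, alpha=1.0, U0=1.0):
    """g(x) = 2*U0*(F_x(x) - 1/2) for a Laplacian pdf (alpha/2)*exp(-alpha*|x|)."""
    return math.copysign(U0 * (1.0 - math.exp(-alpha * abs(x))), x)


def laplace_generator(u, alpha=1.0, U0=1.0):
    """Ideal generator f(u) = g^{-1}(u), defined for |u| < U0."""
    if abs(u) >= U0:
        raise ValueError("u must lie strictly inside (-U0, U0)")
    return math.copysign(-math.log(1.0 - abs(u) / U0) / alpha, u)
```

Feeding `laplace_generator` with samples drawn uniformly from $(-U_0, U_0)$ yields Laplacian samples, and composing it with `laplace_whitener` returns the original uniform value. The logarithm also shows why $f[u]$ grows sharply near $u = \pm U_0$.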


For the sake of comparison, two different NLS designs are considered to model the rv generator: a Volterra model (10) denoted by $f_V(u)$ and a trigonometric or Fourier model (11) denoted by $f_F(u)$ [2].

$$f_V(u(n)) = \sum_{q=0}^{Q} a_V(q)\, u^q(n) = \mathbf{a}_V \cdot \mathbf{z}_V(n) \qquad (10)$$

$$f_F(u(n)) = a_F(0) + \sum_{q=1}^{Q} \left[ a_F(2q) \cos(2q\,\omega_0 u(n)) + a_F(2q+1) \sin((2q-1)\,\omega_0 u(n)) \right] = \mathbf{a}_F \cdot \mathbf{z}_F(n) \qquad (11)$$

The linearity of both models with respect to the coefficients enables a vector notation as seen in (10) and (11). The vectors of the nonlinear models are $\mathbf{a}_V$, the Volterra coefficient vector, $\mathbf{z}_V(n)$, the Volterra functional vector consisting of the powers of $u(n)$, $\mathbf{a}_F$, the Fourier coefficient vector, and $\mathbf{z}_F(n)$, the Fourier functional vector consisting of the sine or cosine functions of $u(n)$. Also in (11) the so-called principal frequency is defined as $\omega_0 = \pi/(2X_0)$. (See [2] for details about the Fourier model.)
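Because both models are linear in their coefficients, evaluating them reduces to building a functional vector and taking an inner product. The Python sketch below assumes one plausible ordering of the Fourier functionals (a constant, then alternating even-harmonic cosines and odd-harmonic sines of the principal frequency $\pi/(2X_0)$); the function names and this ordering are assumptions of the example rather than a statement of the authors' exact indexing.

```python
import math


def volterra_vector(u, Q):
    """Functional vector [1, u, u^2, ..., u^Q] of the polynomial (Volterra) model."""
    return [u ** q for q in range(Q + 1)]


def fourier_vector(u, Q, X0):
    """Functional vector [1, cos(2*w0*u), sin(w0*u), cos(4*w0*u), sin(3*w0*u), ...]."""
    w0 = math.pi / (2.0 * X0)  # principal frequency
    z = [1.0]
    for q in range(1, Q + 1):
        z.append(math.cos(2 * q * w0 * u))
        z.append(math.sin((2 * q - 1) * w0 * u))
    return z


def model_output(a, z):
    """Inner product of the coefficient vector a and the functional vector z."""
    return sum(ai * zi for ai, zi in zip(a, z))
```

For an odd input/output relation only the odd powers (Volterra) or the sine functionals (Fourier) need to be kept, which halves the number of coefficients to adapt.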

As shown in Figure 3, the design of the rv generator can be accomplished in an adaptive manner by means of the so-called Predistortion-LMS (PLMS) [4].
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I*]  <  K  =  10 
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Figure  3.  Adaptive  design  of  the  rv  generator. 

The PLMS update of the Volterra or Fourier coefficients follows

$$\mathbf{a}(n+1) = \mathbf{a}(n) + \frac{\mu}{p(n)}\, \varepsilon(n)\, \nabla_x g(x)\big|_{x = \hat{x}(n)}\, \mathbf{z}(n), \qquad (12)$$

substituting $\mathbf{a}(n)$ for the respective coefficient vector and $\mathbf{z}(n)$ for the respective functional vector, as defined in (10) for the Volterra model and in (11) for the Fourier model. In (12) $\mu$ is the step-size parameter, $\varepsilon(n)$ is the error signal,

$$\varepsilon(n) = u(n) - \hat{u}(n) \qquad (13)$$

and $p(n)$ is the estimate of the power of the functionals,

$$p(n+1) = \beta\, p(n) + (1 - \beta)\, \mathbf{z}^T(n)\, \mathbf{z}(n). \qquad (14)$$

The PLMS adaptive algorithm is a gradient algorithm useful in NLS inversion problems because it includes, due to the chain rule, the gradient $\nabla_x g(x)$ of the function to be inverted. From (2), the gradient depends on $p_x(x)$:

$$\nabla_x g(x) = 2U_0\, \nabla_x F_x(x) = 2U_0\, p_x(x). \qquad (15)$$

For practical purposes, the gradient of (8) can be used directly. Different pdf estimates could also be taken into account to estimate the gradient [2].
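One iteration of a normalized gradient update of this kind can be sketched as follows. This is a minimal Python sketch under stated assumptions: the generator output is the inner product of the coefficient and functional vectors, the whitening map `g` and its derivative `grad_g` are supplied by the caller, and all names are hypothetical.

```python
def plms_step(a, p, u, z, g, grad_g, mu, beta):
    """One predistortion-LMS style update.

    a      : current coefficient vector (list of floats)
    p      : current estimate of the power of the functionals
    u      : current uniform input sample
    z      : functional vector built from u
    g      : whitening map to be inverted
    grad_g : derivative of g with respect to its argument
    mu     : step size
    beta   : forgetting factor of the power estimate
    Returns the updated coefficients, the updated power estimate and the error.
    """
    x_hat = sum(ai * zi for ai, zi in zip(a, z))   # generator output
    err = u - g(x_hat)                             # error after whitening
    gain = (mu / p) * err * grad_g(x_hat)          # chain-rule gradient factor
    a_new = [ai + gain * zi for ai, zi in zip(a, z)]
    p_new = beta * p + (1.0 - beta) * sum(zi * zi for zi in z)
    return a_new, p_new, err
```

In the trivial case where `g` is the identity and the model has the single functional `z = [u]`, repeated calls drive the coefficient towards one, which is a simple way to check the sign conventions before plugging in a real whitening map.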

The design of the rv generator could also have been performed in reverse order, that is, $g(\cdot)$ could have been put in front of $f(\cdot)$ in Figure 3. In that case, a least-squares solution of the NLS model of the rv generator would be feasible because the error signal is linear in the coefficients. The limitation is that, in the reverse order, a large record of $x$ would be needed, and the objective of the paper is precisely to design a generator of the $x$ rv from a small record of $x$.

4.  SIMULATIONS 

Two  sets  of  simulations  are  included.  The  first  one  uses  the 
actual  characteristic  function,  whereas  in  the  second  simulation 
only  a  record  of  a  x  rv  is  assumed. 

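First, let us consider a Laplacian rv whose pdf is

$$p_x(x) = \frac{\alpha}{2}\, e^{-\alpha |x|} \qquad (16)$$

with parameter $\alpha$ set to 1. The pdf whitening system is built from expression (8) with $U_0 = 1$ and using the actual samples of the characteristic function for $K = 10$,

$$\Psi_x\!\left(jk\frac{\pi}{X_0}\right) = \frac{\alpha^2}{\alpha^2 + \left(k\frac{\pi}{X_0}\right)^2}.$$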

Due  to  symmetry  of  the  distribution  function,  the  pdf  whitening 
system  and  the  inverse  system  both  have  odd  input/output 
relations.  Thus,  the  Volterra  system  (10)  that  models  the  rv 
generator  only  keeps  the  odd  powers,  whereas  the  Fourier  model 
keeps  only  the  sine  functionals.  Both  models  consist  of  15 
coefficients. 


A 5000-length data record of a uniformly distributed rv is used in the adaptive design of the rv generator (Figure 3). The PLMS parameters (12), (14) are set to $\mu = 2$ and $\beta = 0.99$ for both models. The gradient function (15) is computed using the Laplacian pdf (16). The final relations of $f_V(u)$ and $f_F(u)$ together with the ideal ones (dashed line) are shown in Figures 4.a and 4.b.


VOLTERRA  RV  GENERATOR  FUNCTION 


Figure  4.  Ideal  rv  generator  in  dashed  line.  In  solid  line, 
Volterra  (a)  and  Fourier  (b)  rv  generator  functions. 


Figure 5 compares the Laplacian pdf (dashed line) to the output rv histogram of the Volterra rv generator (Fig. 5.a) and the Fourier rv generator (Fig. 5.b). Both histograms have been computed from data records of length $2 \cdot 10^4$.

Although not shown, the convergence of the Fourier coefficients is faster than that of the Volterra coefficients, but the Fourier model does not properly fit the tails of the Laplacian pdf (Fig. 5.b). This is due to the fact that the ideal function $f[u]$ has a sharp behavior at the boundaries of the input range (see Fig. 4) that the Fourier model does not match properly. In this case, the Volterra model provides better performance for such an NLS design.
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IDEAL  PDF  AND  OUTPUT  RV  HISTOGRAM  OF  THE  VOLTERRA  MODEL 


Figure  5.  Laplacian  pdf  function  (dashed  line).  In  the 
solid  line,  the  output  histogram  of  the  Volterra  (a)  and 
Fourier  (b)  rv  generator  systems. 


The second set of simulations uses only 2000 samples of a normally distributed rv $x \sim N(0,1)$. As shown in Section 2, this data record allows the design of the pdf whitening system (in this case $U_0 = 1$). Once the pdf whitening system is obtained, the rv generator can be adaptively designed using the scheme of Figure 3.

For that purpose, a Volterra system (10), $f_V(u)$, with $Q = 15$ is considered to model the rv generator. The coefficients are updated with $2 \cdot 10^4$ samples of a uniformly distributed rv $u$ by means of the PLMS adaptive algorithm with $\mu = 2$ and $\beta = 0.99$. The Fourier series approximation of $p_x(x)$ ($K = 10$) is used to compute the gradient function, $\nabla_x g(x)$.


IDEAL  PDF  AND  OUTPUT  RV  HISTOGRAM  OF  THE  VOLTERRA  MODEL 


Figure  6.  (a)  Ideal  pdf  (dashed  line)  and  histogram  of 
the  rv  generator  output  (solid  line),  (b)  Ideal  rv  generator 
function  (dashed  line)  and  actual  rv  generator  function 
(solid  line). 


Figure 6.b shows the ideal input/output relation of the rv generator system in the dashed line along with the final one achieved by the Volterra system after the adaptive design. Additionally, Figure 6.a shows the actual pdf (dashed line) and the histogram of the Volterra rv generator output using $2 \cdot 10^4$ samples.

5.  REMARKS 

This  paper  shows  how  a  nonlinear  system  that  generates  a  rv  with 
a  given  pdf  can  be  designed  from  knowledge  only  of  a  data 
record  of  such  a  rv.  It  has  been  shown  that  data  records  of  2000 
samples  are  large  enough  to  obtain  a  reliable  rv  generator  system. 
As  a  preliminary  step,  we  also  presented  the  design  of  nonlinear 
systems  that  are  able  to  provide  a  uniformly  distributed  rv  at  the 
output  when  driven  by  an  input  signal  with  a  given  pdf. 
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ABSTRACT 

Establishing  measures  for  local  stationarity  is  an  open 
problem  in  the  field  of  time-frequency  analysis.  One 
promising  theoretical  measure,  known  as  the  spread, 
provides  a  means  for  quantifying  potential  correlation 
between  signal  elements.  In  previous  papers  we  investi¬ 
gated  the  issue  of  implementing  such  a  measure  for  dis¬ 
crete  signals.  The  numerical  spread  was  introduced  [1] 
as  a  means  of  applying  and  investigating  the  techniques 
previously  only  studied  theoretically. 

When  implementing  such  a  scheme  it  became  nec¬ 
essary  to  augment  the  covariance  matrix  so  that  the 
resulting  ambiguity  space  has  a  uniform  resolution.  In 
this  paper  we  compare  three  extension  schemes:  zero 
padding,  circular  extension,  and  edge  replication,  to 
determine  which  provides  the  best  estimate  of  the  nu¬ 
merical  spread.  Based  on  our  results,  we  determined 
that  the  method  of  normalized  edge  replication  is  least 
likely  to  inflate  the  estimate  of  the  spread. 

1.  INTRODUCTION 

One assumption of most signal processing techniques is that the signal is stationary, i.e., the statistics do not change over time. Many real data sets are not stationary but can, however, be described as locally stationary, that is, they appear stationary over finite time intervals. Since we will not assume access to all orders of statistics, we will constrain our discussion to the second order statistics of a signal.

Establishing  measures  for  local  stationarity  is  an 
open  problem  in  the  field  of  time-frequency  analysis. 
Some  desirable  properties  for  such  a  measure  [2]: 

•  Quantitative

•  Measurable

•  Robust

•  Analytically Powerful

One  promising  theoretical  measure,  known  as  the 
spread,  was  introduced  by  W.  Kozek  [3],  and  provides 


a  means  for  quantifying  potential  correlation  between 
signal  elements. 

When implementing such a scheme numerically, it will be shown that the natural way in which the data is defined leads to calculation of FFTs of various lengths. This is undesirable since the resulting spectra will be calculated at different resolutions. This work investigates methods that address the problem of augmenting the covariance matrix such that calculations are made on a grid of constant frequency without introducing other artifacts.

1.1.  Local  Stationarity 

Before  reviewing  the  theory  of  spread,  it  will  be  useful 
to  explicitly  define  local  stationarity.  A  signal  is  locally 
stationary  if  the  following  conditions  hold: 

1.  The  autocovariance  has  limited  variation  over  some 
time  interval  T. 

2. For lags, $\tau$, that extend beyond the interval $T$ the autocovariance needs to be small, preferably zero, to prevent time dependence outside the interval:

$R_x(t, t + \tau) \approx 0$, for $t \in T$ and $t + \tau \notin T$.

1.2.  Theoretical  Spread 

To  establish  a  context  for  numerical  spread  we  first 
summarize  the  theoretical  framework  for  spread  of  a 
random  process  x(t)  defined  for  continuous  t.  To  quan¬ 
tify  the  degree  of  local  stationarity  we  introduce  the 
(generalized)  ambiguity  function  and  the  expected  (gen¬ 
eralized)  ambiguity  function. 

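The (generalized) Ambiguity Function (AF) [4] of a deterministic signal $x(t)$ is given by

$$A_x^{(\alpha)}(\tau, \nu) = \int x\!\left(t + (\tfrac{1}{2} - \alpha)\tau\right) x^*\!\left(t - (\tfrac{1}{2} + \alpha)\tau\right) e^{-i2\pi\nu t}\, dt.$$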
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then  the  numerical  spread  is  given  by 


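Given a nonstationary random process $x(t)$, the expected (generalized) ambiguity function (EAF) $EA_x^{(\alpha)}(\tau, \nu)$ is the expectation of the AF:

$$EA_x^{(\alpha)}(\tau, \nu) = E\left[ \int x\!\left(t + (\tfrac{1}{2} - \alpha)\tau\right) x^*\!\left(t - (\tfrac{1}{2} + \alpha)\tau\right) e^{-i2\pi\nu t}\, dt \right] = \int R_x\!\left(t + (\tfrac{1}{2} - \alpha)\tau,\ t - (\tfrac{1}{2} + \alpha)\tau\right) e^{-i2\pi\nu t}\, dt.$$

Let $R_x^{(\alpha)}(t, \tau) = R_x\!\left(t + (\tfrac{1}{2} - \alpha)\tau,\ t - (\tfrac{1}{2} + \alpha)\tau\right)$, then

$$EA_x^{(\alpha)}(\tau, \nu) = \int R_x^{(\alpha)}(t, \tau)\, e^{-i2\pi\nu t}\, dt. \qquad (1)$$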

If the EAF is zero about a given “TF lag point” $(\tau_{12}, \nu_{12})$, then any two TF points $(t_1, f_1)$ and $(t_2, f_2)$ with $t_1 - t_2 = \tau_{12}$ and $f_1 - f_2 = \nu_{12}$ are uncorrelated. The EAF indicates the potential correlation between time-frequency points separated by the time lag $\tau$ and the frequency lag $\nu$ [3].

Let $[-\sigma_\tau, \sigma_\tau] \times [-\sigma_\nu, \sigma_\nu]$ be the smallest rectangle (centered at the origin of the $(\tau, \nu)$ plane) which contains the effective support of the EAF, i.e.,

$$|EA_x^{(\alpha)}(\tau, \nu)| \approx 0 \quad \text{for } |\tau| > \sigma_\tau \text{ or } |\nu| > \sigma_\nu.$$

Define the spread of $x$ as the area of this rectangle:

$$\sigma_x = 4 \sigma_\tau \sigma_\nu. \qquad (2)$$

From these definitions we can view the spread as a product of temporal correlation, $\sigma_\tau$, and spectral correlation, $\sigma_\nu$.

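It was shown that the EAFs obtained for different choices of $\alpha$ are equal up to a phase factor [3],

$$EA_x^{(\alpha_2)}(\tau, \nu) = e^{i2\pi(\alpha_1 - \alpha_2)\tau\nu}\, EA_x^{(\alpha_1)}(\tau, \nu) \qquad (3)$$

$$|EA_x^{(\alpha_1)}(\tau, \nu)| = |EA_x^{(\alpha_2)}(\tau, \nu)|. \qquad (4)$$

Thus, with respect to the calculation of spread, the factor $\alpha$ is arbitrary.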

1.3.  Numerical  Spread 

To define numerical spread we let $T$ represent the sampling period and denote

$$r_x[n, m] = R_x(nT, mT). \qquad (5)$$

We can then write equation (1) in terms of discrete variables. Let

$$r_x^{(\alpha)}[n, m] = r_x\!\left[n + (\alpha - \tfrac{1}{2})m,\ n - (\alpha + \tfrac{1}{2})m\right] \qquad (6)$$

$$NEA_x^{(\alpha)}[m, k] = \sum_{n=0}^{N-1} r_x^{(\alpha)}[n, m]\, e^{-i2\pi nk/N}.$$

n=0 

It was shown in Equation (4) that the choice of $\alpha$ is arbitrary with respect to the calculation of spread. For discrete data we must choose $\alpha = k/2$ for some integer $k$ since this corresponds to integer shifts between $n$ and $m$ in equation (6). For our work, we chose $\alpha = 1/2$ since it fits the above criterion and has the added benefit that the discrete version of the EAF, denoted NEA, becomes the discrete Fourier transform along the diagonals of the covariance matrix. We can now define the NEAF as

$$NEA_x[m, k] = \sum_{n=0}^{N-1} r_x^{(1/2)}[n, m]\, e^{-i2\pi nk/N} \qquad (7)$$

$$= \sum_{n=0}^{N-1} \tilde{r}_x[n, n - m]\, e^{-i2\pi nk/N}, \qquad (8)$$

where $\tilde{r}_x$ is the extended autocovariance function, which is the focus of this research and will be developed in the next section.

To calculate the spread we must determine the effective region of support of the NEA. To facilitate this calculation we project the signal onto the $\tau$ and $\nu$ axes and determine the support of these projections. Define the two projections as:

$$N_\tau[k] = \sum_m NEA_x[m, k],$$

$$N_\nu[m] = \sum_k NEA_x[m, k].$$

Using these projections we can calculate the spread in the $\tau$ and $\nu$ directions:

$$\sigma_\tau = |M_\tau| : \{ |N_\nu[m]| < \delta, \text{ for } |m| > |M_\tau| \}, \qquad (9)$$

$$\sigma_\nu = |M_\nu| : \{ |N_\tau[k]| < \delta, \text{ for } |k| > |M_\nu| \}, \qquad (10)$$

where $\delta$ is a pre-determined threshold.
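The thresholding rule in (9) and (10) can be sketched in Python as follows. The sketch takes an NEA array stored as a list of rows (one row per lag $m$, one column per DFT bin $k$), forms the two projections and returns the product $4\sigma_\tau\sigma_\nu$ used in this section as the numerical spread. The function names and the mapping of DFT bins to signed frequency offsets are assumptions of this example.

```python
def signed_bins(N):
    """Map DFT bin indices 0..N-1 to signed frequency offsets."""
    return [k if k <= N // 2 else k - N for k in range(N)]


def effective_support(values, offsets, delta):
    """Smallest M such that |values| < delta wherever |offset| > M."""
    M = 0
    for v, off in zip(values, offsets):
        if abs(v) >= delta:
            M = max(M, abs(off))
    return M


def numerical_spread(A, lags, delta):
    """Spread 4 * sigma_tau * sigma_nu from an NEA array A[lag index][bin]."""
    n_bins = len(A[0])
    proj_over_k = [sum(row) for row in A]                          # indexed by lag m
    proj_over_m = [sum(row[k] for row in A) for k in range(n_bins)]  # indexed by bin k
    sigma_tau = effective_support(proj_over_k, lags, delta)
    sigma_nu = effective_support(proj_over_m, signed_bins(n_bins), delta)
    return 4 * sigma_tau * sigma_nu
```

Because the supports are found by thresholding, the result depends on $\delta$ and on how the projections are scaled, so values obtained with different normalizations are not directly comparable.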

The numerical spread is defined as the area of the effective support of the NEAF [1]:

$$\sigma = 4 \sigma_\tau \sigma_\nu.$$

2.  THE  EXTENDED  AUTOCOVARIANCE 

The extended autocovariance in Equation (8) arises from the finite nature of the data. If we assume that the data is a square covariance matrix, it follows that the number of elements in a diagonal is a function of $\tau$; if $R \in \mathbb{R}^{N \times N}$ then the lengths of the diagonal
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vectors  will  range  in  length  from  1  to  N.  In  order  to 
maintain a constant frequency resolution in the $\tau \times \nu$ plane, the diagonals must be padded with values such that each vector has length $N$.

The parameter $\sigma_\nu$ can be viewed as the “effective bandwidth” of the covariance. The goal of the extension is to provide constant resolution with respect to $\nu$ in the $\tau \times \nu$ plane. However, by extending the data to achieve constant resolution, we run the risk of introducing artifacts into the NEAF that may ultimately affect the numerical spread.

Two  issues  must  be  addressed: 

1. an increase in the energy of the diagonal entry could inflate the value of $\sigma_\tau$.

2. sharp discontinuities will add high frequency terms to the NEAF, possibly inflating the value of $\sigma_\nu$.

Issue 1 can easily be addressed by scaling the extended covariance data so the energy along each diagonal is preserved. To address issue 2, three extension methods were examined. The extension methods chosen were based, in part, on schemes examined by Karlsson and Vetterli [5] in the context of multirate filter banks.

We extended the computational matrix by increasing the range of $m$ to $-(N - 1) \le m \le 2(N - 1)$ and implemented the following schemes: zero padding, circular extension, and replication of edge values. Conceptually these are all straightforward methods of data manipulation. The method of zero padding simply pads the vector with zeros so that the resulting vector has length $N$. The method of circular extension creates the length $N$ vector by treating the defined data as one period of a periodic sequence and replicating that sequence throughout the vector. The last method, replication of edge values, simply repeats the last defined value to form a vector of length $N$.
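The three schemes described in words above can be sketched in Python as follows. This is a simplified illustration, not the authors' Matlab implementation: each diagonal is extended only at its end, the energy normalization is applied to all three methods, a plain $O(N^2)$ DFT stands in for the FFT, and all function names are assumptions of the example.

```python
import cmath
import math


def extend_diagonal(d, N, method):
    """Extend the diagonal d to length N and rescale to preserve its energy."""
    L = len(d)
    if method == "zero":
        ext = list(d) + [0.0] * (N - L)
    elif method == "circular":
        ext = [d[i % L] for i in range(N)]       # repeat d periodically
    elif method == "edge":
        ext = list(d) + [d[-1]] * (N - L)        # repeat the last defined value
    else:
        raise ValueError("unknown method: " + method)
    e_in = sum(v * v for v in d)
    e_out = sum(v * v for v in ext)
    scale = math.sqrt(e_in / e_out) if e_out > 0 else 1.0
    return [scale * v for v in ext]


def dft(seq):
    """Direct discrete Fourier transform of a short sequence."""
    N = len(seq)
    return [sum(seq[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]


def neaf(R, method):
    """DFT along every extended diagonal of the square covariance matrix R.

    Row i of the result corresponds to lag m = i - (N - 1).
    """
    N = len(R)
    rows = []
    for m in range(-(N - 1), N):
        diag = [R[n][n - m] for n in range(N) if 0 <= n - m < N]
        rows.append(dft(extend_diagonal(diag, N, method)))
    return rows
```

For a Toeplitz covariance every diagonal is constant, so edge replication produces constant vectors whose DFT is concentrated in bin zero, whereas zero padding spreads energy over the other bins.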

To be precise, we present these methods formally. To help simplify the presentation we introduce the auxiliary variable $p = n - m$.

1. Zero padding:

$$\tilde{r}_x[n, m] = \begin{cases} r_x[n, m], & 0 \le n, m \le N - 1, \\ 0, & \text{elsewhere.} \end{cases} \qquad (11)$$

2. Circular extension:

$$\tilde{r}_x[n, m] = r_x[\tilde{n}, \tilde{m}],$$

where

$$\tilde{m} = \begin{cases} [(m - |p|) \bmod (N - |p|)] + |p|, & p < 0, \\ m \bmod (N - |p|), & p > 0, \\ m, & p = 0, \end{cases} \qquad (12)$$

and

$$\tilde{n} = \begin{cases} n \bmod (N - |p|), & p < 0, \\ [(n - |p|) \bmod (N - |p|)] + |p|, & p > 0, \\ n, & p = 0. \end{cases} \qquad (13)$$

3. Replication of edge values: recall that $m$ is defined such that $-(N - 1) \le m \le 2(N - 1)$, then

$$\tilde{r}_x[n, m] = \begin{cases} r_x[|p|, 0], & m < 0, \\ r_x[n, m], & 0 \le m \le N - 1, \\ r_x[N - 1 - |p|, N - 1], & m > N - 1. \end{cases} \qquad (14)$$

It should be noted that the other methods mentioned by Karlsson et al., symmetric extension and doubly symmetric extension, are not well defined when applied to a data set with a defined length significantly shorter than the desired data length and so were not implemented here.


3.  EXPERIMENTAL  RESULTS

To compare the three extension schemes developed above they were implemented in Matlab and applied to multiple data sets. The results presented here arose from applying these methods to a covariance that slowly decayed away from the main diagonal and varied in time as shown in Figure 1. The size of the matrix was 256 x 256, and the threshold used to determine the spread was $\delta = 0.001$.

3.1.  Zero  Padding 

The method of zero padding is easy to implement and automatically preserves energy but introduces jump discontinuities for any diagonal with a non-zero final entry. Jump discontinuities lead to high frequency terms in the FFT and tend to inflate the value of $\sigma_\nu$. The resulting autocovariance and contour plot of the NEA are shown in Figure 2. The resulting numerical spread was $\sigma = 940.637$.
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Mesh  Plot  of  the  Autocovariance  Matrix 


Figure  1:  The  covariance  used  as  the  test  data  shows 
variation  in  time  and  lag. 


3.2.  Circular  Extension 

The method of circular extension was applied to the same covariance data. In order to prevent inflation of $\sigma_\tau$ each diagonal is normalized to preserve energy. Circular extension introduces jump discontinuities whenever the first and last data point on a diagonal differ, thus possibly inflating the value of $\sigma_\nu$. The resulting numerical spread was $\sigma = 564.3822$.

3.3.  Replication  of  Edge  Values 

In Figure 4 we can see the resulting autocovariance and NEA when the
method of edge value replication is used. This method produced the
sharpest estimate of the support of the NEA for this data set. By
replicating the edge values, jump discontinuities are not induced. To
prevent inflation of $\sigma_\tau$, each diagonal was normalized as in
the case of circular extension. The resulting numerical spread was
$\sigma = 514.2149$.


Method               $\sigma_\nu$   $\sigma_\tau$   $\sigma$
Zero Padding         2.2457         214.5           1926.9
Circular Extension   2.2212         185.0           1648.1
Edge Replication     1.6812         176.0           1183.6

Table 1: Comparison of spread estimates by extension method with
$\delta = 0.001$.


4.  SUMMARY  AND  CONCLUSIONS 

In this paper we have refined the technique of computing the numerical
spread as introduced in earlier work. We implemented three different
schemes to augment the covariance matrix in order to calculate the
Numerical Expected Ambiguity function on a grid of constant resolution.
Of the three methods, zero padding, circular extension, and replication
of edge values, it was determined that replication of edge values was
least likely to inflate estimates of the frequency spread $\sigma_\nu$.

One theoretical justification for the improved performance of
replication of edge values is that if x is stationary, the diagonals of
the correlation matrix will be constant, i.e., C is Toeplitz. By
extending the data with constant values along the diagonals we are not
artificially increasing the non-stationarity of the data, which could
increase the spread.
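The Toeplitz property mentioned above can be checked numerically. This is a small sketch under assumed test data: the exponential covariance $0.9^{|n-m|}$ and the time-varying gain are illustrative choices, and `diagonals_are_constant` is a hypothetical helper.

```python
import numpy as np

def diagonals_are_constant(C, tol=1e-12):
    """Return True if every diagonal of the square matrix C is constant."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    for k in range(-(n - 1), n):
        d = np.diagonal(C, k)
        if d.max() - d.min() > tol:
            return False
    return True

idx = np.arange(8)
# Stationary case: the covariance depends only on the lag |n - m|.
C_stat = 0.9 ** np.abs(idx[:, None] - idx[None, :])
# Non-stationary case: the same covariance scaled by a time-varying gain.
gain = 1.0 + 0.1 * idx
C_nonstat = C_stat * np.outer(gain, gain)
```

Here `C_stat` passes the check while `C_nonstat` does not, since its diagonals vary with time.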

In future work we will examine the robustness of the numerical spread
and suggest modifications for improved robustness. This will provide an
enhanced framework for our future work in the area of adaptive signal
representations, including the adaptation of novel multirate and
wavelet signal processing techniques [6, 7].
Figure 2: Zero Padding: mesh plot of $R^{(1/2)}[n,m]$, a contour of the associated NEA, the projections $N_\tau$ and
$N_\nu$ with threshold $\delta = 0.001$ which yield $\sigma_\tau = 214.5$ and $\sigma_\nu = 2.2457$, thus $\sigma = 1926.9$.

Figure 3: Circular Extension: mesh plot of $R^{(1/2)}[n,m]$, a contour of the associated NEA, the projections $N_\tau$ and
$N_\nu$ with threshold $\delta = 0.001$ which yield $\sigma_\tau = 185.0$ and $\sigma_\nu = 2.2212$, thus $\sigma = 1648.1$.

Figure 4: Edge Replication: mesh plot of $R^{(1/2)}[n,m]$, a contour of the associated NEA, the projections $N_\tau$ and
$N_\nu$ with threshold $\delta = 0.001$ which yield $\sigma_\tau = 176.0$ and $\sigma_\nu = 1.6812$, thus $\sigma = 1183.6$.
ABSTRACT

The signals that arise in Air Force applications typically have noise
that can be modeled as a non-stationary stochastic process. However,
there may be intervals of time where the noise behaves more like a
stationary process. This motivates the study of locally stationary
stochastic processes. We rigorously define locally stationary
stochastic processes and present their properties and relationships to
stationary processes.

Keywords:  stochastic  process,  stationary,  piecewise 
stationary,  locally  stationary 

1.  INTRODUCTION 

Properties of stationary processes are well known and have been used
extensively in analyzing system performance and finding optimal
controls for stochastic systems. Recall that a stationary process is
one where all the finite-dimensional joint distributions are invariant
to shifts in time (time homogeneous). Non-stationary processes appear
in many engineering applications where random fluctuations change in
time or space. If the process is slowly varying and if the interval is
short enough, then the process can be approximated (in some sense) by a
stationary one.

Recently, researchers (e.g., Mallat et al. [5], Donoho [2] and
Dahlhaus [1]) have turned their attention to so-called locally
stationary processes as a tool to model systems where the behavior
varies as a function of time. Unfortunately, to date there has not been
a universally satisfying definition of what is meant by “locally
stationary.” This paper proposes such a definition, and illustrates the
definition with some properties.

An early paper by Silverman [6] uses the term locally stationary to
refer to a process whose covariance is a product of a (normalized)
stationary covariance multiplied by a sliding power factor. It appears
that the “local” refers to a point property (versus an interval
property). The efforts by Mallat et al. [5] and Dahlhaus [1] have
achieved some limited success in formalizing a definition of local
stationarity, in part because theirs are based on Fourier analysis.
Mallat uses the local trigonometric bases which originated with Coifman
and Meyer [?] and have been generalized by Suter and Oxley [7]. On the
other hand, Dahlhaus’ definition of locally stationary allows a much
broader class of processes, including processes where the mean is never
constant for any given time period. There are other references in the
literature working with locally stationary processes, e.g. [3], but
they do not define locally stationary. It appears that their working
definition is similar to Mallat's.

Donoho et al. [2] use a particular definition of locally stationary,
tailored to allow them to study certain phenomena of
time-inhomogeneity. Their definition allows very abrupt changes in the
process (allowing a locally stationary window to be as short as one
sample) so long as the correlation between samples decays sufficiently
fast and there are not “too many” change points in a given interval. In
the end, they call for the development of a new definition since “there
is, at the moment, no definition which really captures all the facets
of local stationarity.”

2.  DEFINITIONS 

Let $(\Omega, \mathcal{B}, P)$ be a probability space, so that $\Omega$
is the sample set of outcomes, $\mathcal{B}$ is the $\sigma$-algebra of
events, and $P$ is the probability measure. Thus, a real-valued random
variable $X$ is a function mapping an outcome $\omega \in \Omega$ to
some real number $r \in \mathbb{R}$, so that $r = X(\omega)$. Let
$I \subset \mathbb{R}$ be an interval (possibly the whole real line).
For each $t \in I$, let $X(t)$ be a random variable, so that
$X(t,\omega)$ is a real number. A real-valued stochastic process
$X = \{X(t) : t \in I\}$ is a collection of random variables. Let
$S \subset I$ be a finite set, i.e., $S = \{t_1, t_2, \ldots, t_n\}$
with $t_i \ne t_j$ for $i \ne j$ for some $n \in \mathbb{N}$. Denote
the collection of all possible finite subsets of $I$ by
$\mathcal{F}_I$. For a real number $r$ we use the notation
$S + r = \{t_1 + r, t_2 + r, \ldots, t_n + r\}$ to denote the
translated set. Given $S = \{t_1, t_2, \ldots, t_n\} \in \mathcal{F}_I$,
define the ordered set of random variables
$X(S) = (X(t_1), X(t_2), \ldots, X(t_n))$. Define the joint cumulative
distribution function of these random variables to be

$$F_{X(S)}(c) = \Pr[X(t_1) \le c_1, X(t_2) \le c_2, \ldots, X(t_n) \le c_n]$$

where $c = (c_1, c_2, \ldots, c_n) \in \mathbb{R}^n$. Recall the
definition of a stationary stochastic process.

Definition 1 (Stationary on the real numbers).

A stochastic process $\{X(t) : t \in \mathbb{R}\}$ is said to be
stationary on the real numbers if

$$F_{X(S)} = F_{X(S+r)}$$

for all $r \in \mathbb{R}$, for all $S \in \mathcal{F}_{\mathbb{R}}$.

This definition of stationary is also called strictly stationary. If
the cardinality of S is two, this definition becomes wide-sense
stationary. Many real-world systems may not meet the strict
requirements for stationarity, but, if we consider an interval, I, then
the process may be stationary with respect to shifts within that
interval. This leads to the following definition of stationary on an
interval.

Definition 2 (Stationary on an interval). Let $I \subset \mathbb{R}$
be an interval. A stochastic process $\{X(t) : t \in I\}$ is said to be
stationary on the interval $I$ if

$$F_{X(S)} = F_{X(S+r)}$$

for all $r \in \delta(S,I)$, for all $S \in \mathcal{F}_I$, where
$\delta(S,I) = \{r \in \mathbb{R} : S + r \subset I\}$.

Notice that the interval $I$ could be the real numbers, in which case
this definition reduces to strictly stationary. Thus, this definition
is more general.
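To make the set of admissible shifts $\delta(S,I)$ concrete, the following sketch computes it for the special case of a closed bounded interval $I = [a, b]$. The restriction to closed bounded intervals and the function name `admissible_shifts` are assumptions made for illustration only.

```python
def admissible_shifts(S, a, b):
    """Return (lo, hi) such that delta(S, [a, b]) = [lo, hi].

    S is a finite collection of time points contained in I = [a, b].
    A shift r keeps S + r inside I exactly when
    a <= min(S) + r and max(S) + r <= b.
    """
    s_min, s_max = min(S), max(S)
    if s_min < a or s_max > b:
        raise ValueError("S must be a subset of the interval [a, b]")
    return (a - s_min, b - s_max)

# Example: S = {1, 2.5, 4} inside I = [0, 10].
lo, hi = admissible_shifts([1, 2.5, 4], 0, 10)
```

For this example the admissible shifts are $-1 \le r \le 6$; the shift $r = 0$ is always admissible because $S \subset I$.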

Recall  the  definition  of  a  partition  of  an  interval. 

Definition 3 (Partition). Let $I \subset \mathbb{R}$ be an interval
(possibly $\mathbb{R}$). A partition of $I$ is a countable collection
of subintervals $\{J_1, J_2, \ldots\}$ where $J_k \subset I$ is an
interval for each $k \in \mathbb{I}$, some countable index set, such
that

1. $J_i \cap J_k = \emptyset$ (the empty set) for all $i \ne k$ in $\mathbb{I}$.

2. $\bigcup_{k \in \mathbb{I}} J_k = I$.

Recall that countable includes finite, thus the index set $\mathbb{I}$
may be the finite counting set $\mathbb{I} = \{1, 2, \ldots, K\}$ for
some positive integer $K$, and the partition is
$\{J_1, J_2, \ldots, J_K\}$. We will denote a partition of the interval
$I$ by $P$, and we let $\mathfrak{P}_I$ represent the collection of all
possible partitions of the interval $I$. Thus
$P = \{J_1, J_2, \ldots, J_K\} \in \mathfrak{P}_I$ is a specific
instantiation of the collection.

Definition 4 (Locally stationary on an interval). Let
$I \subset \mathbb{R}$ be an interval (possibly $\mathbb{R}$). A
stochastic process $\{X(t) : t \in I\}$ is said to be locally
stationary on $I$ if there exists some partition
$P \in \mathfrak{P}_I$ and at least one subinterval $J \in P$ such that
the stochastic process $\{X(t) : t \in J\}$ is stationary on $J$.

Definition 5 (Piecewise stationary on an interval). Let
$I \subset \mathbb{R}$ be an interval (possibly $\mathbb{R}$). A
stochastic process $\{X(t) : t \in I\}$ is said to be piecewise
stationary on $I$ if there exists some partition
$P \in \mathfrak{P}_I$ such that on all subintervals $J \in P$ the
stochastic process $\{X(t) : t \in J\}$ is stationary on $J$.

This last definition is the continuous version corresponding to the
discrete version used by Levielle [4], where the determination of the
change point was sought.

3.  PROPERTIES  OF  LOCALLY 
STATIONARY  PROCESSES. 

This section investigates some of the properties that follow from our
definitions. To remove any confusion with the locally stationary
process, we will call a process that is stationary on the (whole)
interval a globally stationary process. Now, we have three types of
stationary processes under consideration: globally, locally and
piecewise. In this section we give properties of each type and also
their relations to each other. First, we establish some notation to
clarify the presentation. All processes are assumed to be real-valued.
The extension to complex-valued processes is obvious.

Let $\mathcal{S}$ denote the set of stochastic processes defined on
$I$, $\mathcal{G}$ denote the set of globally stationary processes
defined on $I$, $\mathcal{L}$ denote the set of locally stationary
processes defined on $I$, and $\mathcal{P}$ denote the set of piecewise
stationary processes defined on $I$. Therefore,

$\mathcal{S} = \{X : X \text{ is a stochastic process on } I\}$

$\mathcal{G} = \{X : X \text{ is globally stationary on } I\}$

$\mathcal{L} = \{X : X \text{ is locally stationary on } I\}$

$\mathcal{P} = \{X : X \text{ is piecewise stationary on } I\}$.
Define stochastic process equality $\overset{\mathcal{S}}{=}$ to be
pointwise equality. Thus, for $X, Y \in \mathcal{S}$,

$X \overset{\mathcal{S}}{=} Y$ if and only if $X(t,\omega) = Y(t,\omega)$

for all $t \in I$, $\omega \in \Omega$. Here $=$ denotes real equality.

Theorem 1  $\mathcal{G} \subset \mathcal{P} \subset \mathcal{L} \subset \mathcal{S}$.

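Define stochastic process addition $\overset{\mathcal{S}}{+}$ for
$X, Y \in \mathcal{S}$ to be

$$[X \overset{\mathcal{S}}{+} Y](t,\omega) = X(t,\omega) \overset{\mathbb{R}}{+} Y(t,\omega)$$

for all $t \in I$, $\omega \in \Omega$. Here $\overset{\mathbb{R}}{+}$
denotes real addition.

Define the zero stochastic process $Z$ to be $Z(t,\omega) = 0$ for all
$t \in I$, $\omega \in \Omega$. ($Z$ is the identity with respect to
$\overset{\mathcal{S}}{+}$.) There are several algebraic properties
concerning these sets.

Theorem 2  $\mathcal{S}$, $\mathcal{G}$ and $\mathcal{P}$ are closed
with respect to $\overset{\mathcal{S}}{+}$ addition. $\mathcal{L}$ is
not closed with respect to $\overset{\mathcal{S}}{+}$ addition.

Therefore, if $X, Y \in \mathcal{P}$ then
$X \overset{\mathcal{S}}{+} Y \in \mathcal{P}$. If
$X, Y \in \mathcal{L}$ then $X \overset{\mathcal{S}}{+} Y$ may or may
not be locally stationary.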

g 

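Define stochastic process multiplication $\overset{\mathcal{S}}{\cdot}$
for $X, Y \in \mathcal{S}$ to be pointwise multiplication,

$$[X \overset{\mathcal{S}}{\cdot} Y](t,\omega) = X(t,\omega) \overset{\mathbb{R}}{\cdot} Y(t,\omega)$$

for all $t \in I$, $\omega \in \Omega$. Here
$\overset{\mathbb{R}}{\cdot}$ denotes real multiplication.

Define the unit or identity stochastic process $U$ to be
$U(t,\omega) = 1$ for all $t \in I$, $\omega \in \Omega$. ($U$ is the
identity with respect to $\overset{\mathcal{S}}{\cdot}$.)

Theorem 3  $\mathcal{S}$, $\mathcal{G}$ and $\mathcal{P}$ are closed
with respect to $\overset{\mathcal{S}}{\cdot}$ multiplication.
$\mathcal{L}$ is not closed with respect to
$\overset{\mathcal{S}}{\cdot}$ multiplication.

Therefore, if $X, Y \in \mathcal{P}$ then
$X \overset{\mathcal{S}}{\cdot} Y \in \mathcal{P}$. If
$X, Y \in \mathcal{L}$ then $X \overset{\mathcal{S}}{\cdot} Y$ may or
may not be locally stationary.

Define scalar multiplication $\overset{sc}{\cdot}$ to be the following:
given $r \in \mathbb{R}$ and $X \in \mathcal{S}$,

$$[r \overset{sc}{\cdot} X](t,\omega) = r \overset{\mathbb{R}}{\cdot} X(t,\omega)$$

for all $t \in I$, $\omega \in \Omega$.

Theorem 4  $\mathcal{S}$, $\mathcal{G}$, $\mathcal{P}$ and
$\mathcal{L}$ are closed with respect to $\overset{sc}{\cdot}$
multiplication.

Some pleasing algebraic results follow.

Theorem 5  $(\mathcal{S}, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{sc}{\cdot})$,
$(\mathcal{G}, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{sc}{\cdot})$ and
$(\mathcal{P}, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{sc}{\cdot})$
are linear spaces over $\mathbb{R}$. Furthermore, $\mathcal{G}$ is a
subspace of $\mathcal{P}$ which is a subspace of $\mathcal{S}$.

Theorem 6  $(\mathcal{S}, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{\mathcal{S}}{\cdot}, \overset{sc}{\cdot})$,
$(\mathcal{G}, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{\mathcal{S}}{\cdot}, \overset{sc}{\cdot})$ and
$(\mathcal{P}, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{\mathcal{S}}{\cdot}, \overset{sc}{\cdot})$
are commutative linear algebras with identity over $\mathbb{R}$.
Furthermore, $\mathcal{G}$ is a subalgebra of $\mathcal{P}$ which is a
subalgebra of $\mathcal{S}$.

It is interesting that $\mathcal{L}$ is “between” $\mathcal{P}$ and
$\mathcal{S}$ but is not a linear space. If one requires some
additional conditions then a linear space is attained.

Definition 6  Let $P$ be a partition of $I$, i.e.,
$P \in \mathfrak{P}_I$. Let $Q \subset P$ be a subpartition. Let
$\mathcal{L}_Q$ denote the collection of stochastic processes that are
stationary on all subintervals $J$ in the subpartition $Q$, i.e.,

$$\mathcal{L}_Q = \{X : X \text{ is stationary on } J,\ \forall J \in Q\}.$$

Theorem 7  Let $P \in \mathfrak{P}_I$ and $Q \subset P$ be a
subpartition, then
$(\mathcal{L}_Q, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{sc}{\cdot})$
is a linear subspace of
$(\mathcal{S}, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{sc}{\cdot})$.

Definition 7  Let $P \in \mathfrak{P}_I$. Let $\mathcal{P}_P$ denote
the collection of piecewise stationary processes that are stationary on
every subinterval of the partition $P$,

$$\mathcal{P}_P = \{X : X \text{ is stationary on } J,\ \forall J \in P\}.$$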


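Theorem 8  Let $P \in \mathfrak{P}_I$, then
$(\mathcal{P}_P, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{sc}{\cdot})$
is a linear space. If $Q \subset P$ then
$(\mathcal{P}_P, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{sc}{\cdot})$
is a subspace of
$(\mathcal{L}_Q, \overset{\mathcal{S}}{=}, \overset{\mathcal{S}}{+}, \overset{sc}{\cdot})$.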

It is clear that if the subpartition is, in fact, the original
partition, so that $Q = P$, then
$\mathcal{L}_Q = \mathcal{L}_P = \mathcal{P}_P$. There are other
interesting properties concerning $\mathcal{L}_Q$. One concerns the
“mixing” of the subpartitions. If $P_1$ and $P_2$ are two different
partitions in $\mathfrak{P}_I$ and $Q_1 \subset P_1$ and
$Q_2 \subset P_2$ are subpartitions, then what can we say about
“mixing” the subspaces $\mathcal{L}_{Q_1}$ and $\mathcal{L}_{Q_2}$? It
will depend on the subpartitions.

Theorem 9  Let $P_1, P_2 \in \mathfrak{P}_I$, and $Q_1 \subset P_1$,
$Q_2 \subset P_2$ be subpartitions. Let $m$ denote the Lebesgue measure.

Ifm(Q1nQ2)  =  0  then  Cq^Qz  —  Ifm{QiC\Q2 )  >  0 

then  Cq1hq2  Cq,  fl  Cq%. 

In the case when the intersection $Q_1 \cap Q_2$ has positive measure,
there exists a finer partition of both $P_1$ and $P_2$ (say $P_3$) such
that $Q_1 \cap Q_2$ is a subpartition of $P_3$. When we take the union
of all these subspaces $\mathcal{L}_Q$ we get $\mathcal{L}$.
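Theorem 10  The union over all the subspaces $\mathcal{L}_Q$ yields
$\mathcal{L}$, that is,

$$\mathcal{L} = \bigcup_{Q \subset P \in \mathfrak{P}_I} \mathcal{L}_Q.$$

Similarly for $\mathcal{P}_P$, that is,

$$\mathcal{P} = \bigcup_{P \in \mathfrak{P}_I} \mathcal{P}_P.$$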

The next group of properties relates a stochastic process and a
deterministic process, that is, a function.

Theorem 11  If $X \in \mathcal{G}$ and $f$ is a piecewise constant
function on $I$, then

1. $Y(t) = f(t) + X(t)$ is piecewise stationary on $I$.

2. $Y(t) = f(t) \cdot X(t)$ is piecewise stationary on $I$.

3. $Y(t) = X(t + f(t))$ is piecewise stationary on $I$
(only if $I = \mathbb{R}$).

Proof of 1. Let $J \subset I$ be a subinterval where $f$ is constant,
so that $f(t) = k$ for $t \in J$ and some constant $k$. Then, for any
$S \in \mathcal{F}_J$ and $r \in \delta(S,J)$, we have

$$\begin{aligned}
F_{Y(S+r)}(c) &= \Pr[f(t_1+r) + X(t_1+r) \le c_1, \ldots, f(t_n+r) + X(t_n+r) \le c_n] \\
&= \Pr[f(t_1) + X(t_1) \le c_1, \ldots, f(t_n) + X(t_n) \le c_n] \\
&= F_{Y(S)}(c)
\end{aligned}$$

since $f(t_i + r) = f(t_i) = k$ for $r \in \delta(S,J)$ and $X$ is
stationary.

Note that the process $\{Y(t) : t \in J\}$ may have a mean which
differs from that of $X$, but the covariance structure is the same as
that of $\{X(t) : t \in J\}$. For $s, t \in J$

$$\begin{aligned}
\mathrm{Cov}[Y(s), Y(t)] &= E\{(f(s) + X(s) - E[f(s) + X(s)]) \times (f(t) + X(t) - E[f(t) + X(t)])\} \\
&= E[(X(s) - E[X(s)])(X(t) - E[X(t)])] \\
&= \mathrm{Cov}[X(s), X(t)].
\end{aligned}$$
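The covariance identities for items 1 and 2 of Theorem 11 can be checked on sample covariances, where they hold exactly up to floating-point rounding. The sketch below is illustrative only: the Gaussian AR(1) sequence standing in for the stationary process $X$, the value $k = 3$ for $f$ on $J$, and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_times = 5000, 6

# Hypothetical stationary X: a unit-variance Gaussian AR(1) sequence
# observed at six time points inside the subinterval J.
rho = 0.8
X = np.empty((n_paths, n_times))
X[:, 0] = rng.standard_normal(n_paths)
for t in range(1, n_times):
    innovation = rng.standard_normal(n_paths)
    X[:, t] = rho * X[:, t - 1] + np.sqrt(1 - rho**2) * innovation

k = 3.0            # constant value of f on J
Y_add = k + X      # item 1: Y = f + X
Y_mul = k * X      # item 2: Y = f * X

cov_X = np.cov(X, rowvar=False)
cov_add = np.cov(Y_add, rowvar=False)
cov_mul = np.cov(Y_mul, rowvar=False)
```

On $J$ the additive case leaves the covariance unchanged, while the multiplicative case scales it by $f(s)f(t) = k^2$.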

Proof of 2. Similar to 1. For this process, both the mean and the
covariance structure may vary with time. For $s, t \in J$

$$\begin{aligned}
\mathrm{Cov}[Y(s), Y(t)] &= E[(f(s)X(s) - E[f(s)X(s)]) \times (f(t)X(t) - E[f(t)X(t)])] \\
&= f(s)f(t)\,\mathrm{Cov}[X(s), X(t)].
\end{aligned}$$

Proof of 3. Similar to 1. For this case, the mean of
$\{Y(t) : t \in J\}$ is the same as for $\{X(t) : t \in J\}$, but the
covariance structure changes as a function of time. For $s, t \in J$

$$\begin{aligned}
\mathrm{Cov}[Y(s), Y(t)] &= \mathrm{Cov}[X(s + f(s)), X(t + f(t))] \\
&= \mathrm{Cov}[X(s), X(t + f(t) - f(s))] \\
&= \mathrm{Cov}[X(0), X(t - s + f(t) - f(s))].
\end{aligned}$$

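Theorem 12  Suppose the processes $X_1, X_2, \ldots, X_N \in \mathcal{G}$,
and the functions $f_1, f_2, \ldots, f_N$ are piecewise constant on $I$.
Then the process $Y$ given by

$$Y(t) = \sum_{j=1}^{N} f_j(t) X_j(t)$$

is piecewise stationary on $I$.

For this process, both the mean and the covariance structure may vary
with time, since for $s, t \in J \in P \in \mathfrak{P}_I$

$$\begin{aligned}
\mathrm{Cov}[Y(s), Y(t)] &= E\Big[\Big(\sum_i f_i(s)X_i(s) - E\Big[\sum_i f_i(s)X_i(s)\Big]\Big) \times \Big(\sum_j f_j(t)X_j(t) - E\Big[\sum_j f_j(t)X_j(t)\Big]\Big)\Big] \\
&= \sum_i \sum_j f_i(s) f_j(t)\,\mathrm{Cov}[X_i(s), X_j(t)].
\end{aligned}$$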
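The double-sum covariance formula follows from bilinearity of the covariance, so it also holds exactly for sample covariances. The sketch below verifies this numerically; the sample data, the coefficient values standing in for $f_i(s)$ and $f_j(t)$, and the variable names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, N = 2000, 3

# Samples of X_1..X_N at two times s and t (columns index the processes).
Xs = rng.standard_normal((n_paths, N))
Xt = 0.6 * Xs + 0.8 * rng.standard_normal((n_paths, N))

# Values of the piecewise constant functions on J (equal at s and t).
f_s = np.array([1.0, -2.0, 0.5])
f_t = np.array([1.0, -2.0, 0.5])

# Y(s) and Y(t) as weighted sums of the component processes.
Ys = Xs @ f_s
Yt = Xt @ f_t

# Left-hand side: sample covariance of Y(s) and Y(t).
lhs = np.cov(Ys, Yt)[0, 1]

# Right-hand side: sum_i sum_j f_i(s) f_j(t) Cov[X_i(s), X_j(t)].
cross = np.cov(np.hstack([Xs, Xt]), rowvar=False)[:N, N:]
rhs = f_s @ cross @ f_t
```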

Definition 8 (Locally constant on an interval). Let
$I \subset \mathbb{R}$ be an interval (possibly $\mathbb{R}$). A
function $f$ defined on $I$ is said to be locally constant on $I$ if
there exists some partition $P \in \mathfrak{P}_I$ and at least one
subinterval $J \in P$ such that $f$ is constant on $J$. Let
$\mathcal{LC}$ denote the collection of locally constant functions
defined on $I$, that is,

$$\mathcal{LC} = \{f : I \to \mathbb{R} \mid f \text{ is locally constant on } I\}.$$

Theorem 13  If $X \in \mathcal{G}$ and $f \in \mathcal{LC}$, then

1. $Y = f + X \in \mathcal{L}$.

2. $Y = f \cdot X \in \mathcal{L}$.

3. $Y(t) = X(t + f(t)) \in \mathcal{L}$ (only if $I = \mathbb{R}$).

Notice that if we define the identity function $i$ to be $i(t) = t$ for
all $t \in I$, then the last result is a composition of a stochastic
process with a deterministic process (a function). That is,
$Y = X \circ (i + f)$.

Other questions regarding composition remain to be asked. For example,
what about $g \circ X$ processes? Let $\mathbb{R}[x]$ denote the set of
polynomials with real coefficients and indeterminate $x$, so that
$g \in \mathbb{R}[x]$ implies that
$g(x) = a_0 + a_1 x + \cdots + a_N x^N$ for some degree
$N \in \mathbb{N}$ and coefficients $a_0, a_1, \ldots, a_N \in \mathbb{R}$.

Theorem 14  Let $X \in \mathcal{L}$, then
$X^2 = X \overset{\mathcal{S}}{\cdot} X \in \mathcal{L}$. Furthermore,
for each $g \in \mathbb{R}[x]$, $g \circ X \in \mathcal{L}$.
Theorem 15  For each $X \in \mathcal{L}$ the set
$\mathcal{L}_X = \{Y = g(X) : g \in \mathbb{R}[x]\}$ is a commutative
algebra with identity.

There are invariance results related to composition for the other types
of stationarity.

Theorem 16  Let $g \in \mathbb{R}[x]$.

1. If $X \in \mathcal{G}$ then $g \circ X \in \mathcal{G}$.

2. If $X \in \mathcal{P}$ then $g \circ X \in \mathcal{P}$.

3. If $X \in \mathcal{L}$ then $g \circ X \in \mathcal{L}$.

4. If $X \in \mathcal{S}$ then $g \circ X \in \mathcal{S}$.

4.  CONCLUSIONS 

Recently Donoho [2] called for a new definition of local stationarity
since, “there is, at the moment, no definition which really captures
all facets of local stationarity”. We believe our definition does
capture all such facets. Space does not allow for the other properties.
More results are forthcoming concerning applications.

All these properties and theorems hold true if we replace the real
numbers with complex numbers.
ABSTRACT

This paper presents a statistical performance comparison between the
cyclic moments-based and Wigner-Ville distribution-based instantaneous
frequency estimators for linear FM signals in real-valued
multiplicative and complex-valued additive noise. Theoretical results
are used to compare the performance of the estimation algorithms over a
wide range of conditions. Simulation results confirm our theoretical
derivations.

1.  INTRODUCTION 

Accurate estimation of the instantaneous frequency (IF) of frequency
modulated (FM) signals is important in many engineering applications
such as radar, sonar, acoustic emission and telecommunications [2]. In
many cases it is assumed that the signal of interest has a constant
amplitude. While this is a valid assumption in a wide range of
scenarios, there are several important applications in which the
constant amplitude assumption is inappropriate. Examples include fading
in wireless communications [13] and the case of a fluctuating target in
radar [14]. In both of these cases the amplitude varies randomly with
time and the discrete-time signal may be written as

$$X_t = A_t s_t + W_t, \quad t = 0, \ldots, n-1 \qquad (1)$$

where $s_t = \exp(j\phi_t)$, $A_t$ is a real-valued stationary random
process with mean $\mu_A$ and variance $\sigma_A^2$ and $W_t$ is a
complex-valued stationary random process, independent of $A_t$, with
variance $\sigma_W^2$. The methods available for estimating the IF of
$s_t$ may be separated into two classes. One class of estimators
consists of parametric methods in which the phase $\phi_t$ is modelled
by a sum of basis sequences,

$$\phi_t = \sum_{i=0}^{q} b_i \gamma_{t,i} \qquad (2)$$

An IF estimate can be obtained from estimates of the phase parameters
$b_0, \ldots, b_q$. A commonly used set of basis sequences is
$\gamma_{t,i} = t^i$, $i = 0, \ldots, q$, in which case the signal
$s_t$ is referred to as a polynomial phase signal (PPS). Estimators of
the phase parameters for the PPS model have been proposed in [4, 12].
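The signal model (1) with a polynomial phase (2) can be simulated as follows. This is a minimal sketch under stated assumptions: the amplitude $A_t$ is taken to be iid Gaussian and the additive noise circular complex Gaussian, which are only particular cases of the stationary processes allowed by the model, and the function name `random_amplitude_pps` and the parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def random_amplitude_pps(n, b, mu_a, sigma_a, sigma_w, rng):
    """Simulate x_t = A_t * s_t + W_t with s_t = exp(j * phi_t).

    b holds the phase parameters (b_0, ..., b_q), so the phase is
    phi_t = b_0 + b_1 t + ... + b_q t^q.
    """
    t = np.arange(n)
    phase = P.polyval(t, b)
    s = np.exp(1j * phase)
    # Assumed amplitude model: iid Gaussian with mean mu_a, std sigma_a.
    amp = mu_a + sigma_a * rng.standard_normal(n)
    # Assumed noise model: circular complex Gaussian with variance sigma_w^2.
    noise = sigma_w * (rng.standard_normal(n)
                       + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return amp * s + noise, s

rng = np.random.default_rng(1)
# Linear FM example (q = 2) with random amplitude and additive noise.
x, s = random_amplitude_pps(256, [0.3, 0.5, 0.002], 1.0, 0.5, 0.3, rng)
# Degenerate case: constant unit amplitude and no additive noise.
x_clean, s_clean = random_amplitude_pps(16, [0.0, 0.1, 0.01], 1.0, 0.0, 0.0, rng)
```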

The second class of IF estimators consists of non-parametric methods. A
subclass of these non-parametric methods are based on time-frequency
distributions such as the Wigner-Ville distribution (WVD) and its
higher-order generalisation, the polynomial WVD (PWVD) [3]. These
estimators may be applied regardless of the form of $\phi_t$, though it
should be kept in mind that a systematic bias will result unless $s_t$
is a PPS. If $s_t$ is not a PPS, accurate IF estimates can still be
obtained from the PWVD by using an adaptive window [1].

The purpose of this paper is to compare the performances of the cyclic
moments-based procedure [12] and the WVD-based procedure for estimating
the IF of $s_t$ from observations $x_0, \ldots, x_{n-1}$ of (1). The
signal $s_t$ is assumed to be a unit modulus PPS of order two, i.e. the
phase $\phi_t$ is given by (2) with $q = 2$ and $\gamma_{t,i} = t^i$,
$i = 0, 1, 2$, $t = 0, \ldots, n-1$. Theoretical expressions for the
variances of both estimators are given and compared for various noise
scenarios. The theoretical results are confirmed using numerical
simulations.

The paper is structured as follows. Section 2 contains a brief review
of the WVD-based and cyclic moments-based IF estimators. In this
section we also verify the suitability of the WVD for IF estimation of
random amplitude linear FM signals. Expressions for the variances of
these estimators are given in section 3. The theoretical results are
used to compare the WVD-based and cyclic moments-based IF estimators in
section 4. Simulation results are included to confirm the theoretical
results. The paper concludes with a discussion of the main results.


2.  REVIEW  OF  ESTIMATORS 

In this section we review the WVD-based and cyclic moments-based IF
estimators. We also present new results which verify the suitability of
the WVD for IF estimation of random amplitude linear FM signals.


2.1.  WVD-based  IF  Estimator 

It is well-known that the peak of the WVD can be used to estimate the
IF of constant amplitude linear FM signals [11]. In this paper we
propose the use of the WVD for estimating the IF of random amplitude
linear FM signals. We show that, contrary to statements made in [3]
regarding the necessity of higher-order PWVDs, the WVD may be used to
estimate the IF of random amplitude linear FM signals.

The discrete-time WVD of the sequence $x_0, \ldots, x_{n-1}$ is defined
as

$$W_{2x}(t,\omega) = \frac{1}{2m+1} \sum_{\xi=-m}^{m} x_{t+\xi}\, x^*_{t-\xi} \exp(-j2\omega\xi) \qquad (3)$$

where $m = \min(n - 1 - t, t)$. Throughout the paper we use upper-case
letters, e.g. $X_t$, to denote random variables and lower-case letters,
e.g. $x_t$, to denote realisations of random variables. Note that
scaling by the reciprocal of the window length, $1/(2m+1)$, is
introduced for the sake of normalisation and does not affect the
statistical properties of the WVD-based IF estimator. Substituting for
$x_t$ gives

$$W_{2x}(t,\omega) = \frac{1}{2m+1} \sum_{\xi=-m}^{m} Y_{t+\xi}\, Y^*_{t-\xi} \exp\{-j2(\omega - \omega_t)\xi\} \qquad (4)$$

where $Y_t = A_t + U_t$, $U_t = W_t/s_t$ and $\omega_t = b_1 + 2b_2 t$
is the IF of $s_t$. Assuming that the additive noise is complex white
Gaussian we obtain

$$E\,W_{2x}(t,\omega) = \frac{1}{2m+1} \sum_{\xi=-m}^{m} \{\mu_A^2 + c_{2A}(2\xi) + \sigma_W^2 \delta_\xi\} \exp\{-j2(\omega - \omega_t)\xi\} \qquad (5)$$

+C2a(u>)  *  (u>  —  Lit)  +  (6) 

where $\delta_\xi$ is the Kronecker delta,
$c_{2A}(\xi) = \mathrm{cum}(A_t, A_{t+\xi})$ is the second-order
cumulant or covariance of $A_t$,

$$C_{2A}(\omega) = \sum_{\xi=-\infty}^{\infty} c_{2A}(\xi) \exp(-j\omega\xi)$$

is the second-order spectrum of $A_t$, '$*$' denotes the convolution
operation and

$$\Delta_{(l)}(\omega) = \sin(l\omega)/\sin(\omega). \qquad (7)$$

Using the additivity property of the cumulant [9], the variance of the
WVD may be written as

$$\mathrm{var}\{W_{2x}(t,\omega)\} = \frac{1}{(2m+1)^2} \sum_{\xi_1=-m}^{m} \sum_{\xi_2=-m}^{m} \mathrm{cum}(Y_{t+\xi_1} Y^*_{t-\xi_1},\, Y_{t+\xi_2} Y^*_{t-\xi_2}) \exp\{-j2(\omega - \omega_t)(\xi_1 + \xi_2)\} \qquad (8)$$

For large window lengths we obtain,

$$\begin{aligned}
\mathrm{var}\{W_{2x}(t,\omega)\} = \frac{1}{(2m+1)^2} \sum_{\xi_1=-m}^{m} \sum_{\xi_2=-m}^{m} \big[ & c_{2A}(\xi_1 - \xi_2)^2 + \{c_{2A}(\xi_1 + \xi_2) + \sigma_W^2 \delta_{\xi_1+\xi_2}\}^2 \\
& + 2\mu_A^2 \{c_{2A}(\xi_1 - \xi_2) + c_{2A}(\xi_1 + \xi_2) + \sigma_W^2 \delta_{\xi_1+\xi_2}\} \big] \\
& \times \exp\{-j2(\omega - \omega_t)(\xi_1 + \xi_2)\} + O(m^{-2}), \qquad (9)
\end{aligned}$$

where we have assumed the mixing condition,

$$\sum_{\xi_1,\ldots,\xi_{p-1}=-\infty}^{\infty} |c_{pA}(\xi_1, \ldots, \xi_{p-1})| < \infty, \quad p = 1, \ldots, 4 \qquad (10)$$

where $c_{pA}(\xi_1, \ldots, \xi_{p-1}) = \mathrm{cum}(A_t, A_{t+\xi_1}, \ldots, A_{t+\xi_{p-1}})$.
We see from (9) and (10) that $\mathrm{var}\{W_{2x}(t,\omega)\}$ is
$O(m^{-1})$. It follows from the Chebyshev inequality that, as the
window length increases, the WVD converges in probability to its
expected value [5], i.e. as $m \to \infty$,

$$W_{2x}(t,\omega) \xrightarrow{P} E\,W_{2x}(t,\omega) = \mu_A^2\, \delta(\omega - \omega_t) + o(1) \qquad (11)$$

Eq. (11) shows that, for sufficiently large window lengths, the WVD
exhibits a peak at the IF $\omega = \omega_t$. This result leads to the
WVD-based IF estimator,

$$\hat{\omega}_t = \arg\max_{\omega} W_{2x}(t,\omega), \quad t = 0, \ldots, n-1 \qquad (12)$$

Note that when $\mu_A = 0$, the WVD will not exhibit a peak at
$\omega = \omega_t$.

2.2.  Cyclic  Moments-based  Estimator

Shamsunder et al. [12] proposed a procedure for estimating the phase
parameters of $s_t$ using estimates of the cyclic moments of $X_t$. We
provide only a brief outline of the estimation procedure here. The
interested reader is referred to [12] for further details.

We define the second-order and first-order cyclic moment estimators,
respectively, as

$$\hat{M}_{2x}(\alpha;\tau) = \frac{1}{n} \sum_{t=0}^{n-\tau-1} x^*_t\, x_{t+\tau} \exp(-j\alpha t) \qquad (13)$$

$$\hat{M}_{1x}(\alpha) = \frac{1}{n} \sum_{t=0}^{n-1} x_t \exp(-j\alpha t) \qquad (14)$$

where $\tau$ is an arbitrary positive integer, $0 < \tau < n - 1$.
Estimates of the phase parameters $b_1$ and $b_2$ and the IF $\omega_t$
can be obtained from the second-order and first-order cyclic moment
estimators using the procedure of Table 1.

Table 1: Cyclic Moments-based IF Estimator

1. Estimate $b_2$ as

$$\hat{b}_2 = \frac{1}{2\tau} \arg\max_{\alpha} |\hat{M}_{2x}(\alpha;\tau)|^2. \qquad (15)$$

2. Form the signal $x_t^{(1)} = x_t \exp(-j\hat{b}_2 t^2)$ and estimate
$b_1$ as

$$\hat{b}_1 = \arg\max_{\alpha} |\hat{M}_{1x^{(1)}}(\alpha)|^2. \qquad (16)$$

3. Estimate the IF as

$$\hat{\omega}_t = \hat{b}_1 + 2\hat{b}_2 t, \quad t = 0, \ldots, n-1. \qquad (17)$$

In order to avoid aliasing the phase parameters must satisfy:

$$|b_2| < \pi/(2\tau), \quad |b_1| < \pi \qquad (18)$$

3.  STATISTICAL  ANALYSIS

In this section we perform statistical analyses of the WVD-based and
cyclic moments-based IF estimators under the following assumptions:
A1  $A_t$ is a stationary real-valued random process with $p$th order
cumulant $c_{pA}(\xi_1, \ldots, \xi_{p-1})$. It is assumed that

$$\sum_{\xi_1,\ldots,\xi_{p-1}=-\infty}^{\infty} |c_{pA}(\xi_1, \ldots, \xi_{p-1})| < \infty, \quad p = 1, 2, \ldots$$

A2  $W_t$, $t = 0, \ldots, n-1$ are zero-mean independent and
identically distributed (iid) complex-valued Gaussian random variables
with finite variance $\sigma_W^2$.

The results we present below are asymptotic in the sense that they are
valid as the sample length $n \to \infty$.
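The two estimators reviewed in Section 2 can be sketched numerically. The code below is an illustrative grid-search implementation, not the procedure used for the simulations in this paper: the grid size, the function names, and the noise-free test signal are assumptions, and the lag product in the second-order cyclic moment is taken to be $x_t^* x_{t+\tau}$.

```python
import numpy as np

def wvd_if_estimate(x, t, n_grid=4096):
    """Peak of the discrete WVD (3) at time t, searched on a frequency grid."""
    n = len(x)
    m = min(n - 1 - t, t)
    xi = np.arange(-m, m + 1)
    kernel = x[t + xi] * np.conj(x[t - xi])
    # exp(-j*2*omega*xi) has period pi in omega, so search [-pi/2, pi/2).
    omegas = np.linspace(-np.pi / 2, np.pi / 2, n_grid, endpoint=False)
    W = np.real(np.exp(-2j * np.outer(omegas, xi)) @ kernel) / (2 * m + 1)
    return omegas[np.argmax(W)]

def cyclic_moment_if_estimate(x, tau, n_grid=4096):
    """Three-step cyclic moments procedure of Table 1 on a grid of alpha."""
    n = len(x)
    t = np.arange(n)
    alphas = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    # Step 1: peak of the second-order cyclic moment gives b_2.
    lag_prod = np.conj(x[: n - tau]) * x[tau:]
    M2 = np.exp(-1j * np.outer(alphas, t[: n - tau])) @ lag_prod / n
    b2_hat = alphas[np.argmax(np.abs(M2) ** 2)] / (2 * tau)
    # Step 2: remove the quadratic phase, then estimate b_1.
    x1 = x * np.exp(-1j * b2_hat * t**2)
    M1 = np.exp(-1j * np.outer(alphas, t)) @ x1 / n
    b1_hat = alphas[np.argmax(np.abs(M1) ** 2)]
    # Step 3: IF estimate for every time index.
    return b1_hat, b2_hat, b1_hat + 2 * b2_hat * t

# Noise-free linear FM test signal with assumed parameter values.
n = 128
t = np.arange(n)
b0, b1, b2 = 0.3, 0.5, 0.002
x = np.exp(1j * (b0 + b1 * t + b2 * t**2))

omega_wvd = wvd_if_estimate(x, t=64)
b1_hat, b2_hat, omega_cm = cyclic_moment_if_estimate(x, tau=n // 2)
```

The test parameters are chosen so that the IF stays below $\pi/2$ and the aliasing conditions (18) hold for $\tau = n/2$. Accuracy is limited by the grid spacing; a finer search or interpolation around the peak would be needed in practice.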


Eq. (23) shows that the variance of the cyclic moments-based IF
estimator is unaffected by correlation in the modulating process, and
that the estimator fails when the mean $\mu_A$ of the modulating
process is zero.

We note that the variances of the parameter estimators $\hat{b}_1$ and
$\hat{b}_2$, and of the IF estimator $\hat{\omega}_t$, depend on the
lag $\tau$ chosen in the second-order cyclic moment. The dependence of
$\mathrm{var}(\hat{b}_i)$, $i = 1, 2$ on $\tau$ was noted in [6], where
it was shown that the variances of $\hat{b}_1$ and $\hat{b}_2$ may be
minimised by choosing $\tau = n/2$ provided that
$2/S_A + 1/S_W < 5.46$. Comparison of (20) and (23) indicates that
$\mathrm{var}(\hat{\omega}_t)$ has the same dependence on $\tau$ as the
variances of $\hat{b}_1$ and $\hat{b}_2$. It follows that
$\mathrm{var}(\hat{\omega}_t)$ will also be minimised by selecting
$\tau = n/2$.


3.1.  WVD-based  Estimator 

Under the assumptions A1 and A2, the variance of the WVD-based IF estimator can be found as

$$\mathrm{var}(\hat{\omega}_t) = \frac{3}{S_W (2m+1)^3}\left(2 + \frac{2}{S_A} + \frac{1}{S_W}\right) + O(m^{-7/2}) \qquad (19)$$

where $S_A = \mu_A^2/\sigma_A^2$ and $S_W = \mu_A^2/\sigma_W^2$. A brief proof of (19) is given in Appendix A. The following comments can be made regarding (19):

• The variance of $\hat{\omega}_t$ is not affected by correlation in the modulating process $A_t$.

• The variance of $\hat{\omega}_t$ will increase dramatically at either end of the sample, i.e. for $t$ close to zero or $n$. This follows from the fact that the window length $2m + 1$ decreases as we move away from the centre of the observation interval.

• When $\mu_A = 0$, $\mathrm{var}(\hat{\omega}_t)$ goes to infinity, indicating that the WVD-based IF estimator breaks down. This is consistent with the analysis of section 2.1.


3.2.  Cyclic  Moments-based  Estimator 

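The covariance matrix of the cyclic moments-based parameter estimators was derived in [7] under the assumption that $A_t$ are iid. This analysis was extended in [8] to the more general case in which $A_t$ are correlated. Let $b = (b_1, b_2)'$ and $\hat{b} = (\hat{b}_1, \hat{b}_2)'$. It was shown in [8] that

$$N\,E(\hat{b} - b)(\hat{b} - b)'\,N = \frac{3}{S_W}\,\frac{2 + 2/S_A + 1/S_W - \nu(\bar{\tau})}{2\bar{\tau}^2(1 - \bar{\tau})^3}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} + \frac{6}{S_W}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + O(n^{-1/2}) \qquad (20)$$

where $N = \mathrm{diag}(n^{3/2}, n^{5/2})$, $\bar{\tau} = \tau/n$, $0 < \tau < n - 1$ and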

=  !<$  (2., 


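Using these results we obtain the variance of the cyclic moments-based IF estimator as

$$\mathrm{var}(\hat{\omega}_t) = \mathrm{var}(\hat{b}_1) + 4\,\mathrm{cov}(\hat{b}_1, \hat{b}_2)\,t + 4\,\mathrm{var}(\hat{b}_2)\,t^2 \qquad (22)$$

$$= \frac{3}{S_W n^3}\left\{\left(\frac{2 + 2/S_A + 1/S_W - \nu(\bar{\tau})}{2\bar{\tau}^2(1 - \bar{\tau})^3}\right)\left(1 - \frac{4t}{n} + \frac{4t^2}{n^2}\right) + 2\right\} + O(n^{-7/2}) \qquad (23)$$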


4.  COMPARISON 

In  this  section  we  compare  the  variances  of  the  WVD-based  and 
cyclic  moments-based  IF  estimators. 

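We conduct the comparison between the estimators at the mid-point of the sample, i.e. $t = n/2$. In this case we have, ignoring lower-order terms,

$$\mathrm{var}(\hat{\omega}^{\mathrm{WVD}}_{n/2}) = \frac{3}{S_W n^3}\left(2 + \frac{2}{S_A} + \frac{1}{S_W}\right) \qquad (24)$$

$$\mathrm{var}(\hat{\omega}^{\mathrm{CM}}_{n/2}) = \frac{6}{S_W n^3} \qquad (25)$$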

We see from (25) that the cyclic moments-based IF estimator exhibits interesting properties for the case $t = n/2$. These are described below:

• $\mathrm{var}(\hat{\omega}_{n/2})$ is independent of the variance of the modulating process, i.e. the cyclic moments-based IF estimator at $t = n/2$ is unaffected by the presence of multiplicative noise. This is in contrast to the WVD-based IF estimator, which becomes less accurate as $\sigma_A^2$ increases.

• $\mathrm{var}(\hat{\omega}_{n/2})$ is independent of the lag $\tau$. We see from (18) that the range of allowable values of $b_2$, and therefore $\omega_t$, can be increased by decreasing $\tau$. In general, this is not viable since the accuracy of the cyclic moments-based estimator is poor for small values of $\tau$. Such considerations do not apply when estimating the IF at the mid-point of the sample, since we can choose small values of $\tau$ with no loss in accuracy.

It is evident from (24) and (25) that, for $t = n/2$, the cyclic moments-based IF estimator has lower variance than the WVD-based IF estimator under all noise conditions. The variances of the estimators will be approximately equal under what may be called high signal-to-noise ratio conditions, $S_A \gg 1$ and $S_W \gg 1$.
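The mid-point comparison can be made concrete with a small numerical sketch. It assumes the leading-order expressions $3(2 + 2/S_A + 1/S_W)/(S_W n^3)$ for the WVD-based estimator and $6/(S_W n^3)$ for the cyclic moments-based estimator, the latter obtained by setting $t = n/2$ in (23); the function names are illustrative only.

```python
import math


def db_to_linear(value_db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10.0 ** (value_db / 10.0)


def var_wvd_midpoint(n, s_a, s_w):
    """Leading-order variance of the WVD-based IF estimate at t = n/2."""
    return 3.0 / (s_w * n ** 3) * (2.0 + 2.0 / s_a + 1.0 / s_w)


def var_cm_midpoint(n, s_w):
    """Leading-order variance of the cyclic moments-based IF estimate at t = n/2
    (assumed form: (23) evaluated at t = n/2, independent of S_A and the lag)."""
    return 6.0 / (s_w * n ** 3)


if __name__ == "__main__":
    n = 512
    s_w = db_to_linear(5.0)
    for s_a_db in (-5, 0, 10, 20):
        s_a = db_to_linear(s_a_db)
        ratio = var_wvd_midpoint(n, s_a, s_w) / var_cm_midpoint(n, s_w)
        print(f"S_A = {s_a_db:3d} dB: var(WVD)/var(CM) = {ratio:.3f}")
```

Under these assumptions the ratio of the two variances is $1 + 1/S_A + 1/(2S_W)$, which is always greater than one and tends to one when both $S_A$ and $S_W$ are large.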

The theoretical results given above are now confirmed using simulations. The signal of interest $s_t$ is a second order PPS with phase parameters $b_0 = \pi/3$, $b_1 = \pi/32$ and $b_2 = \pi/2500$. The sample length is $n = 512$ and the cyclic moments-based IF estimate is computed using $\tau = n/4$. The modulating process $A_t$ is a sequence of iid Gaussian random variables. We estimate the variances of the cyclic moments-based and WVD-based IF estimators using 5000 realisations of the signal (1). Figure 1 shows the theoretical and estimated variances plotted against $S_A = -5(1)20$ dB for $S_W = 0(5)15$ dB. A close correspondence between the theoretical and empirical results is evident. We note that for $S_W = 5$, 10 and 15 dB the cyclic moments-based and WVD-based IF estimators fail at the same value of $S_A$. The cyclic moments-based estimator exhibits a slightly better threshold performance for $S_W = 0$ dB.


Figure 1: Variances of the WVD-based and cyclic moments-based IF estimators plotted against $S_A$ for a) $S_W = 0$ dB, b) $S_W = 5$ dB, c) $S_W = 10$ dB, and d) $S_W = 15$ dB. Theoretical (solid line for cyclic moments and dashed line for WVD) and estimated ('o' for cyclic moments and 'x' for WVD) results are shown.


5.  DISCUSSION 

The problem of estimating the instantaneous frequency (IF) of linear FM signals in real-valued multiplicative and complex white Gaussian additive noise was considered. Theoretical analyses of a cyclic moments-based IF estimator and the Wigner-Ville distribution (WVD)-based IF estimator were performed under the assumptions of a mixing modulating process and complex white Gaussian additive noise. The theoretical results were used to compare the performances of these two estimators at the mid-point of the observation interval. This comparison showed that the cyclic moments-based estimator has a lower variance than the WVD-based estimator for all noise scenarios. The difference between the variances of the two estimators becomes particularly large as the variance of the modulating process increases. This is due to the interesting fact that, at the mid-point of the sample, the variance of the cyclic moments-based IF estimator is independent of the variance of the multiplicative noise.

Future work on this topic will concentrate on extending the analysis to arbitrary order polynomial phase signals.

A.  DERIVATION  OF  WVD-BASED  IF  ESTIMATOR 
VARIANCE 

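We begin by writing

$$x_{t+\xi}\,x^*_{t-\xi} = \mu_\xi \exp(j2\omega_t\xi) + Z_\xi, \quad \xi = -m, \ldots, m \qquad (26)$$

where $Z_\xi = V_\xi \exp(j2\omega_t\xi)$ and

$$V_\xi = (A_{t+\xi} + U_{t+\xi})(A_{t-\xi} + U^*_{t-\xi}) - \mu_\xi \qquad (27)$$

$$\mu_\xi = \mu_A^2 + c_{2A}(2\xi) + \sigma_W^2\delta_\xi. \qquad (28)$$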

Eq.  (26)  shows  that  the  WVD  at  time  t  can  be  seen  as  the  dis¬ 
crete  Fourier  transform  of  a  complex  sinusoid  with  deterministic 
varying  amplitude  in  non-stationary  additive  noise.  Using  a 
generalisation  of  the  result  in  [10]  the  IF  estimator  error  can  be 
found  as 

m 

at-ut=  E  -  i€«6(a*)}/(2C) 

£=— m 

+Op{m-2)  (29) 

where

$$C = \sum_{\xi_1=-m}^{m}\sum_{\xi_2=-m}^{m}(\xi_1^2 - \xi_1\xi_2)\,\mu_{\xi_1}\mu_{\xi_2} \qquad (30)$$
m 

dz(uj)  =  E  VS  exP(~juO  (31) 

£=— ra 

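and $d_Z^{(n)}(\omega_0)$ denotes the $n$th derivative of $d_Z(\omega)$ with respect to $\omega$ evaluated at $\omega = \omega_0$. Substituting for $d_Z^{(1)}(2\omega_t)$ and $d_Z(2\omega_t)$ in (29) gives

$$\hat{\omega}_t - \omega_t = \sum_{\xi_1=-m}^{m}\mu_{\xi_1}\sum_{\xi_2=-m}^{m}(\xi_2 - \xi_1)\,\mathrm{Im}(V_{\xi_2})/(2C) + O_p(m^{-2}) \qquad (32)$$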

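Since $E\,V_\xi = 0$, the estimator $\hat{\omega}_t$ is asymptotically unbiased. The variance is

$$\mathrm{var}(\hat{\omega}_t) = \sum_{\xi_1=-m}^{m}\sum_{\xi_2=-m}^{m}\mu_{\xi_1}\mu_{\xi_2}\sum_{\xi_3=-m}^{m}\sum_{\xi_4=-m}^{m}(\xi_3 - \xi_1)(\xi_4 - \xi_2)\,E\,\mathrm{Im}(V_{\xi_3})\mathrm{Im}(V_{\xi_4})/(4C^2) + O(m^{-7/2}) \qquad (33)$$

Simple calculations give

$$E\,\mathrm{Im}(V_{\xi_3})\mathrm{Im}(V_{\xi_4}) = E\,(V_{\xi_3}V^*_{\xi_4} - V_{\xi_3}V_{\xi_4})/2 \qquad (34)$$

$$= \{2(\sigma_A^2 + \mu_A^2)\sigma_W^2 + \sigma_W^4\}(\delta_{\xi_3 - \xi_4} - \delta_{\xi_3 + \xi_4})/2 \qquad (35)$$

Substituting (35) into (33) gives

$$\mathrm{var}(\hat{\omega}_t) = \frac{2(\sigma_A^2 + \mu_A^2)\sigma_W^2 + \sigma_W^4}{8C^2}\sum_{\xi_1=-m}^{m}\sum_{\xi_2=-m}^{m}\mu_{\xi_1}\mu_{\xi_2}\sum_{\xi=-m}^{m}(2\xi^2 - 2\xi\xi_1) + O(m^{-7/2}) \qquad (36)$$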

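Substituting (28) into (30) gives

$$C = \sum_{\xi_1=-m}^{m}\sum_{\xi_2=-m}^{m}(\xi_1^2 - \xi_1\xi_2)\{\mu_A^4 + \mu_A^2 c_{2A}(2\xi_1) + \mu_A^2 c_{2A}(2\xi_2) + \mu_A^2\sigma_W^2(\delta_{\xi_1} + \delta_{\xi_2}) + c_{2A}(2\xi_1)c_{2A}(2\xi_2) + c_{2A}(2\xi_1)\sigma_W^2\delta_{\xi_2} + c_{2A}(2\xi_2)\sigma_W^2\delta_{\xi_1} + \sigma_W^4\delta_{\xi_1}\delta_{\xi_2}\} \qquad (37)$$

We will show that only the first term in the summand of (37) is significant. The remaining terms are of lower order and may be ignored for large window lengths. It is straightforward to see that

$$\sum_{\xi_1=-m}^{m}\sum_{\xi_2=-m}^{m}(\xi_1^2 - \xi_1\xi_2)\mu_A^4 = \mu_A^4(2m+1)^4/12 \qquad (38)$$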
The second term is given by

$$\sum_{\xi_1=-m}^{m}\sum_{\xi_2=-m}^{m}(\xi_1^2 - \xi_1\xi_2)\mu_A^2 c_{2A}(2\xi_1) = \mu_A^2(2m+1)\sum_{\xi=-m}^{m}\xi\{\xi - (2m+1)/2\}c_{2A}(2\xi)$$

$$\le \mu_A^2(2m+1)\sum_{\xi=-m}^{m}\left|\xi\{\xi - (2m+1)/2\}c_{2A}(2\xi)\right|$$

$$\le \mu_A^2 m(2m+1)^2\sum_{\xi=-m}^{m}|c_{2A}(\xi)| \qquad (39)$$

It follows from the mixing condition of assumption A1 that this second term will be $O(m^3)$ and so may be ignored compared to the first term, which was shown to be $O(m^4)$ in (38). A similar analysis can be performed for the remaining terms in $C$ to obtain

$$C = \mu_A^4(2m+1)^4/12 + O(m^3) \qquad (40)$$

A similar analysis for the numerator of (36) yields

$$\sum_{\xi_1=-m}^{m}\sum_{\xi_2=-m}^{m}\mu_{\xi_1}\mu_{\xi_2}\sum_{\xi=-m}^{m}(2\xi^2 - 2\xi\xi_1) = \mu_A^4(2m+1)^5/6 + O(m^4) \qquad (41)$$

Substituting (40) and (41) into (36) gives (19).
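The leading-order terms quoted in (38) and (41) can be checked numerically. The sketch below is an illustrative brute-force evaluation, assuming $\mu_A = 1$ and keeping only the constant part $\mu_A^2$ of $\mu_\xi$; the function names are hypothetical. It compares the exact sums with $(2m+1)^4/12$ and $(2m+1)^5/6$.

```python
def c_leading(m):
    """Double sum of (xi1^2 - xi1*xi2) over xi1, xi2 = -m..m (mu_A = 1)."""
    lags = range(-m, m + 1)
    return sum(x1 * x1 - x1 * x2 for x1 in lags for x2 in lags)


def numerator_leading(m):
    """Triple sum of (2*xi^2 - 2*xi*xi1) over xi1, xi2, xi = -m..m (mu_A = 1)."""
    lags = range(-m, m + 1)
    return sum(2 * x * x - 2 * x * x1 for x1 in lags for _x2 in lags for x in lags)


def relative_error(exact, approx):
    """Relative difference between an exact sum and its closed-form approximation."""
    return abs(exact - approx) / abs(approx)


if __name__ == "__main__":
    for m in (5, 10, 20):
        w = 2 * m + 1  # window length
        print(m,
              relative_error(c_leading(m), w ** 4 / 12),
              relative_error(numerator_leading(m), w ** 5 / 6))
```

The relative differences shrink as the window length grows, in line with the statement that the neglected terms are of lower order in $m$.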
1. ABSTRACT

A method, which will be referred to as Broomhead’s filter method, is reviewed. This method uses a nonlinear inverse to a linear bandstop filter to obtain better noise reduction results, in terms of signal to noise ratio, than linear noise reduction techniques, for the cancellation of wideband chaotic noise from a sinusoid. A novel and unorthodox approach is suggested for the linear bandstop filtering aspect of Broomhead’s filter method, which allows it to be applied in situations where the signal of interest has a broader spectrum than that of a sinusoid. This unorthodox approach, referred to as the modified Broomhead filter method, is used to cancel chaotic noise and sea clutter from narrowband Gaussian signals of interest.

2.  INTRODUCTION 

In  [1]  Broomhead  et  al.  proposed  using  a  nonlinear  inverse 
to  a  linear  bandstop  filter,  in  order  to  cancel  wideband 
chaotic  noise  from  narrowband  signals  of  interest.  This 
technique  will  be  referred  to  as  Broomhead’s  filter  method. 
In  an  experiment  they  carried  out,  involving  a  sine  wave 
corrupted  by  chaotic  Ikeda  [2]  noise,  Broomhead  et  al. 
showed  that  a  radial  basis  function  network  (RBFN)  [3] 
nonlinear inverse was able to obtain reasonable performance
when the noise process was not known beforehand
(i.e.  the  RBFN  inverse  was  trained  using  the  noise 
corrupted  signal  of  interest).  They  also  described  the 
RBFN  inverse  as  “indispensable”,  when  the  noise  process 
was  known  beforehand  (i.e.  the  RBFN  inverse  was  trained 
using  the  noise  process  alone).  No  linear  comparisons 
were,  however,  carried  out  in  this  work.  Strauch  [4] 
carried  out  the  same  experiments  as  those  carried  out  by 
Broomhead  et  al.,  but  with  linear  comparisons.  In  the 
experiment  with  a  sine  wave  corrupted  by  Ikeda  noise, 
Strauch  found  that  when  the  noise  process  was  not  known 
beforehand,  there  was  little  or  no  improvement  obtained 
by  using  a  nonlinear  inverse,  with  respect  to  using  a 
linear  inverse.  Also,  Strauch  reported  that  very  little 
improvement  was  observed  when  the  noise  process  was 
known beforehand. Strauch reported that both the linear
and nonlinear inverses performed more poorly than a 6th
order Butterworth filter when the noise process was known
beforehand,  and  when  it  was  not.  The  results  for  the  case 
when  the  noise  process  was  known  beforehand  seemingly 
contradict  the  “indispensable”  verdict  of  a  nonlinear 
inverse  made  by  Broomhead  et  al.  in  such  a  situation. 
Due  to  the  seemingly  contradictory  results  of  Broomhead 
et  al.  and  Strauch,  it  was  decided  to  re-investigate  the 
cancellation  of  Ikeda  noise  from  a  sinusoid  experiment, 
using  Broomhead’s  filter  method  and  linear  comparisons. 
The re-investigation of this experiment led to a novel
modification  of  Broomhead’s  filter  method,  which  allowed 
it  to  be  used  for  the  cancellation  of  noise  from  signals  of 
interest  which  had  a  broader  power  spectrum  than  that  of 
a  sinusoid.  This  modified  Broomhead  filter  method  was 
applied  to  the  cancellation  of  wideband  chaotic  Ikeda  noise 
from  narrowband  Gaussian  signals  of  interest.  Finally,  the 
modified  filter  method  was  applied  to  the  cancellation  of 
radar  sea  clutter  data  from  narrowband  Gaussian  signals. 
The  radar  clutter  data  sets  were  provided  by  the  Defence 
Evaluation  and  Research  Agency  (DERA)  in  the  UK. 

The structure of this paper is as follows. In section 3 a description of the clutter data sets is given. In section 4 a description of Broomhead’s filter method is given. In section 5 the cancellation of broadband Ikeda noise from a sinusoid experiment using Broomhead’s filter method and linear comparisons is presented. In section 6 the modified Broomhead filter method is discussed, and results are presented for the cancellation of Ikeda noise and sea clutter from narrowband Gaussian signals.

3.  SEA  CLUTTER  DATA 

3.1.  Data  collection  method 

A  stationary  land-based  radar  was  operated  in  a  dwelling 
mode,  that  is,  with  the  antenna  pointing  towards  a  patch 
of  the  sea  surface  along  a  fixed  direction. 

3.2.  The  wavetank  sea  clutter  data  sets 

The wavetank data sets were recorded in April 1998 as part of an experiment conducted by DERA Malvern and Racal Radar Defence Systems, at the large wavetank facility in the ocean engineering laboratory of the University of California, Santa Barbara. The radar used was the Racal-Thorn mobile instrumented data acquisition system (MIDAS). The wavetank is 53 m long, 4.26 m wide, and 2.13 m deep. The wind tunnel extends 30.5 m down the tank, leaving an open test section of 22.5 m. A wooden beach at the test end of the tank reduces reflections. The wind tunnel can produce wind speeds of up to 12 m s⁻¹. The MIDAS radar used pulse compression. Pulse compression is a signal processing technique which allows a radar to use a long pulse to obtain a large radiated energy, but which also allows the range resolution of a short pulse to be achieved [5]. The range resolution of the radar was 0.3 m (i.e. an effective pulse width of 2 ns). Data was collected in 32 range cells, during wind speeds of 4 m s⁻¹ through to 12 m s⁻¹, in steps of 1 m s⁻¹. Pulse to pulse transmit frequency agility was used, in a known (i.e. not randomised) sequence. The radar has a dual-polarised receiver. Only the transmit horizontal, receive horizontal (HH) data sets were made available for the work reported in this paper. The effective pulse repetition frequency (PRF) of the radar was 1 kHz. The grazing angle and beamwidth were 6° and 5°, respectively. There were 30,000 complex (i.e. coherent) samples collected in each range cell, for each wind speed data set.

3.3.  The  Dawber  sea  clutter  data  sets 

These data sets were collected during experiments conducted by DERA in January and February of 1994, 1995, and 1996, at Sennen Cove near Lands End, and also at Portsmouth (looking at the Isle of Wight) in December 1996. The radar used was the multi-band pulsed radar (MPR) designed and built by Roke Manor. Two data sets from the experiments mentioned above were made available for the work reported in this summary. Both of these data sets were collected without the use of pulse compression or polarisation agility. For both data sets the radar range resolution was 150 m (i.e. a pulse width of 1 μs), and the PRF was 20 kHz. The first data set, which will be called the Dawber-VV data set, was collected using vertical polarisation on transmit and receive during a wind speed of 12.8 m s⁻¹. The second data set, which will be called the Dawber-HH data set, was collected during a wind speed of 15.4 m s⁻¹. The grazing angle and beamwidth used in the collection of both data sets were 0.12° and 6°, respectively. There were 25,600 complex samples collected in each data set: these samples correspond to the temporal signal collected in one range cell, at a distance of 4 km from the radar.
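The quoted range resolutions follow from the standard pulsed-radar relation between (effective) pulse width and range resolution, ΔR = cτ/2. The short sketch below, with an illustrative function name, reproduces the two figures given for the wavetank and Dawber data sets.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def range_resolution(pulse_width_s):
    """Range resolution in metres for an (effective) pulse width in seconds.

    The factor of two accounts for the two-way propagation path.
    """
    return SPEED_OF_LIGHT * pulse_width_s / 2.0


if __name__ == "__main__":
    print(range_resolution(2e-9))   # wavetank data: effective pulse width of 2 ns
    print(range_resolution(1e-6))   # Dawber data: pulse width of 1 microsecond
```

A 2 ns effective pulse gives approximately 0.3 m, and a 1 μs pulse gives approximately 150 m, in agreement with the values stated for the two radars.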

4.  BROOMHEAD’S  FILTER  METHOD 
4.1.  Diagram  and  explanation 

The filtering method proposed by Broomhead et al. [1] is illustrated in Figure 1. A block diagram of the method is shown in Figure 1(a), and a spectral representation of the filtering operations it employs is shown in Figure 1(b). The input signal into the Broomhead filter {b(n)} is a linear combination of a narrowband signal of interest {t(n)} and a wideband noise process {x(n)}, i.e. b(n) = t(n) + x(n). It is assumed that the spectral properties (i.e. the band limits) of the signal of interest are known. In a radar context, the noise process could be sea clutter, and the signal of interest could be reflections from a ship or iceberg. A bandstop linear filter is used to remove the signal of interest from the input signal {b(n)}. The nonlinear inverse network is used to reconstruct the noise process {x(n)} from the output of the bandstop filter {b_F(n)}. The reconstructed noise process {x_R(n)} at the output of the nonlinear inverse can then be subtracted from the input signal {b(n)} to obtain an estimate of the signal of interest {t(n)}. Figure 1(b) depicts a case where the bandstop filter is orthogonal to (i.e. completely removes) the signal of interest, and where the nonlinear inverse manages to reconstruct the noise process perfectly, so that the signal of interest may be obtained exactly, with no errors.

Figure 1: Broomhead’s filter method, (a) block diagram, (b) spectral representation of the noise reconstruction process.

4.2.  Linear  bandstop  filter 

The filtering method depicted in Figure 1 was originally intended for application to chaotic noise cancellation [1]. In such an application the correct choice of bandstop filter is very important. Essentially, the filter must remove the signal of interest (ideally completely), and it must also preserve the dynamics of the noise, so that the nonlinear inverse can properly reconstruct the noise. A discussion on the selection of an appropriate bandstop filter for this application is given in [1,4]; the key points of this discussion are now briefly summarised:

• Using a short enough linear FIR filter can preserve the dynamics of a chaotic signal.

• If the order of a linear FIR filter is too high, the dynamics of a chaotic signal can be changed.

• The higher the order of a bandstop FIR filter, the greater the attenuation is in the stopband.

• An infinite impulse response (IIR) filter changes the dynamics of a chaotic signal, as it has its own associated dynamics.

Clearly, to preserve the dynamics of a chaotic signal, a low order FIR filter would be preferred. However, there is a tradeoff between signal suppression and dynamics distortion: a short enough filter may not change the dynamics of a chaotic process, but it may also not adequately suppress the signal of interest. There is ambiguity associated with what length of filter constitutes one that is short enough, i.e. of low enough order to not change the dynamics of a chaotic process. There appears to be no clear cut method for selecting an appropriate order, other than a simple trial and error approach to find a suitable filter length which not only adequately suppresses the signal of interest, but also does not change the dynamics of the noise process. It should be pointed out that filtering of any kind will distort the dynamics of a chaotic process; however, the aim is to limit this distortion as much as possible so that the nonlinear inverse can produce a reasonable reconstruction of the original chaotic noise process {x(n)} from the bandstop filtered chaotic process.

For maritime surveillance radar [6], the noise process is sea clutter, which is the term for radar returns from the sea surface. Haykin and Puthusserypady [7] have presented evidence to suggest that sea clutter is a chaotic process. However, evidence presented in [8] has suggested that each clutter data set described in section 3 can be modelled as a non-Gaussian stochastic process, which is the accepted type of model for high resolution and/or low grazing angle sea clutter returns [9]. As will be discussed in section 4.3, the application of Broomhead’s filter method to non-Gaussian noise processes, rather than just limiting its application to situations where the noise process is chaotic, is perfectly justifiable. In applying Broomhead’s filter technique to cases where the noise process is not chaotic, but is instead a non-Gaussian stochastic process, it might be reasonable to assume that a discussion of the preservation of a signal’s dynamics is irrelevant to the choice of a suitable bandstop filter. However, the distortion of a chaotic signal’s dynamics resulting from (linear) filtering can be seen as a more general change in the nonlinear properties of the chaotic signal. Furthermore, it is suggested that this idea of distorting the nonlinear properties of a signal by filtering can be extended to any non-chaotic noise process which has nonlinear properties that would allow Broomhead’s filter method to perform better than a conventional linear approach. In other words, it may be necessary to exercise the same caution in the selection of a suitable bandstop filter for the application of Broomhead’s filtering technique when the noise process is described as a non-Gaussian stochastic process, as is required when the noise process is chaotic.

4.3.  Why  a  nonlinear  inverse  as  opposed  to  a  linear 
inverse? 

Linear filtering techniques are unable to relate noise components outside the band of interest with those inside the band, if the noise process is not available both during training of the linear inverse, and also after training. In a situation where the noise process is only available during training², the best a linear inverse noise suppression approach can achieve is to remove all of the out-of-band noise. It still leaves behind the in-band noise, and therefore performs sub-optimally. The interest in using a nonlinear approach is to try and identify a suitable nonlinear relationship that would allow both the in-band and out-of-band noise components to be suppressed, allowing a nonlinear approach to perform better than a linear one.

As already mentioned in section 4.2, it is justifiable to apply Broomhead’s filtering technique to the broad class of non-Gaussian signals. The reason for this is now given. If a process may be described as a Gaussian stochastic process (correlated or uncorrelated), then all its frequency components are independent, and no part of its spectrum is related to another part [10]. It is therefore impossible to relate out-of-band noise to in-band noise when the noise process is not known beforehand. However, for non-Gaussian stochastic signals it may be possible that a nonlinear approach could relate out-of-band noise to in-band noise, and could therefore be used to eliminate in-band noise and achieve better noise suppression than a linear approach.


5.  CANCELLATION  OF  BROADBAND 
CHAOTIC  NOISE  FROM  A  SINUSOID 


Ikeda map noise was added to a sinusoid so that the signal to noise ratio (SNR) was -2.7 dB, where

$$\mathrm{SNR} = 10\log_{10}\left(\frac{\sigma^2_{\mathrm{signal}}}{\sigma^2_{\mathrm{noise}}}\right) \qquad (1)$$

and $\sigma^2_{\mathrm{signal}}$ is the variance of the signal of interest, and $\sigma^2_{\mathrm{noise}}$ is the variance of the noise. As a performance benchmark an 18th order bandpass Butterworth filter was used on the noise corrupted sinusoid, and the output SNR achieved was 24.5 dB. Broomhead’s filter method was applied with and also without the signal of interest present during training of the nonlinear inverse, using a normalised radial basis function network (NRBFN) [11] inverse (with Gaussian kernel functions), and the following bandstop filters: a notch filter [12], an IIR filter, an FIR filter with 25 taps, and an FIR filter with 193 taps. It was found that Broomhead’s nonlinear inverse filter method and the linear inverse comparison performed more poorly in terms of output SNR than the Butterworth filter when the FIR and IIR filters were used as the bandstop filter. Furthermore, the nonlinear and linear inverse techniques also performed more poorly than the Butterworth filter when the signal of interest was present during training and the notch filter was used as the bandstop filter. However, the nonlinear inverse outperformed both the linear inverse and the Butterworth filter when the notch filter was used as the bandstop filter and the signal of interest was not present during training (i.e. the noise alone was used to train the inverse), see Figure 2. These results confirm those obtained by Broomhead et al., and contradict those obtained by Strauch.

² This is the case in maritime surveillance radar, where it is assumed sea clutter data can be collected without any target signal present: for instance, the absence of a target signal could be ensured by visually inspecting an area close to the radar.
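The SNR definition used throughout can be evaluated directly from sample sequences. The following is a minimal sketch, assuming the signal and noise are available as separate real-valued sequences and using population variances; the function name `snr_db` is illustrative.

```python
import math
import statistics


def snr_db(signal, noise):
    """SNR in dB as 10*log10(var(signal) / var(noise)).

    `signal` and `noise` are sequences of real samples; population
    variances are used, which is an assumption of this sketch.
    """
    signal_var = statistics.pvariance(signal)
    noise_var = statistics.pvariance(noise)
    return 10.0 * math.log10(signal_var / noise_var)


if __name__ == "__main__":
    # Unit-variance signal and a noise sequence with ten times the variance
    signal = [1.0, -1.0] * 50
    noise = [math.sqrt(10.0), -math.sqrt(10.0)] * 50
    print(snr_db(signal, noise))  # approximately -10 dB
```

With this definition, an input SNR of -2.7 dB corresponds to a noise variance roughly 1.9 times the signal variance.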


6.  MODIFIED  BROOMHEAD  FILTER 
METHOD 

6.1.  Novel  bandstop  filtering  approach 

The results discussed in section 5 suggest that the only linear bandstop filtering method which does not distort the dynamics (or nonlinear properties) of the chaotic data too much, and which therefore allows Broomhead’s filter method to perform better than linear alternatives, is the notch filter. Therefore, the novel and unorthodox approach for the bandstop filtering aspect of Broomhead’s filter method was to use a series of notch filters, in order to allow Broomhead’s filter method to be applied when the signal of interest has a broader spectrum than that of a sine wave. This modified Broomhead filter approach was applied to the cancellation of Ikeda noise from narrowband Gaussian signals, and to the cancellation of sea clutter from narrowband Gaussian signals.
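The idea of covering a band with a series of notch filters can be sketched as follows. The example assumes the simplest possible notch, a 3-tap FIR filter with a pair of zeros on the unit circle at the notch frequency; this is an illustrative choice and may differ from the notch filter of [12] used in the experiments. The function names are hypothetical.

```python
import math


def fir_notch(x, f_norm):
    """3-tap FIR notch: y[n] = x[n] - 2*cos(w0)*x[n-1] + x[n-2].

    f_norm is the notch frequency normalised by the sampling frequency
    (f/fs); samples before the start of the record are taken as zero.
    """
    w0 = 2.0 * math.pi * f_norm
    c = 2.0 * math.cos(w0)
    y = []
    for n in range(len(x)):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y.append(x[n] - c * x1 + x2)
    return y


def notch_cascade(x, notch_freqs):
    """Apply one notch per frequency in series (a cascade of notch filters)."""
    for f_norm in notch_freqs:
        x = fir_notch(x, f_norm)
    return x


if __name__ == "__main__":
    tone = [math.cos(2.0 * math.pi * 0.275 * n + 0.3) for n in range(200)]
    out = notch_cascade(tone, [0.265, 0.275, 0.285])
    # After the start-up transient the tone at a notch frequency is removed
    print(max(abs(v) for v in out[6:]))
```

A component exactly at one of the notch frequencies is removed after a transient of two samples per stage, while components elsewhere pass with a frequency-dependent gain; in the method described above it is the inverse that must compensate for this shaping of the noise.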


6.2. Cancellation of Ikeda noise from a narrowband Gaussian signal

White Gaussian noise was passed through a 6th order IIR Butterworth bandpass filter with a passband from normalised frequency f/f_s = 0.26 to f/f_s = 0.285, to produce a narrowband Gaussian signal. Ikeda noise was added to this signal, and the resulting SNR was -8.5 dB. As a benchmark performance measure, a 6th order Butterworth bandpass filter with a passband of f/f_s = 0.258 to f/f_s = 0.287 was used, and it achieved an output SNR of -1 dB. An NRBFN with an embedding dimension of 4 and an embedding delay of 1 sample was used as the nonlinear inverse in Broomhead’s filter. The bandstop linear filter, used to cancel the Gaussian signal, comprised 3 notch filters (in cascade) with notches at f/f_s = 0.265, 0.275, and 0.285. Nonlinear and linear inverse results are shown in Figure 3. As can be seen from Figure 3, the nonlinear inverse method was found to achieve a better output SNR than both the linear inverse and the bandpass Butterworth filter.
Figure 2: Testing and validation data set output SNRs for (a) nonlinear inverse, (b) linear inverse using a notch filter as the bandstop filter. Inverses trained on noise alone. Training, testing and validation data sets of length 2500 samples were used. An embedding dimension of 4 and an embedding delay of 1 sample were used by the nonlinear inverse.


6.3.  Cancellation  of  sea  clutter  from  a  narrowband 
Gaussian  signal 

White  Gaussian  noise  was  passed  through  a  6th  order  IIR 
Butterworth  bandpass  filter  with  a  passband  from  normal¬ 
ised  frequency  f/f,= 0.0  to  ///s=0.025,  to  produce  a  nar¬ 
rowband  Gaussian  signal.  The  wavet.ank  12ms-1  gate  14 
amplitude  data  set  was  added  to  this  signal,  and  the  result¬ 
ing  signal  to  clutter  ratio  (SCR)  was-2.7dB.  A  NRBFN  was 
used  as  the  nonlinear  inverse  in  Broomhead’s  filter  method. 
Embedding  dimensions  of  4  to  20  in  steps  of  1  were  used. 
For  each  embedding  dimension,  the  number  of  kernels  was 
varied  from  100  to  800  in  steps  of  100.  The  training  length 
for  each  simulation  was  2500  samples.  The  bandstop  lin¬ 
ear  filter,  used  to  cancel  the  Gaussian  signal,  comprised 
of  3  notch  filters  (in  cascade)  with  notches  at  ///,=0.005, 
0.015,  and  0.025.  A  10  tap  linear  inverse  comparison  was 
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(a) 


(b) 


Figure  3:  Testing  and  validation  data  set  output  SNR’s  for 
(a)  nonlinear  inverse  and  (b)  linear  inverse  cancellation  of 
Ikeda  noise  from  a  narrowband  Gaussian  signal.  Inverses 
trained  on  noise  alone.  Training,  testing  and  validation 
data  sets  of  length  2500  samples  were  used. 

used with a training length of 2500 samples. The training
of the nonlinear and linear inverses was done, for all
simulations, using clutter only data (i.e. not the clutter
corrupted signal of interest). Testing data set results for
the linear and nonlinear inverses are given in Figure 4. As
can be seen from Figure 4, the simple 10 tap linear inverse
performed as well as, or better than, the nonlinear inverses.
This was determined to be the case for all the DERA clutter
data sets [8].
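
The signal to clutter ratio quoted in this section can be illustrated with a short sketch. It assumes that the SCR is the ratio of mean signal power to mean clutter power expressed in dB; the text does not spell out the definition, and the function names are hypothetical.

```python
import math


def mean_power(x):
    """Mean power of a real-valued sequence."""
    return sum(v * v for v in x) / len(x)


def scr_db(signal, clutter):
    """Signal to clutter ratio in dB (assumed definition: ratio of mean powers)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(clutter))


# Toy sequences: the clutter has four times the power of the signal.
toy_signal = [1.0, -1.0, 1.0, -1.0]
toy_clutter = [2.0, -2.0, 2.0, -2.0]
toy_scr = scr_db(toy_signal, toy_clutter)
```

A negative value, such as the -2.7 dB reported above, means that the clutter carries more power than the signal of interest.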

7.  SUMMARY 

Broomhead’s  nonlinear  inverse  filter  method  was  shown  to 
outperform  linear  alternatives  for  the  cancellation  of  Ikeda 
noise  from  a  sine  wave  when  the  bandstop  filter  used  was 
a  notch  filter,  and  the  inverse  was  trained  on  noise  only 
data.  A  modified  Broomhead  filter  was  proposed  which 
allowed  Broomhead’s  filter  method  to  be  used  to  cancel 
noise  from  signals  with  a  wider  spectrum  than  that  of  a  sine 


Figure 4: Testing data set output SCRs for nonlinear
inverse and linear inverse cancellation of the wavetank
12 ms^-1 gate 14 amplitude data set from a narrowband
Gaussian signal. Inverses trained on clutter alone.

wave.  The  modified  Broomhead  filter  method  was  shown 
to  achieve  better  results  than  linear  alternatives  in  the  case 
of  cancellation  of  chaotic  Ikeda  noise  from  a  narrowband 
Gaussian  signal,  but  not  in  the  case  of  cancellation  of  DERA 
sea  clutter  from  a  narrowband  Gaussian  signal. 
ABSTRACT

This paper describes a digital beamforming architecture
for nulling a mainlobe jammer and multiple sidelobe
jammers while maintaining the monopulse angle
estimation accuracy. It involves two-stage processing
using adaptive digital beamforming followed by a
mainlobe jammer canceller. A mainlobe jammer blocking
matrix and constrained adaptation are employed during
the adaptive sidelobe cancellation so that the results of
the sidelobe jammer cancellation process do not distort
the subsequent mainlobe cancellation process. This technique
is developed to determine the angular location of a target
by maintaining the estimation accuracy of the monopulse
ratio in the presence of jamming.

1.  INTRODUCTION 

Monopulse is a radar technique in which the angular
location of a target can be determined within fractions of a
beamwidth by comparisons of two or more simultaneous
beams [1]. The monopulse technique for angle estimation fails
when there is sidelobe jamming (SLJ) and/or mainlobe
jamming (MLJ). If not effectively countered, electronic
jamming prevents successful radar target detection and
tracking. We have developed an adaptive beamforming
architecture and a signal processing algorithm to cancel
mainlobe and sidelobe jammers while maintaining target
detection and angle estimation accuracy on mainlobe
targets [2]. Our technique makes use of a cascaded
scheme where sidelobe jammers are cancelled using an
adaptive array followed by a mainlobe canceller.

In  order  to  motivate  this  technique,  we  first  review  some 
antenna  architectures  and  adaptive  processing  schemes  for 
jammer  cancellation.  Specifically,  fully  adaptive  array  [3, 
4]  and  sum-difference  mainlobe  canceller  (MLC)[5]  are 
discussed  including  their  performance  in  target  angle 
estimation  in  jamming. 

Using an adaptive array for target detection and angle estimation in
jamming leads to adaptive sum and difference beams.
The sum and difference beams are formed by adaptive
receiving array techniques that automatically null the
interference sources. Because of the adaptation, the two
antenna patterns vary with the external noise field and are
distorted relative to the conventional sum and difference
beams that possess even and odd symmetry, respectively,
about a prescribed boresight angle. This technique
cancels both mainlobe and sidelobe jammers but distorts
the monopulse ratio, thus leading to a bias in angle
estimation, especially when jammers are
within the mainbeam. This approach for angle estimation
works well when the jammers are in the sidelobes.

Applebaum  et  al  [5]  have  developed  a  beamforming 
architecture  and  algorithm  for  nulling  the  MLJ  while 
preserving  the  monopulse  ratio.  This  technique  makes  use 
of  the  idea  that  the  patterns  are  separable  in  azimuth  and 
elevation,  i.e.  the  patterns  can  be  expressed  as  products  of 
sum  and  difference  factors  in  azimuth  and  elevation.  We 
can  therefore  cancel  jammers  with  nulls  along  one 
direction  while  keeping  the  non-adapted  sum  and 
difference  patterns  along  another  direction,  thus  yielding 
adapted  sum  and  difference  beams  with  an  undistorted 
monopulse  ratio.  This  technique  does  not  cancel  sidelobe 
jammers. 

As the adaptive array works well for sidelobe jammers, and
the MLC works well for MLJ cancellation, we are motivated
to combine these techniques for adaptive monopulse
processing. Specifically, we have devised a scheme such
that the adaptive array is used for sidelobe cancellation
and the MLC is used for mainlobe cancellation. The
technique is developed in Section 2 and analytical
performance evaluation is used to illustrate the technique
in Section 3.
2.  COMBINING  ADAPTIVE  ARRAY  AND
MAINLOBE  CANCELLER 

In this section, we present an approach which involves an
adaptive digital beamforming (DBF) sub-array followed
by an MLC. In order to combine the cancellation
techniques, we include an appropriate mainlobe
maintenance scheme or impose some main-beam
constraints in the SLJ canceling process. In this manner,
identical nulls are formed at both the sum and difference
beams, with the main beams maintained appropriately
before applying the MLC.

For the two-stage DBF architecture considered here, there
are N columns in the DBF array, and each column has M
elemental sensors. Partial adaptivity is employed where
fixed beamforming is used for each column and adaptive
degrees-of-freedom are available along azimuth. In this
setup, the input for each column is linearly combined to
form the column sum and difference beams,
i.e. $\{r_{\Sigma_e}(i), r_{\Delta_e}(i)\}$, $i = 1, \ldots, N$. The two sets of beams can
then be digitized and linearly summed to form the array sum
beam, delta-azimuth beam, delta-elevation beam and
delta-delta beam. It should be noted that the quiescent
patterns are separable with the following expressions:

$g_{\Sigma}(T_x, T_y) = g_{\Sigma_a}(T_x)\, g_{\Sigma_e}(T_y)$

$g_{\Delta_A}(T_x, T_y) = g_{\Delta_a}(T_x)\, g_{\Sigma_e}(T_y)$

$g_{\Delta_E}(T_x, T_y) = g_{\Sigma_a}(T_x)\, g_{\Delta_e}(T_y)$

$g_{\Delta_\Delta}(T_x, T_y) = g_{\Delta_a}(T_x)\, g_{\Delta_e}(T_y)$

where $T_x$ and $T_y$ are the steering directions given by

$T_x = \cos(El)\sin(Az)$

$T_y = \sin(El)$

where $Az$ and $El$ are the azimuth and elevation angles,
respectively. The monopulse ratio along the azimuth or
elevation direction can then be formed, giving azimuth and
elevation DOA estimates by using the following:

$f_A(T_x, T_y) = \frac{g_{\Delta_A}(T_x, T_y)}{g_{\Sigma}(T_x, T_y)} = \frac{g_{\Delta_a}(T_x)}{g_{\Sigma_a}(T_x)}$  (2)

$f_E(T_x, T_y) = \frac{g_{\Delta_E}(T_x, T_y)}{g_{\Sigma}(T_x, T_y)} = \frac{g_{\Delta_e}(T_y)}{g_{\Sigma_e}(T_y)}$  (3)

These derivations make use of the separable property of
the planar array patterns as given before. In the presence
of jamming, the column sum and difference inputs are
adaptively weighted. In the weighting, $W_{\Sigma}$ and $W_{\Delta}$ are the
nominal sum and difference beamforming weights, and
$r_{\Sigma_e}$ and $r_{\Delta_e}$ are the column sum and difference
beamforming inputs given by:

$r_{\Sigma_e} = [\, r_{\Sigma_e}(0), \ldots, r_{\Sigma_e}(N-1) \,]^T$

$r_{\Delta_e} = [\, r_{\Delta_e}(0), \ldots, r_{\Delta_e}(N-1) \,]^T$

The sample matrix inverses modify the quiescent weights and
correspond to a nulling preprocessing responsive to the
jammers. It is essential to include an appropriate mainlobe
maintenance technique or to include some constraints in the
adaptive process.

After the first stage of adaptive processing, the beams are
free of SLJs, but may include the MLJ. The MLJ
can then be canceled using an MLC. For example, in
order to form the monopulse ratio in elevation, we can
adapt the sum and difference beams to cancel the MLJ
simultaneously as follows:

$\hat{r}_{\Sigma} = r_{\Sigma} - w_a r_{\Delta_A}$  (14)

$\hat{r}_{\Delta_E} = r_{\Delta_E} - w_a r_{\Delta_\Delta}$  (15)

This can be done by adapting $w_a$ in the sum channel and
using it in the difference channel, or by choosing the weight to
adapt the sum and difference beams simultaneously by
minimizing the sum of the output power of both beams. In
this way, the monopulse ratio using the adapted sum and
difference patterns (i.e. $\hat{g}_{\Sigma}(T_x, T_y)$, $\hat{g}_{\Delta_E}(T_x, T_y)$) can be shown
to be preserved along the elevation axis while the jammer
is nulled along the azimuth axis as follows:

$\hat{f}_E(T_x, T_y) = \frac{\hat{g}_{\Delta_E}(T_x, T_y)}{\hat{g}_{\Sigma}(T_x, T_y)} = \frac{g_{\Delta_e}(T_y)}{g_{\Sigma_e}(T_y)}$

The same technique can also be used to preserve the
monopulse ratio along the azimuth with the mainlobe
jammer canceled along the elevation.
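
The preservation of the elevation monopulse ratio can be checked numerically with a minimal sketch. It rests on assumptions that are not taken from the paper: uniformly weighted half-wavelength line arrays, a difference pattern formed by negating one half of the array, and a canceller weight chosen to null a single jammer direction. All function names are hypothetical.

```python
import cmath
import math


def steering(az_deg, el_deg):
    """Direction cosines (Tx, Ty) = (cos(El) sin(Az), sin(El))."""
    az, el = math.radians(az_deg), math.radians(el_deg)
    return math.cos(el) * math.sin(az), math.sin(el)


def line_patterns(n_elem, t):
    """Sum and difference patterns of a uniform half-wavelength line array.

    The difference pattern negates one half of the array (an assumed,
    illustrative weighting).
    """
    centre = (n_elem - 1) / 2.0
    phasors = [cmath.exp(1j * math.pi * (n - centre) * t) for n in range(n_elem)]
    g_sum = sum(phasors)
    g_diff = sum(p if n > centre else -p for n, p in enumerate(phasors))
    return g_sum, g_diff


def planar_beams(n_cols, n_rows, tx, ty):
    """Separable beams: sum, delta-azimuth, delta-elevation, delta-delta."""
    sum_a, diff_a = line_patterns(n_cols, tx)
    sum_e, diff_e = line_patterns(n_rows, ty)
    return sum_a * sum_e, diff_a * sum_e, sum_a * diff_e, diff_a * diff_e


def mlc_elevation_ratio(n_cols, n_rows, target, jammer):
    """Elevation monopulse ratio after cancelling the jammer along azimuth."""
    # Weight that nulls the jammer in the adapted sum beam, in the form of (14).
    jam_sum, jam_daz, _, _ = planar_beams(n_cols, n_rows, *jammer)
    w_a = jam_sum / jam_daz
    s, d_az, d_el, d_dd = planar_beams(n_cols, n_rows, *target)
    # Adapted difference beam (15) divided by adapted sum beam (14).
    return (d_el - w_a * d_dd) / (s - w_a * d_az)


target = steering(0.5, 1.0)
jammer = steering(2.0, 3.0)
adapted_ratio = mlc_elevation_ratio(28, 14, target, jammer)
sum_e, diff_e = line_patterns(14, target[1])
quiescent_ratio = diff_e / sum_e
```

Because the azimuth factor is common to the adapted sum and difference beams, `adapted_ratio` and `quiescent_ratio` agree to numerical precision for any target that does not coincide with the jammer in azimuth.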


3.  ANALYTICAL  PERFORMANCE  EVALUATION


In this section, we describe the performance of our
technique on monopulse angle estimation in jamming
using analytical evaluation. Specifically, we concentrate
on the approach of combining the adaptive array and the
MLC. Our analytical performance evaluation makes use
of a DBF planar array. In this example, the planar array
has 28 columns, and each column has 14 elemental
sensors, placed half a wavelength apart. Fixed analog
beamforming is performed along the elevation for each
column, and adaptive digital beamforming capability
along the azimuth is available on the resulting column
beams. Uniform nominal weights are used. The null-to-null
beamwidth is 8° along the azimuth and 16° along the
elevation. There are two jammers: one jammer is located
within the main beam (2° azimuth and 3° elevation), and
the other jammer is located in the sidelobe (10° azimuth
and 2° elevation). Both jammers have a jamming-to-noise
ratio (JNR) of 45 dB at the element, and the signal-to-noise
ratio is 0 dB at the element. We first evaluated the
quiescent antenna patterns (Figure 1(a) and (b)). The
performance of using the adaptive array and MLC with
mainlobe maintenance is illustrated in Figure 2. These
results show that the adaptive monopulse technique is
capable of canceling mainlobe and multiple sidelobe
jamming while preserving the radar’s ability to estimate
the target angle accurately.
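
The quoted beamwidths can be cross-checked with a short sketch. It assumes the standard first-null formula for a uniformly weighted, broadside line array, sin(theta_null) = lambda / (N d); the function name is hypothetical.

```python
import math


def null_to_null_beamwidth_deg(n_elem, spacing_wavelengths=0.5):
    """Null-to-null beamwidth (degrees) of a uniform broadside line array."""
    first_null = math.asin(1.0 / (n_elem * spacing_wavelengths))
    return 2.0 * math.degrees(first_null)


azimuth_bw = null_to_null_beamwidth_deg(28)    # 28 columns
elevation_bw = null_to_null_beamwidth_deg(14)  # 14 elements per column
```

Under these assumptions the values come out close to 8° in azimuth and 16° in elevation, consistent with the figures given in the text.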
Figure 1(a) Quiescent sum beam pattern

Figure 1(b) Quiescent azimuth delta pattern

Figure 2(a) Adapted azimuth sum pattern

Figure 2(b) Adapted azimuth delta beam pattern

Figure 2(c) Azimuth angle bias
ABSTRACT

The SMF algorithm is designed for the estimation of
the coefficients of a constant amplitude polynomial
phase signal. It relies on shift invariant signal moments
with lower orders than the generalized ambiguity function
(GAF) and it does not require maximization. The
major contribution of the communication is the derivation
of an analytic expression of the SMF error variance
for high signal to noise ratios. This result proves the
asymptotic efficiency of SMF when a dependency between
the number of moments and the number of samples
is introduced. Moreover, it underscores the superiority
of SMF over GAF with an appropriate choice of the
number of moments. Finally, the optimal parameters
for order 3 and 4 polynomial phase signal estimation
as a function of the signal length are provided.

1.  INTRODUCTION 

This communication is devoted to the estimation of
the parameters of a noisy polynomial phase constant
amplitude signal. This model is sufficiently general to
represent a broad category of real life signals; the reader
can refer to [1] for a list of applications relying on this
type of signal.

Parametric analysis of polynomial phase signals has
attracted increasing interest in recent years [2, 7, 4].
The solution generally retained to solve this problem is
the Generalized Ambiguity Function (GAF) [5]. The
first stage of this method consists in the transformation
of the signal into a pure tone. This is achieved iteratively
by successive phase differentiations: at each iteration,
the differentiation is achieved by multiplying the sample
at instant $n$ by the conjugated sample at instant $n - \tau$.
In the noisy case, the highest degree phase coefficient
is estimated from the global maximizer of the transformed
signal periodogram. Besides its simplicity, this
estimator has the advantage of being asymptotically
efficient.


In spite of these advantages, this approach has several
limitations. The first is that the signal transformation
involves the product of $2^{M-1}$ signal terms where
$M$ is the degree of the phase. When $M$ is high, the effect
of this large number of terms is a fast degradation
of the algorithm performance. The second is
that the method requires a computationally expensive
global maximization.

In order to overcome these disadvantages, the SMF
algorithm (Stationary Moments Fitting) has been proposed
[3]. The principle of this method relies on the
fact that, although the signal is clearly non-stationary,
some of its moments are time shift invariant. This
property is used to recursively estimate the phase parameters.
In [3], the performance of the algorithm is
studied using Monte-Carlo simulations. These simulations
have shown that when the number of data is
small, SMF outperforms the GAF [5].
The aim of this communication is to propose an accurate
analysis of SMF performance, specifically through a
statistical analysis of its precision.

The next section briefly presents the SMF algorithm. An
analytic expression of the variance of the highest degree
phase coefficient is given in Section 3. In Section 4,
this result is first validated and then used to establish
the asymptotic efficiency of the estimator. Finally, this
expression is used to obtain an optimal selection of the
algorithm parameters.

2.  THE  SMF  ALGORITHM 

We assume that $y_n$ is an order $M$ noisy polynomial
phase signal:

$y_n = A \exp\{j \phi_n\} + w_n = A \exp\{ j \sum_{q=0}^{M} a_q n^q \} + w_n,$  (1)
where $w_n$ is a white and circular iid complex Gaussian
noise, $w_n \sim \mathcal{N}_c(0, \sigma_w^2)$, and $a_0 \sim \mathcal{U}(0, 2\pi)$.

We have demonstrated in [3] that only moments of
order higher than or equal to $2M$ can be time shift invariant.
Moreover these statistics only allow the identification
of $a_M$. Consequently we propose to estimate $a_M$ from
the order $2M$ stationary moments and, similarly to [5],
to estimate the other parameters recursively. Let $\mathcal{C}_M$ and
$\mathcal{C}^*_M$ be the two disjoint sets containing the $M$ delays
associated with the unconjugated and conjugated signal
samples¹. The corresponding order $2M$ “stationary”
moment of $y$ is:

$M_{2M,y}(\mathcal{C}_M, \mathcal{C}^*_M) = E\left\{ \prod_{k=1}^{p} \left( y_{n+l_k}^{[\epsilon_k]} \right)^{n_k} \right\} = A^{2M} \exp\left\{ (-1)^M j M a_M \prod_{\{k | \epsilon_k = -1\}} l_k^{n_k} \right\}$

where $l_1 < l_2 < \cdots < l_p$ are the $p$ different delays, $n_k$
their multiplicity order and $\epsilon_k = \pm 1$ indicates if $y_{n+l_k}$
is conjugated or not ($y^{[1]} = y$, $y^{[-1]} = y^*$). With these
notations, we have:

$\sum_{k=1}^{p} \epsilon_k n_k = 0.$  (2)

Noticing that $(r\mathcal{C}_M, r\mathcal{C}^*_M)$, $r = 1, \ldots, L$ also lead
to stationary moments, $a_M$ can be estimated by least
squares from the angles of these $L$ moments:

$\hat{a}_M^{SMF} = \frac{\sum_{r=1}^{L} r^M \, \mathrm{angle}(\hat{m}_y(r))}{(-1)^M M \prod_{\{k | \epsilon_k = -1\}} l_k^{n_k} \sum_{r=1}^{L} r^{2M}},$  (3)

with:

$\hat{m}_y(r) = \hat{M}_{2M,y}(r\mathcal{C}_M, r\mathcal{C}^*_M) = \frac{1}{N - r l_p} \sum_{k=1}^{N - r l_p} \prod_{q=1}^{p} \left( y_{k + r l_q}^{[\epsilon_q]} \right)^{n_q}.$  (4)

The set $(\mathcal{C}_M, \mathcal{C}^*_M)$ will be denoted as the germ of SMF.

¹ Tables containing these delays for various $M$ are given in [3].


3.  PERFORMANCE  ANALYSIS
3.1.  Statistical  analysis

The next proposition, which is the major contribution
of this communication, gives an expression of the variance
of $\hat{a}_M$.

Proposition 1 For SNR $= A^2/\sigma_w^2 \gg 1$,

$\mathrm{var}\{\hat{a}_M^{SMF}\} \approx \frac{1}{2\,\mathrm{SNR} \left( M \prod_{\{k | \epsilon_k = -1\}} l_k^{n_k} \right)^2 \left( \sum_{r=1}^{L} r^{2M} \right)^2} \left\| \sum_{r=1}^{L} \frac{r^M}{N - r l_p} u_r \right\|^2$  (5)

where $u_r$ is a $1 \times N$ vector whose non zero components
are:

•  $u_r(k) = \sum_{i=1}^{q} \epsilon_i n_i$ for $q = 1, \ldots, p-1$ and $1 + r l_q \le k \le r l_{q+1}$.

•  $u_r(k) = \sum_{i=q+1}^{p} \epsilon_i n_i$ for $q = 1, \ldots, p-1$ and $N - r(l_p - l_q) < k \le N - r(l_p - l_{q+1})$.

Proof: See Appendix A.

•  It is worth noting that the vector $u_r$ has only $2 r l_p$
non zero terms. The computation of the norm occurring
in the previous expression does not involve
a sum of $N$ terms but of only $2 L l_p$ terms.

•  Consider a cubic phase signal, $M = 3$. A solution
is the choice of the order 6 germ:

$M_{6,y}(\{0,3,3\},\{1,1,4\}) = E\left\{ y_n \, y_{n+3}^2 \, (y_{n+1}^2 \, y_{n+4})^* \right\}.$  (6)

Consequently $p = 4$ and:

$l_1 = 0$,  $l_2 = 1$,  $l_3 = 3$,  $l_4 = 4$
$n_1 = 1$,  $n_2 = 2$,  $n_3 = 2$,  $n_4 = 1$
$\epsilon_1 = 1$,  $\epsilon_2 = -1$,  $\epsilon_3 = 1$,  $\epsilon_4 = -1$

The non zero components of $u_r$ are:

$1 \le k \le r$:  $u_r(k) = 1$
$r + 1 \le k \le 3r$:  $u_r(k) = -1$
$3r + 1 \le k \le 4r$:  $u_r(k) = 1$
$N - 4r < k \le N - 3r$:  $u_r(k) = -1$
$N - 3r < k \le N - r$:  $u_r(k) = 1$
$N - r < k \le N$:  $u_r(k) = -1$
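
The estimator of Section 2 can be sketched in a few lines for the cubic germ ({0,3,3},{1,1,4}). This is an illustrative sketch rather than the authors' implementation: it assumes that delay 0 belongs to the unconjugated set (true for every germ listed in this communication), that the moment phases stay within (-pi, pi) so no unwrapping is needed, and that delays are applied as n + l. The function names are hypothetical.

```python
import cmath
import math


def pps(coeffs, n_samples, amplitude=1.0):
    """Noise-free polynomial phase signal, n = 1..N, coeffs = [a_0, ..., a_M]."""
    return [amplitude * cmath.exp(1j * sum(a * n ** q for q, a in enumerate(coeffs)))
            for n in range(1, n_samples + 1)]


def sample_moment(y, germ, r):
    """Sample moment of the germ scaled by r, averaged over N - r*l_p terms."""
    unconj, conj = germ
    n_terms = len(y) - r * max(unconj + conj)
    total = 0j
    for k in range(n_terms):
        term = 1 + 0j
        for delay in unconj:
            term *= y[k + r * delay]
        for delay in conj:
            term *= y[k + r * delay].conjugate()
        total += term
    return total / n_terms


def smf_estimate(y, germ, n_moments):
    """Least squares fit of a_M to the angles of the first L moments."""
    unconj, conj = germ
    m = len(unconj)
    # (-1)^M * M * product of the conjugated delays (with multiplicity).
    scale = (-1) ** m * m * math.prod(conj)
    num = sum(r ** m * cmath.phase(sample_moment(y, germ, r))
              for r in range(1, n_moments + 1))
    return num / (scale * sum(r ** (2 * m) for r in range(1, n_moments + 1)))


def u_vector(germ, r, n_samples):
    """Vector u_r: signed count of moment terms that involve each sample."""
    unconj, conj = germ
    u = [0] * n_samples
    for k in range(n_samples - r * max(unconj + conj)):
        for delay in unconj:
            u[k + r * delay] += 1
        for delay in conj:
            u[k + r * delay] -= 1
    return u


germ = ([0, 3, 3], [1, 1, 4])
signal = pps([0.3, 0.1, 0.02, 0.004], 20)
a3_hat = smf_estimate(signal, germ, 3)
```

In the noise-free case `a3_hat` recovers the cubic coefficient, and `u_vector` reproduces the pattern of non zero components listed above (1, -1, -1, 1 at the start for r = 1, and 2 r l_p non zero entries in total).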


4.  SIMULATIONS 

4.1.  Comparison  with  Monte-Carlo  simulations 

The aim of this first simulation is to compare the theoretical
expression of the variance given in the previous
section with the variance estimated by Monte-Carlo
simulations. The simulations have been realised with
a cubic phase signal and the germ ({0,3,3},{1,1,4}).
Figure 1: Comparison between theoretical variance and
Monte-Carlo simulations. $M = 3$, the germ equals
({0,3,3},{1,1,4}), $N = 20$ and $L = 3$.


Figure 2: Asymptotic efficiency of SMF. $M = 3$ and
the germ equals ({0,4,5},{1,2,6}).


The number of samples is $N = 20$ and $L = 3$. The
variances have been estimated from 500 independent
realisations of the noise for each signal to noise ratio.
Figure 1 shows the good agreement between the estimated
variances and the expression (5). The Cramer
Rao lower bound (CRLB) of $a_3$ given by [6] is also shown
in the plot.

4.2.  Asymptotic  performances

For a given value of $L$, (5) shows that the variance of
$\hat{a}_M$ is $O(1/N^2)$. The CRLB of $a_M$ is $O(1/N^{2M+1})$.
Consequently SMF is not an asymptotically efficient
estimator for $L$ constant. However, the expression of
$\hat{m}_y(r)$ given above shows that $L$ can take values up to
$(N - 1)/l_p$. A possibility to increase the efficiency of
SMF is to choose $L$ as an increasing function of $N$.

For example $L$ can be chosen as $(N - 1)/l_p - \tau$. In this
case $\tau l_p$ is the minimum number of terms averaged for
the estimation of the moments. This approach is analogous
to the one used in classical spectral analysis where
the correlogram is windowed to reduce its variance [8].
The choice of $\tau$ relies on a compromise between the
reduction of the estimator variance and the poor quality
of the last moments caused by a low average of
terms in (4). It is important to remember that the
moments (4) are always unbiased.

In order to study the performance of the estimator,
the ratio

$\eta_N \triangleq \frac{\mathrm{var}\{\hat{a}_M^{SMF}\}}{\mathrm{CRLB}\{a_M\}}$  (7)

is defined for SNR $\gg 1$. Note that this ratio is not a
function of the signal to noise ratio. Figure 2 represents
$\eta_N$ as a function of $N$ for $\tau = 11$ and the germ
({0,4,5},{1,2,6}). This result shows that in
this case SMF is asymptotically efficient.

4.3.  Comparison  with  the  GAF

In order to compare the performance of SMF and
GAF, we define the ratio:

$\eta'_N \triangleq \frac{\mathrm{var}\{\hat{a}_M^{SMF}\}}{\mathrm{var}\{\hat{a}_M^{GAF}\}}.$  (8)

The two expressions of the variance are obtained in
the case SNR $\gg 1$. The analytical expression of the variance
of $\hat{a}_M^{GAF}$ was first computed in [5] and later, with
less restrictive conditions, in [1].

Figure 3 represents $\eta'_N$ as a function of $N$ in the
case $M = 3$ for the germ ({0,4,5},{1,2,6}) and for
various values of $L$. These plots show that for a given
value of $L$, $\eta'_N$ has a minimum smaller than 1. This
result makes it possible to select, for a given number of
measured samples, an “optimal” value of $L$. Moreover, it shows
that it is always possible to choose $L$ such that
the variance of SMF is lower than the variance of GAF.


4.4.  Optimal  choice  of  SMF  parameters 

This last section provides the optimal values for the
SMF parameters in the case of order 3 and 4 polynomial
phase signals. Using formula (5), the optimal values
of the germ with the associated value of $L$ have been
computed for different values of $N$. The results are
Figure 3: Comparison between SMF and GAF for $a_3$.
The germ equals ({0,4,5},{1,2,6}).


Figure  4:  Optimal  choice  of  L  and  the  germ  in  the  case 
M  =  3. 


Figure  5:  Optimal  choice  of  L  and  the  germ  in  the  case 
M  =  4. 


provided in Figures 4 and 5, which give, for $N$ in the
range 8 < N < 314, the optimal $L$ and the number of
the corresponding germ. The germs are numbered:

#1  ({0,3, 3}, {1,1, 4}), 

#2  ({0,4, 5}, {1,2, 6}), 

#3  ({0,5, 7}, {1,3, 8}), 

#4  ({0,5, 8}, {2, 2, 9}), 

#5  ({0,6, 9}, {1,4, 10}), 

for  M  =  3,  and: 

#1  ({0,3, 4, 7}, {1,1, 6, 6}), 

#2  ({0,4, 7, 11}, {1,2, 9, 10}), 

#3  ({0,5, 5, 10}, {1,2, 8, 9}), 

#4  ({0,5, 10, 15}, {1,3, 12, 14}), 

#5  ({0,5, 12, 17}, {2, 2, 15, 15}), 

for  M  =  4. 
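
The germs listed above can be checked for time shift invariance with a short sketch. It uses the fact that the phase of the moment of an order M polynomial phase signal is independent of n exactly when the power sums of the two delay sets agree for exponents 0 to M - 1 (a binomial expansion argument; the condition is stated here as a working assumption, and the function name is hypothetical).

```python
def is_shift_invariant_germ(unconj, conj):
    """True if both delay sets have equal power sums for exponents 0..M-1."""
    m = len(unconj)
    return len(conj) == m and all(
        sum(delay ** q for delay in unconj) == sum(delay ** q for delay in conj)
        for q in range(m)
    )


CUBIC_GERMS = [
    ([0, 3, 3], [1, 1, 4]),
    ([0, 4, 5], [1, 2, 6]),
    ([0, 5, 7], [1, 3, 8]),
    ([0, 5, 8], [2, 2, 9]),
    ([0, 6, 9], [1, 4, 10]),
]
QUARTIC_GERMS = [
    ([0, 3, 4, 7], [1, 1, 6, 6]),
    ([0, 4, 7, 11], [1, 2, 9, 10]),
    ([0, 5, 5, 10], [1, 2, 8, 9]),
    ([0, 5, 10, 15], [1, 3, 12, 14]),
    ([0, 5, 12, 17], [2, 2, 15, 15]),
]
all_valid = all(is_shift_invariant_germ(u, c) for u, c in CUBIC_GERMS + QUARTIC_GERMS)
```

All ten germs pass this check, whereas a perturbed set such as ({0,3,3},{1,1,5}) does not.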

This result shows that for the cubic phase signal
the optimal germ in the range 10 < N < 150 is
#2 = ({0,4,5},{1,2,6}). This result is in accordance
with the one obtained by simulation in [3]. After
a short transition interval, the optimal germ becomes
#3 = ({0,5,7},{1,3,8}) for the following interval,
which occurs for N > 200. The behaviour
in the case $M = 4$ is similar, the optimal germ for
10 < N < 200 being #3 = ({0,5,5,10},{1,2,8,9}). It
is worth noting that in both cases, $L$ is approximately
a linear function of $N$ in the successive intervals.

5.  CONCLUSIONS 

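This communication gives a theoretical expression of
the error variance of the SMF algorithm for the estimation
of the highest degree polynomial phase coefficient. This
expression leads to a modification of the algorithm in
order to obtain its asymptotic efficiency. Moreover it
confirms the superiority of the SMF with respect to the
GAF, conditioned on a correct choice of the algorithm
parameters. Finally, optimal values of the SMF parameters
are provided for order three and four polynomial
phase signals.

A.  PROOF  OF  PROPOSITION  1

Denote by $w = w_r + j w_i$ the noise vector. For a high
SNR:

$\hat{a}_M = G(w_r, w_i) \approx G(0,0) + \left. \frac{\partial G(w)}{\partial w_r} \right|_{w=0} w_r + \left. \frac{\partial G(w)}{\partial w_i} \right|_{w=0} w_i.$

If $w = 0$, it can be easily verified that $m_y(r) = m_x(r)$
and consequently $G(0,0) = a_M$. Using the circularity
of $w$, we obtain:

$\mathrm{var}\{\hat{a}_M\} = \frac{\sigma_w^2}{2} \left( \left\| \left. \frac{\partial G(w)}{\partial w_r} \right|_{w=0} \right\|^2 + \left\| \left. \frac{\partial G(w)}{\partial w_i} \right|_{w=0} \right\|^2 \right).$  (9)

Herein, the computation of the gradient of $G(w)$
leads to the computation of the gradient of the angle
of $m_y(r)$. The term associated with the gradient with
respect to the real component equals:

$\left. \frac{\partial G(w)}{\partial w_r} \right|_{w=0} = \frac{1}{(-1)^M M \prod_{\{k | \epsilon_k = -1\}} l_k^{n_k} \sum_{r=1}^{L} r^{2M}} \sum_{r=1}^{L} r^M \left. \mathrm{Imag}\left( \frac{1}{m_y(r)} \frac{\partial m_y(r)}{\partial w_r} \right) \right|_{w=0}.$

Substituting (4) in the expression of the $n$th component
of the previous gradient, we obtain:

$\left. \mathrm{Imag}\left( \frac{1}{m_y(r)} \frac{\partial m_y(r)}{\partial w_{r,n}} \right) \right|_{w=0} = \frac{1}{N - r l_p} \sum_{k=1}^{N - r l_p} \left. \mathrm{Imag}\left( \frac{1}{m_y(r)} \frac{\partial}{\partial w_{r,n}} \prod_{q=1}^{p} \left( y_{k + r l_q}^{[\epsilon_q]} \right)^{n_q} \right) \right|_{w=0}.$  (10)

Here $y^{[1]} = y$ and $y^{[-1]} = y^*$. The previous derivatives
do not equal zero if $n = k + r l_q$ and in this case:

$\left. \mathrm{Imag}\left( \frac{1}{m_y(r)} \frac{\partial}{\partial w_{r,n}} \prod_{q=1}^{p} \left( y_{k + r l_q}^{[\epsilon_q]} \right)^{n_q} \right) \right|_{w=0} = \mathrm{Imag}\left( \frac{n_q}{A} e^{-j \epsilon_q \phi_n} \right) = -\frac{\epsilon_q n_q}{A} \sin \phi_n.$

Using (2) and the previous expression, it is possible
to compute the expression of the sum in (10) for the
different values of $n$:

$\left. \mathrm{Imag}\left( \frac{1}{m_y(r)} \frac{\partial m_y(r)}{\partial w_r} \right) \right|_{w=0} = -\frac{1}{A (N - r l_p)} S u_r,$  (11)

where $S$ is a diagonal $N \times N$ matrix with diagonal terms
$\{\sin \phi_1, \ldots, \sin \phi_N\}$. A similar development leads to:

$\left. \mathrm{Imag}\left( \frac{1}{m_y(r)} \frac{\partial m_y(r)}{\partial w_i} \right) \right|_{w=0} = \frac{1}{A (N - r l_p)} C u_r,$  (12)

where $C$ is a diagonal $N \times N$ matrix with diagonal
terms $\{\cos \phi_1, \ldots, \cos \phi_N\}$. The substitution of (11,12)
in (9) terminates the proof. ■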
ABSTRACT

An important task in underwater passive sonar signal
processing is the determination of target signatures
based on the narrow-band signal content in the received
signal. To achieve good classification performance it is
important to be able to separate the different sources
(e.g. engine, hull and drive) present in the signature,
and to determine the distinct frequency coupling pattern
of each of these sources. In this work we demonstrate
how this can be done using bispectral techniques
applied to data recorded at a sea trial in the Baltic
Sea. As a target we used a 23 ft fiberglass motor boat
powered by a 4-cylinder, 4-stroke, turbo-charged diesel
engine connected to a stern drive with two counter-rotating
propellers. Data was recorded with a bottom
mounted hydrophone array as well as with accelerometers
mounted on the engine and hull. It was found
that the harmonics that propagated through water are
engine related at low speeds and drive related at high
speeds. The hull vibrations are only present at very
low speeds. Moreover, we found that normalized bispectrum
measures (skewness) could provide additional
coupling information not visible in the standard bispectrum.

1.  INTRODUCTION 

In passive sonar signature estimation it is important to
be able to separate the different narrow-band contributions
that are present in the received signal. If the
source is a vessel with a conventional engine/drive configuration,
a good first characterization can often be obtained
with the power spectrum alone, but for a more
precise characterization the phase couplings between
harmonics must be uncovered. The phase coupling
patterns can be used to separate the different sources
present in the signature. However, it is a well-known
fact that conventional power spectral techniques are
phase-blind and cannot be used to track phase couplings,
hence the use of bispectral techniques [1].

A stationary signal with narrow-band content at
the frequencies $f_1$, $f_2$ and $f_1 + f_2$ will show peaks in
the power spectrum at these frequencies. In the bispectrum
however, a peak at the bifrequency $(f_1, f_2)$
will occur if and only if the signals are phase-coupled.
The ability of the bispectrum to detect phase-couplings
has been utilized in such diverse areas as diagnosis of
heart conditions [2], nonlinear wave interaction in tidal
waves [3], and machine monitoring [4].

In the present work we report on an experiment on
harmonic characterization of (hydro-)acoustic signals
performed in shallow waters using a small motor boat
with a diesel engine and a stern drive as a source. Our
main objective has been to determine if the phase coupling
pattern between harmonics present in the engine,
drive and hull is preserved after propagation through
(shallow) water, and if it is possible to utilize this information
for classification purposes. More specifically,
the focus has been on the possibility to separate the
different generating sources (engine, drive and hull) in
hydrophone data. A secondary objective has been to
compare bispectrum and skewness based techniques in
this particular application.

2.  SPECTRUM,  BISPECTRUM  AND 
SKEWNESS 

Figure 1: Typical time series of the signals; engine accelerometer
(top) and hydrophone (bottom).

Consider a discrete time series $x(n)$ obtained by sampling,
with frequency $f_s$, a (zero-mean, second-order stationary)
process $x(t)$. The power spectrum $P(k)$ of $x(n)$,
for discrete frequency $f = k \Delta f$ with $\Delta f = f_s/M$,
can be estimated by conventional averaging of periodograms
by dividing the time series $x(n)$ into $K$ (possibly
overlapping) blocks, each of length $M$, and computing
(possibly using tapering) the $M$ point DFT for
each block. The estimated power spectrum $\hat{P}(k)$ at
frequency bin $k$ is then obtained as

$\hat{P}(k) = \frac{1}{K} \sum_{j=1}^{K} |X_j(k)|^2,$

where $X_j(k)$ is the DFT over block $j$ at frequency bin
$k$. In a similar way (assuming third-order stationarity)
the bispectrum $B(k, \ell)$ and skewness¹ $S^2(k, \ell)$,
for discrete bifrequency $(f_1, f_2) = (k \Delta f, \ell \Delta f)$, can be
estimated with the direct method. The (averaged) bispectrum
and skewness estimates $\hat{B}(k, \ell)$ and $\hat{S}^2(k, \ell)$,
respectively, are then given by [5]

$\hat{B}(k, \ell) = \frac{1}{K} \sum_{j=1}^{K} X_j(k) X_j(\ell) X_j^*(k + \ell),$

$\hat{S}^2(k, \ell) = \frac{|\hat{B}(k, \ell)|^2}{\hat{P}(k) \hat{P}(\ell) \hat{P}(k + \ell)},$

where $(\cdot)^*$ denotes complex conjugation.
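
The direct method described above can be sketched with the Python standard library alone. The sketch makes simplifying assumptions that are not taken from the paper: non-overlapping blocks, no tapering, a plain 1/K normalisation, and synthetic tones placed exactly on DFT bins. The function names are hypothetical.

```python
import cmath
import math
import random


def dft_bin(block, k):
    """Single DFT coefficient X(k) of one block (no tapering)."""
    m = len(block)
    return sum(x * cmath.exp(-2j * math.pi * k * n / m) for n, x in enumerate(block))


def skewness(blocks, k, l):
    """Averaged skewness estimate S^2(k, l) from the direct method."""
    p_k = p_l = p_kl = 0.0
    bispec = 0j
    for block in blocks:
        x_k, x_l, x_kl = dft_bin(block, k), dft_bin(block, l), dft_bin(block, k + l)
        p_k += abs(x_k) ** 2
        p_l += abs(x_l) ** 2
        p_kl += abs(x_kl) ** 2
        bispec += x_k * x_l * x_kl.conjugate()
    n_blocks = len(blocks)
    bispec /= n_blocks
    return abs(bispec) ** 2 / ((p_k / n_blocks) * (p_l / n_blocks) * (p_kl / n_blocks))


def synth_blocks(n_blocks, m, k, l, coupled, rng):
    """Blocks with tones at bins k, l and k + l; phases redrawn per block."""
    blocks = []
    for _ in range(n_blocks):
        th1 = rng.uniform(0.0, 2.0 * math.pi)
        th2 = rng.uniform(0.0, 2.0 * math.pi)
        # Phase coupling: the third phase is the sum of the first two.
        th3 = th1 + th2 if coupled else rng.uniform(0.0, 2.0 * math.pi)
        blocks.append([
            math.cos(2 * math.pi * k * n / m + th1)
            + math.cos(2 * math.pi * l * n / m + th2)
            + math.cos(2 * math.pi * (k + l) * n / m + th3)
            for n in range(m)
        ])
    return blocks


rng = random.Random(0)
s2_coupled = skewness(synth_blocks(64, 64, 5, 8, True, rng), 5, 8)
s2_uncoupled = skewness(synth_blocks(64, 64, 5, 8, False, rng), 5, 8)
```

Both synthetic signals have identical power spectra at the three bins, yet only the phase-coupled one yields a skewness close to 1, which illustrates why the power spectrum alone is phase-blind.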

3.  SEA  TRIAL 

The sea trial was conducted in the Baltic Sea off the
east coast of Sweden, in shallow waters of approximately
constant depth, 30 meters. As target a 23 ft
fiberglass motor boat (Botnia Marine model Targa-23)
was used, powered by a 4-cylinder, 4-stroke turbo-charged
Volvo Penta (VP) diesel engine (type AD31P-A)
equipped with a VP Aquamatic stern drive (type
AD31/DP) having an engine/drive gear ratio of 2.3:1.
The drive was fitted with two counter-rotating propellers
(VP type A7) having 3 (front) and 4 (rear)
blades, respectively.² Engine and hull vibrations
were recorded with two one-axis accelerometers, one
fitted directly on the engine mount and one to the
hull close to the engine. Water-propagated sound was
recorded using a hydrophone array, with four wideband
omnidirectional hydrophones horizontally equally
spaced, but at various depths. In the subsequent analysis
presented in this work one hydrophone mounted at
a depth of 17 meters was utilized.

¹ The square root of it is called the bicoherence index in [1].

Figure 2: Typical power spectra of the signals; engine
accelerometer (top) and hydrophone (bottom).

In  order  to  separate  the  different  sources  involved 
(e.g.  engine,  hull  and  drive),  several  recordings  were 
made  at  various  rpm,  both  with  the  boat  drifting  freely 
with  the  drive  disconnected  and  with  the  boat  moving 
with  the  drive  connected.  In  each  of  the  recordings 
where  the  boat  was  powered  by  its  drive  it  was  run  on 
a  straight  track,  at  constant  throttle,  passing  directly 
above  the  hydrophone  array.  Ambient  sea  noise  was 
recorded  and  analyzed  to  ensure  that  it  had  negligible 
effect  on  the  end  result.  The  weather  conditions  during 
the  sea  trial  were  good  with  wind  speeds  below  5  m/s. 
The  sound  velocity  profile  was  also  measured,  and  was 
found  to  be  approximately  flat  over  the  whole  water 
depth.  All  data  was  recorded  with  a  sampling  rate  of 
25  kHz,  which  was  considered  to  be  sufficiently  high 
since  most  signal  and  noise  power  was  below  5  kHz 
and virtually no power was present above 10 kHz. To ensure that the
phase relations in the recorded signal and noise were preserved, no
prefiltering was used.

² The most notable advantage of having two counter-rotating
propellers, rather than a single one, is less noise and vibration.
Hence, the power in drive-related signals from this vessel can be
expected to be lower than with other forms of drives.

Figure 3: Bispectrum magnitude for the hull accelerometer time series
from the 1830 rpm straight-track run (at CPA). The grid spacing is
15.25 Hz, which corresponds to half the engine axis rotation
frequency.

4.  DATA  ANALYSIS  AND  RESULTS 

In  the  following  we  will  show  the  results  of  an  analysis 
of  time  series  from  three  different  rpm  straight-track 
recordings,  at  1830,  2712  and  3549  rpm,  respectively. 
The  data  and  results  presented  here  are  taken  from 
a  time  frame  of  15  seconds  duration  around  the  clos¬ 
est  point  of  approach  (CPA)  to  the  hydrophone  array. 
In  the  bispectrum  estimates  the  number  of  blocks  was 
K  =  22  and  the  number  of  points  in  the  DFTs  was 
M = 16384. The same number of points in the DFTs was used in the
skewness estimates but, to achieve a consistent estimate, an overlap
of 12288 samples was used, yielding a total number of blocks K = 88.
A Hamming taper was applied to the data in all DFT computations.
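The block counts quoted above can be checked with a short calculation. The sketch assumes that the 15 s frame at 25 kHz contains exactly 375000 samples and that only complete blocks are kept; the helper name `num_blocks` is introduced here for illustration.

```python
def num_blocks(n_samples, M, overlap=0):
    """Number of complete length-M blocks with the given overlap (in samples)."""
    hop = M - overlap
    return (n_samples - M) // hop + 1

fs = 25_000          # sampling rate [Hz]
duration = 15        # analysis window around CPA [s]
M = 16384            # DFT length

n_samples = fs * duration
print(num_blocks(n_samples, M))               # bispectrum: no overlap
print(num_blocks(n_samples, M, overlap=12288))  # skewness: 75 % overlap
print(fs / M)                                 # bin width Δf in Hz
```

Under these assumptions the two calls give 22 and 88 blocks, matching the values of K stated in the text, and the bin width is about 1.53 Hz.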

Figure  1  displays  a  typical  example  of  time  series 
from  the  hull  accelerometer  and  the  hydrophone.  The 
corresponding  power  spectra  of  the  time  series  in  Fig.  1 
are  seen  in  Fig.  2.  By  conventional  power  spectral 
based  analysis  it  is  difficult  to  separate  and  relate  the 
peaks  of  different  sources  (engine,  drive  and  hull).  How¬ 
ever,  with  bispectral  analysis  it  is  easier  to  identify  and 
separate  the  sources. 

4.1.  Bispectrum 

In Figure 3 the estimated absolute value of the bispectrum for the
hull accelerometer data from the 1830 rpm straight-track run at CPA
is displayed. The grid spacing is 15.25 Hz, which corresponds to half
the engine axis frequency. Here, and in all subsequent figures
involving the bispectrum, the values are quantized to 10 levels, with
white indicating the lowest level and the other levels given by the
grayscale on the right. It can be seen that there are strong coupled
modes, induced mainly by the torque variations of the engine due to
inertia, piston angular velocity and gas pressure variations. This is
an example of the “second order” harmonics appearing at twice the
engine axis rotation frequency [6], which are moreover coupled to the
associated fourth order harmonics. Coupled second and third order
engine axis harmonics are also visible. The corresponding bispectrum
for the hydrophone is seen in Fig. 4 (same grid spacing as in Fig. 3),
where the only visible (off-diagonal) peak is at bifrequency (approx)
(61,40) Hz, which represents a coupling between an engine and a drive
harmonic.

Figure 4: Bispectrum magnitude for the hydrophone time series from
the 1830 rpm straight-track run (at CPA). The grid spacing is the
same as in Fig. 3.
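The grid spacings used in the bispectrum figures follow from the engine speed. The sketch below assumes that the propeller axis frequency is the engine axis frequency divided by the stated 2.3:1 gear ratio; the paper does not spell out how its propeller axis values were obtained, and the function names are illustrative.

```python
GEAR_RATIO = 2.3  # engine/drive gear ratio stated for the stern drive

def engine_axis_hz(rpm):
    """Engine axis rotation frequency in Hz."""
    return rpm / 60.0

def half_engine_axis_hz(rpm):
    """Half the engine axis frequency (grid used for accelerometer data)."""
    return engine_axis_hz(rpm) / 2.0

def propeller_axis_hz(rpm, gear_ratio=GEAR_RATIO):
    """Propeller axis frequency, assuming a plain division by the gear ratio."""
    return engine_axis_hz(rpm) / gear_ratio

for rpm in (1830, 2712, 3549):
    print(rpm, half_engine_axis_hz(rpm), round(propeller_axis_hz(rpm), 2))
```

The half-engine-axis values reproduce the 15.25 Hz, 22.6 Hz and (rounded) 29.6 Hz grids. The propeller axis values computed this way come out slightly above the 19.5 Hz and 25.6 Hz quoted in the text, which may mean the quoted values were read from the data or that the 2.3:1 ratio is rounded; that reading is a guess.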

In Figures 5 and 6 the bispectra for the hull accelerometer and
hydrophone data, respectively, for the 2712 rpm straight-track run
are shown. The grid spacing used in Fig. 5 is 22.6 Hz, which is half
the engine axis frequency. Again, one can see strong coupled engine
harmonics in the hull accelerometer data, at the second and third
order. Moreover, one can see couplings between the engine and drive,
at bifrequencies (approx) (117,20) Hz and (136,20) Hz. The grid
spacing used in Fig. 6 is 19.5 Hz, which corresponds to the propeller
axis frequency, and several frequency couplings are visible. Notable
in particular is the peak at bifrequency (approx) (137,20) Hz (and
its neighbors), and the band-like structure of peaks around
f1 = 156 Hz. It appears that all the peaks fall on the grid. Hence,
these harmonics are drive related. This can be explained by the fact
that the propeller noise increases dramatically when the speed
increases and that the engine load, and hence vibrations, are lower
at the higher speed since the boat is then hydrofoiling.

Figure 5: Bispectrum magnitude for the hull accelerometer time series
from the 2712 rpm straight-track run (at CPA). The grid spacing is
22.6 Hz, which is half the engine axis frequency.

In  Figures  7  and  8  the  bispectra  of  the  hull  ac¬ 
celerometer  and  hydrophone  data,  respectively,  from 
the  3549  rpm  straight-track  run  are  displayed.  The 
results  are  about  the  same  as  the  ones  obtained  at 
2712 rpm. In the accelerometer bispectrum in Fig. 7,
where  the  grid  spacing  is  29.6  Hz  corresponding  to 
half  the  engine  axis  frequency,  again  one  can  see  a 
few  engine-engine  coupled  harmonics,  at  the  expected 
orders,  and  some  additional  engine-drive  coupled  har¬ 
monics.  In  the  hydrophone  bispectrum  in  Fig.  8,  where 
the  grid  spacing  is  25.6  Hz,  which  corresponds  to  the 
propeller  axis  frequency,  a  very  rich  coupling  structure 
is  again  visible.  Also  here  it  appears  as  if  all  peaks 
fall  on  the  grid  and  hence  the  harmonics  are  all  drive 
related. 

4.2.  Skewness 

Given the fact that apparently all visible coupled harmonics in the
hydrophone data for higher speeds (2712 and 3549 rpm) fall on
frequencies commensurable with the drive frequency, a natural
question is whether a more careful analysis, using for instance the
skewness, would reveal additional coupling information. This indeed
turns out to be the case, as shown in Fig. 9, where the skewness for
the 2712 rpm straight-track run is shown using a grid spacing of
19.5 Hz, which corresponds to the propeller axis frequency. Here,
only the values exceeding half of the full range are shown, and these
values are quantized to 10 levels and displayed using the upper half
of the grayscale on the right. There are several peaks that do not
fall on the grid, most notably the ones at (approx) (286,86) Hz,
(335,166) Hz and (334,263) Hz. Moreover, these peaks do not fall on
the grid corresponding to multiples of half of the engine axis
frequency either. Therefore, it is conceivable that these coupled
frequencies are sums or differences between multiples of the engine
and drive frequencies, possibly generated by quadratic phase
coupling. However, further study is needed to determine the nature of
these peaks.

Figure 6: Bispectrum magnitude for the hydrophone time series from
the 2712 rpm straight-track run (at CPA). The grid spacing is 19.5 Hz,
which corresponds to the propeller axis rotation frequency.
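Whether a peak "falls on the grid" can be checked numerically. The sketch below is an illustration only: the 2 Hz tolerance (roughly one DFT bin plus the uncertainty of the approximate peak positions) and the helper names are assumptions made here, not values from the paper.

```python
def off_grid_distance(f, spacing):
    """Distance in Hz from f to the nearest integer multiple of spacing."""
    return abs(f - round(f / spacing) * spacing)

def on_grid(peak, spacing, tol=2.0):
    """True if both coordinates of a bifrequency lie within tol of the grid."""
    return all(off_grid_distance(f, spacing) <= tol for f in peak)

PROPELLER_GRID = 19.5    # Hz, propeller axis frequency at 2712 rpm (from the text)
HALF_ENGINE_GRID = 22.6  # Hz, half the engine axis frequency at 2712 rpm

skewness_peaks = [(286, 86), (335, 166), (334, 263)]
for peak in skewness_peaks:
    print(peak, on_grid(peak, PROPELLER_GRID), on_grid(peak, HALF_ENGINE_GRID))

# For comparison, the bispectrum peak near (137, 20) Hz lies on the drive grid.
print((137, 20), on_grid((137, 20), PROPELLER_GRID))
```

With this tolerance none of the three skewness peaks lies on either grid, whereas the (137,20) Hz peak does lie on the propeller grid, consistent with the discussion above.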


5.  CONCLUSION 

Only  for  low  speeds  (1830  rpm)  is  it  possible  to  see 
engine  harmonics  in  the  hydrophone  data,  despite  the 
presence  of  such  harmonics  in  the  hull  data.  Thus, 
the  hull  does  not  act  as  a  “projector”  for  engine  vi¬ 
brations.  Instead,  the  dominating  source  at  medium 
(2712  rpm)  to  high  speed  (3549  rpm)  is  the  drive  and 
at  high  speed  only  the  drive  is  visible  in  the  bispectrum 
from  hydrophone  data.  However,  using  the  skewness  it 
is  possible  to  detect  coupled  harmonics  that  are  neither 
strictly  engine  related  nor  strictly  drive  related.  The 
propeller  leaves  a  clear  trace  in  both  the  bispectrum 
and  skewness  for  medium  speeds,  in  terms  of  peaks  at 
7, 8, and 9 times the propeller axis frequency, which
might be useful in determining the propeller configuration.


Figure 7: Bispectrum magnitude for the hull accelerometer time series
from the 3549 rpm straight-track run (at CPA) with frequency grid
corresponding to half the engine axis frequency.

Figure 8: Bispectrum magnitude for the hydrophone time series from
the 3549 rpm straight-track run (at CPA) with grid spacing equal to
the propeller axis rotation frequency.

Figure 9: Skewness for the hydrophone time series from the 2712 rpm
straight-track run (at CPA). The grid spacing is 19.5 Hz, which
equals the propeller axis frequency.
ABSTRACT

We  have  devised  a  new  generalized  likelihood  ratio  test 
for  detecting  a  signal  in  unknown,  strong  non-Gaussian 
low  rank  interference  plus  white  Gaussian  noise  which 
needs  no  knowledge  of  the  non-Gaussian  distribution. 
From perturbation expansions of the test statistic, we
establish the connection of the proposed GLRT detector
to the UMPI test and show that it is approximately
CFAR. Computer simulations indicate that the new
detector  significantly  outperforms  traditional  adaptive 
methods  in  non-Gaussian  interference. 

1.  INTRODUCTION 

Non-Gaussian disturbances have been reported in diverse applications
such as radar, sonar, digital communications, and radio astronomy.
Signal detection in unknown colored noise backgrounds has
traditionally been accomplished using adaptive methods based on the
Gaussian model, whether or not the noise is actually Gaussian
distributed. However, recent work has shown that the performance of
adaptive detectors based on the Gaussian model can degrade severely
when operating in correlated non-Gaussian noise backgrounds [1]. To
illustrate this, we simulated the invariant matched subspace detector
(MSD) of Scharf et al. [2] in noise consisting of a strong, highly
correlated rank-2 compound-Gaussian component embedded in white
Gaussian noise. Two versions were considered: the optimum MSD that
knows the true interference subspace and, motivated by the Principal
Component Inverse (PCI) method [3], an adaptive MSD (ASD) that uses
an estimate of the interference subspace obtained from signal-free
training data. As a reference, we also evaluated the ASD using pure
Gaussian noise that had the same nominal covariance matrix as in the
non-Gaussian case. The results for all three cases are plotted in
figure 1. As is clearly seen, the performance of the ASD degrades
substantially in the non-Gaussian noise, whereas the adaptive
detector in pure Gaussian noise has performance nearly identical to
the optimum MSD. The effect of non-Gaussian interference on the PCI
and subspace methods is discussed in [4].

Figure 1: Experimentally measured ROC curves comparing the
performance of the detectors at a signal-to-interference ratio of
-5 dB.

The  underlying  problem  of  designing  detectors  for 
non-Gaussian  clutter  has  been  the  selection  of  a  suit¬ 
able  multivariate  probability  density  function  (pdf)  fam¬ 
ily  to  model  the  clutter.  The  difficulty  is  that  in  most 
applications  there  exists  no  single  family  of  multivari¬ 
ate  non-Gaussian  pdfs  that  accurately  characterizes  the 
clutter  in  all  scenarios  and  environments.  Regardless, 
even  if  the  non-Gaussian  pdf  family  is  known,  the  pdf 
parameters  themselves  are  usually  unknown  and  their 
estimation  from  training  data  can  be  problematic.  An¬ 
other  difficulty  is  the  sensitivity  of  parametric  pdf  es¬ 
timators  and  detectors  to  contaminants  in  the  training 
data. An alternative approach is to use a non-parametric method,
e.g., designing locally optimum detectors based on non-parametric
kernel-based pdf estimators [5]. However, these methods are best
suited for estimating univariate pdfs; they are difficult to extend
to higher dimensions and require large amounts of training data.

In  many  applications  where  the  noise  appears  to 
be  non-Gaussian,  the  noise  can  actually  be  modeled  as 
consisting  of  two  components:  a  strong  non-Gaussian 
component  which  gives  rise  to  the  overall  non-Gaussian 
characteristics,  and  a  residual  Gaussian  part,  made  up 
of  ambient  noise  and  diffuse  clutter.  We  now  propose 
an  alternative  approach  inspired  by  the  methodology 
used  in  [6]  to  detect  weak  signals  in  non-Gaussian  Arc¬ 
tic  sea  noise.  Rather  than  trying  to  model  the  overall  or 
individual  non-Gaussian  characteristics  of  the  noise,  a 
simpler approach is to develop compact representations
to model the non-Gaussian and Gaussian waveforms.
Then,  treating  their  parameters  as  unknown,  but  deter¬ 
ministic,  the  detection  problem  can  be  formulated  as  a 
composite  hypothesis  testing  problem  [7].  This  detec¬ 
tion  problem  is  often  easier  to  solve  than  the  original 
non-Gaussian  problem,  say  by  a  generalized  likelihood 
ratio  test  (GLRT). 

More precisely, we model the received complex-valued m × 1
noise-plus-signal space-time data snapshot at time $t_k$ as a
superposition

$$z_k = \underbrace{\sum_{j=1}^{r_n} a_{kj} b_j}_{\text{subspace interference}} + \underbrace{c_k s}_{\text{signal}} + \underbrace{n_k}_{\text{background white noise}} \qquad (1)$$

of a strong subspace non-Gaussian interference component and a
background white Gaussian noise component $n_k$, and possibly a
signal component. The $a_{kj}$ and $c_k$ are the noise and signal
expansion coefficients, respectively, and the $b_j$ and $s$ are the
noise and signal basis vectors, respectively. The non-Gaussianity of
the noise is modeled as arising from the expansion coefficients
$a_{kj}$ rather than the basis vectors $b_j$. For convenience, a
rank-1 signal is assumed.

For the case of known $b_j$, but unknown $a_{kj}$ with unknown
multivariate pdf and unknown white noise variance, it is reasonable
to seek a test which is invariant to these parameters. Ideally, we
desire a uniformly most powerful invariant (UMPI) test [7] (the UMPI
test maximizes the probability of detection regardless of the
parameter values while keeping the false alarm rate less than or
equal to some specified value). Scharf et al. [2] showed that for
data of the form (1) with known interference and signal subspaces,
the UMPI test, referred to as the matched subspace detector (MSD), is
(in simplified form)

$$\frac{\| P_{P_B^\perp S}\, z \|_F^2}{\| P_{BS}^\perp\, z \|_F^2} \gtrless \lambda \qquad (2)$$

where λ is some threshold. The matrix $P_{P_B^\perp S}$ is the
projection operator onto the part of the signal that remains after
the subspace interference has been nulled, and $P_{BS}^\perp$ is the
projection operator that nulls out both the subspace interference and
the signal component. Mathematically, $P_{P_B^\perp S}$ and
$P_{BS}^\perp$ are given by

$$P_{P_B^\perp S} = P_B^\perp S \left( S^H P_B^\perp S \right)^{-1} S^H P_B^\perp \qquad (3)$$

and

$$P_{BS}^\perp = I - [B|S] \left( [B|S]^H [B|S] \right)^{-1} [B|S]^H \qquad (4)$$

where $B = [b_1, b_2, \ldots, b_{r_n}]$ and $S = s$ (the matrix
[B|S] is obtained by concatenating B and S column-wise), and

$$P_B^\perp = I - B (B^H B)^{-1} B^H \qquad (5)$$

Test  (2)  is  maximally  invariant  to  scalings  of  the 
data  and  rotations  in  the  column  space  of  B.  Hence  it 
is CFAR with respect to the background noise level. It
is emphasized that, since (2) is UMPI, no other CFAR test
can perform better.
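The projections (3)-(5) and the MSD statistic (2) can be sketched numerically as follows. This is a minimal illustration assuming squared norms in (2) and a known B and S; the helper names `proj` and `msd_statistic` are introduced here and are not from [2].

```python
import numpy as np

def proj(M):
    """Orthogonal projector onto the column space of M."""
    return M @ np.linalg.pinv(M)

def msd_statistic(z, B, S):
    """Ratio in (2): signal energy left after nulling B over residual energy."""
    m = B.shape[0]
    P_B_perp = np.eye(m) - proj(B)                      # eq. (5)
    # Projector onto P_B_perp S; algebraically equal to eq. (3)
    P_sig = proj(P_B_perp @ S)
    P_BS_perp = np.eye(m) - proj(np.hstack([B, S]))     # eq. (4)
    num = np.linalg.norm(P_sig @ z) ** 2
    den = np.linalg.norm(P_BS_perp @ z) ** 2
    return num / den
```

Because both projectors annihilate the column space of B, the statistic is unchanged when a vector in that subspace is added to z, and because it is a ratio it is unchanged by scaling z. These are the invariances behind the CFAR property described above.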

Although  test  (2)  is  optimum,  it  is  difficult  to  realize 
because  the  interference  subspace  B  is  seldom  known 
beforehand  in  practice.  One  approach  is  to  use  the 
methodology  of  the  PCI  method  [3]  and  estimate  the 
unknown  interference  subspace  from  a  set  of  signal-free 
training  data.  However,  as  the  previous  and  upcoming 
numerical  examples  indicate,  this  approach  may  not  be 
optimum  when  the  low  rank  noise  is  non-Gaussian. 

The  approach  we  take  is  to  treat  B,  ak  and  the 
white  noise  variance  as  unknown,  but  deterministic, 
and  derive  the  GLRT  [8]  (the  GLRT  is  obtained  by 
replacing  the  unknown  parameters  in  the  likelihood- 
ratio  test  by  their  ML  estimates).  Our  motivation  is 
that  in  certain  instances,  the  GLRT  can  actually  be 
UMPI and often leads to a reasonable or good test [2].


2.  NEW  GLRT  DETECTOR 

A secondary data set of K signal-free data vectors is assumed
available for training, stacked column-wise into an m × K matrix X.
Detection of the signal is to be performed on a primary data set,
consisting of a single data snapshot, denoted as Y. Under the null
hypothesis $H_0$ and the signal-present hypothesis $H_1$, the
observed data matrix Z = [X|Y] is modeled as

$$H_0 : Z = BA + N \quad \text{(noise only)} \qquad (6)$$

$$H_1 : Z = BA + [0|Sc] + N \quad \text{(signal + noise)} \qquad (7)$$

where B is an $m \times r_n$ matrix whose columns generate the low
rank interference space, A is an $r_n \times (K+1)$ matrix whose
elements contain the low rank interference expansion coefficients, S
is an m × 1 signal replica, and c is the signal amplitude. The
elements of matrix N are modeled as IID complex Gaussian random
variables with zero mean and variance $\sigma^2$. S is assumed known,
but A, B, c, and $\sigma^2$ are assumed to be unknown but
deterministic.

A GLRT statistic for the hypothesis testing problem of (6) and (7) is
then the ratio of the likelihoods under $H_1$ and $H_0$, each
maximized over its unknown parameters, which simplifies to the ratio
of fitting errors

$$\gamma = \frac{\min_{B_0, A_0} \| Z - B_0 A_0 \|_F^2}{\min_{B_1, A_1, c} \| Z - B_1 A_1 - [0|Sc] \|_F^2} \qquad (9)$$

The numerator of (9) is the square-error in fitting the matrix Z by a
rank $r_n$ matrix and can be easily evaluated using the SVD of Z.
Similarly, the denominator of (9) is the error in jointly fitting Z
by a rank $r_n$ matrix and the linear part [0|Sc]. However, it cannot
be directly evaluated using the SVD of Z.

To numerically evaluate the denominator, we propose a simple scheme
that is based on a criss-cross regression-like method. The idea is to
linearize the minimization by holding, say, B constant and then
minimizing with respect to only A and c. This is a standard linear
least-squares fitting problem and is easy to solve. The procedure is
then repeated, this time replacing A with its estimate from the
previous step and now minimizing with respect to B and c. These steps
are repeated until convergence.


3.  RELATIONSHIP  TO  UMPI  DETECTOR


We now establish the connection of the proposed GLRT to the UMPI
matched subspace detector of Scharf et al. [2] by deriving a simple
approximation to the test statistic. First, in order to make the
comparison, we need to extend the single data vector optimum MSD (2)
to the multiple data vector case of (6) and (7). This is simple to do
and by substitution (by concatenating all the columns of Z into one
vector), we obtain the optimum MSD test statistic for the multiple
data vector case:

$$\gamma_{MSD} - 1 = \frac{\| P_{S'}\, z' \|_F^2}{\| P_{S'}^\perp\, z' \|_F^2} \qquad (10)$$

where $z' = \mathrm{vec}(P_B^\perp Z)$,
$S' = \mathrm{vec}(P_B^\perp [0|S])$,
$P_{S'} = S'(S'^H S')^{-1} S'^H$, and $P_{S'}^\perp = I - P_{S'}$.

We now use a first-order perturbation expansion for the SVD of a data
matrix [9] to obtain an approximation to the GLRT test statistic (9)
which can be related to the UMPI MSD (10). In the analysis, both Sc
and N are regarded as perturbations and weak relative to BA. The
specific derivation details are shown in appendix A. The final
approximation for the GLRT statistic derived in appendix A is

$$\gamma \approx 1 + \frac{\| P_{S''}\, z'' \|_F^2}{\| P_{S''}^\perp\, z'' \|_F^2} \qquad (11)$$

where $z'' = \mathrm{vec}(P_B^\perp Z P_A^\perp)$,
$S'' = \mathrm{vec}(P_B^\perp [0|S] P_A^\perp)$,
$P_{S''} = S''(S''^H S'')^{-1} S''^H$,
$P_{S''}^\perp = I - P_{S''}$, and
$P_A^\perp = I - A^H (A A^H)^{-1} A$.

The only difference between the UMPI MSD (10) and the new GLRT (11)
is the post-multiplication of the data matrix Z by $P_A^\perp$. Thus,
to first order, the new GLRT is approximately equivalent to the
optimum MSD. By inspection, it is seen that (11) is invariant with
respect to common scalings of the columns of the data matrix Z, and
thus to the background noise level. Thus, the new GLRT is at least
approximately CFAR with respect to the background noise level.

When the interference is strong and the signal weak, the loss in
performance of the GLRT comes from the additional nulling due to the
post-multiplication of the data matrix Z by $P_A^\perp$. This loss
can be interpreted as arising from having to estimate the
interference subspace and is a function of the orthogonality of the
interference matrix row space to the row space of the signal matrix
[0|Sc].
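The GLRT statistic (9), with the numerator taken from the SVD of Z and the denominator obtained by the alternating least-squares scheme of Section 2, can be sketched as below. This is one possible reading of the criss-cross procedure, not the authors' implementation: the initialization of B from the SVD of Z, the stopping rule, and the function name `glrt_statistic` are assumptions, and the iteration may stop at a local minimum.

```python
import numpy as np

def glrt_statistic(Z, S, r, n_iter=100, tol=1e-10):
    """Ratio of fitting errors (9) for data Z = [X | Y], signal replica S, rank r."""
    Z = np.asarray(Z, dtype=complex)
    m, n = Z.shape
    S = np.asarray(S, dtype=complex).reshape(-1, 1)
    E = np.zeros((m, n), dtype=complex)      # [0 | S]: replica in the last column
    E[:, -1] = S.ravel()

    # Numerator: squared error of the best rank-r fit to Z (from the SVD).
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    num = np.sum(s[r:] ** 2)

    B = U[:, :r]                             # assumed starting point for B
    prev = num
    den = num
    for _ in range(n_iter):
        # Step 1: hold B fixed, least squares over A and c.
        A = np.linalg.pinv(B) @ Z
        sol = np.linalg.pinv(np.hstack([B, S])) @ Z[:, -1]
        A[:, -1] = sol[:r]
        # Step 2: hold A fixed, least squares over B and c.
        P = np.eye(n) - np.linalg.pinv(A) @ A    # nulls the row space of A
        EP, ZP = E @ P, Z @ P
        c = np.vdot(EP, ZP) / np.vdot(EP, EP)
        B = (Z - c * E) @ np.linalg.pinv(A)
        den = np.linalg.norm(Z - B @ A - c * E) ** 2
        if prev - den <= tol * prev:
            break
        prev = den
    return num / den
```

Each step cannot increase the fitting error, and the first step already does at least as well as the rank-r fit with c = 0, so the returned ratio is never below one. In a detector the value would be compared with a threshold chosen for the desired false alarm rate.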

4.  NUMERICAL  EXAMPLES 

We now present a numerical example where a 20 element array is used
to detect a weak monochromatic signal embedded in strong, highly
correlated rank-2 compound-Gaussian clutter plus white Gaussian
noise. The output from the array elements is assumed to be already in
complex envelope form, so all the data here is complex-valued.

The interference components were computer synthesized as follows. The
rank-2 clutter component was modeled as the scattering arising from
two independent random discrete reflectors excited by a monochromatic
signal pulse, located ±1/2 DFT bin in wavenumber space symmetrically
about broadside. Their amplitude was modeled as a unit variance
K-distributed random variable with a shape parameter of 0.1. Choosing
0.1 as the shape parameter makes the amplitude distribution
heavy-tailed. The background noise samples were modeled as
independent and identically distributed zero-mean complex Gaussian
random variables.
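One common way to synthesize unit-variance K-distributed amplitudes is the compound-Gaussian construction: a unit-mean gamma "texture" multiplying unit-variance complex Gaussian "speckle". The sketch below uses that construction; whether the authors generated their data this way, and whether the texture was redrawn for every snapshot, is not stated in the text, so both are assumptions, as is the function name.

```python
import numpy as np

def k_distributed_coeffs(size, nu=0.1, rng=None):
    """Complex samples whose magnitude is K-distributed with shape nu.

    Assumption: texture ~ Gamma(shape=nu, scale=1/nu), so E[texture] = 1
    and the samples have unit variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    texture = rng.gamma(shape=nu, scale=1.0 / nu, size=size)
    speckle = (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)
    return np.sqrt(texture) * speckle

# Example: coefficients for 2 reflectors over 25 snapshots (24 training + 1 primary).
A = k_distributed_coeffs((2, 25), nu=0.1, rng=np.random.default_rng(0))
print(A.shape)
```

With a shape parameter of 0.1 most samples are very small and a few are very large, which is the heavy-tailed behaviour the example is designed to stress.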
A total of 24 signal-free data snapshots were used
for the secondary or training data set. The primary
data set for detection consisted of a single data snapshot.
The white noise variance was set to 0.1, giving
an interference-to-white-noise ratio of 10 dB. The signal
direction of arrival was chosen to be broadside to
the  array  and  the  signal  power  to  interference  ratio 
(lOlofjioo-2)  was  set  to  -5  dB.  15000  independent  trials 
with  and  without  a  signal  were  performed,  computer 
simulating  the  new  GLRT,  optimum  MSD,  ASD,  and 
Kelly’s  CFAR  GLRT  [10]  receivers.  For  comparison,  an 
analogous  pure  Gaussian  noise  case  with  the  same  nom¬ 
inal  covariance  matrix  was  also  simulated.  Note  that 
the  ASD  was  implemented  by  using  the  24  snapshot 
signal-free  secondary  data  set  to  estimate  the  rank-2 
interference  subspace  via  a  SVD  and  plugging  the  esti¬ 
mated  noise  subspace  into  (2). 

Figures  2  and  3  show  the  empirically  measured  prob¬ 
ability  of  detection  (pd)  curves  obtained  for  a  proba¬ 
bility  of  false  alarm  (pfa)  of  .005  for  the  non-Gaussian 
and  Gaussian  cases  respectively  for  all  four  detectors. 
From  the  pd  curves  in  figure  2,  it  can  be  seen  that  the 
new  GLRT  has  nearly  the  same  performance  as  the 
optimum  MSD  and  significantly  outperforms  the  ASD 
and  Kelly’s  GLRT  when  the  interference  is  compound- 
Gaussian.  However,  it  is  interesting  to  observe  that  for 
the  pure  Gaussian  case  (figure  3),  both  the  new  GLRT 
and  the  ASD  perform  almost  as  well  as  the  optimum 
MSD. 

One last question to be resolved is the degree to which the
distribution of the new GLRT statistic under the null hypothesis is
affected by the distribution of the low rank interference component.
The perturbation analysis approximation (11) suggests that the GLRT
is CFAR to at least first order. However, the analysis ignores any
higher order terms. To obtain insight, we simulated the new GLRT
using the previous non-Gaussian and Gaussian examples for 20000
independent trials under the null hypothesis only. We then calculated
the empirical cumulative distribution function of the test statistic
and used it to determine the threshold to achieve a given pfa.
Figure 4 shows the pfa plotted as a function of threshold for both
the non-Gaussian and Gaussian cases. As can be seen from figure 4,
the pfas are very close. The pfas deviate only slightly as the
threshold increases, implying that the new GLRT is approximately
invariant to the distribution of the low rank interference component.
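Setting a threshold from null-hypothesis trials and then measuring pd or pfa can be sketched as follows. The function names are illustrative, and the use of a linearly interpolated empirical quantile is an assumption about how the empirical distribution function would be inverted.

```python
import numpy as np

def threshold_for_pfa(null_stats, pfa):
    """Threshold exceeded by a fraction of about pfa of the null statistics."""
    return np.quantile(np.asarray(null_stats), 1.0 - pfa)

def exceedance_rate(stats, threshold):
    """Fraction of trials above the threshold (pfa for null data, pd otherwise)."""
    return np.mean(np.asarray(stats) > threshold)

# Toy example: 1000 distinct null statistics and a target pfa of 0.005.
null_stats = np.arange(1000, dtype=float)
thr = threshold_for_pfa(null_stats, 0.005)
print(exceedance_rate(null_stats, thr))
```

The same `exceedance_rate` call applied to statistics simulated with a signal present gives the empirical pd at that pfa, which is how curves such as those in figures 2 and 3 can be produced.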


5.  CONCLUSION 

We have derived a new GLRT detector and shown its relationship to the
UMPI MSD. Our perturbation analysis and numerical examples show that
the new GLRT is likely to be much more robust in low rank
non-Gaussian clutter than ad hoc or conventional adaptive detectors.
Finally, further work needs to be done in analyzing the detector's
performance with regard to signal and rank mismatch and higher-order
effects due to the non-Gaussianity of the interference.

Figure 2: Experimentally measured probability of detection in
non-Gaussian interference at a pfa of .005 based on 15000 trials.

Figure 3: Experimentally measured probability of detection in
Gaussian interference at a pfa of .005 based on 15000 trials.

Figure 4: Experimentally measured probability of false alarm plotted
as a function of threshold for both the Gaussian and non-Gaussian
cases.

APPENDIX  A:  PERTURBATION  ANALYSIS

We start with the numerator of (9). Recall that the numerator is the
square-error in fitting a rank $r_n$ matrix to Z. Letting
Z = BA + N, where N is some perturbation, and using the first-order
subspace perturbation expansion derived in [9] for the error in
approximating a matrix by a matrix of lower rank, we obtain

$$\min_{B,A} \| Z - BA \|_F^2 \approx \mathrm{num} = \| P_B^\perp Z P_A^\perp \|_F^2 \qquad (12)$$

where $P_A^\perp = I - A^H (A A^H)^{-1} A$.

We now approximate the denominator. If the denominator of (9) is
solved with respect to only $B_1$ and $A_1$ (holding c fixed), it is
equivalent to finding the rank $r_n$ approximation to Z − [0|Sc].
Treating [0|Sc] as a perturbation (weak signal and noise case)
initially and applying (12), we can approximate the denominator as

$$\mathrm{den} \approx \min_{c} \| P_B^\perp Z P_A^\perp - c\, P_B^\perp [0|S] P_A^\perp \|_F^2 \qquad (13)$$

The minimization of (13) is a standard linear least-squares problem
and the residual fitting error is

$$\mathrm{den} \approx \| P_{S''}^\perp\, z'' \|_F^2 \qquad (14)$$

where $z'' = \mathrm{vec}(P_B^\perp Z P_A^\perp)$,
$P_{S''}^\perp = I - P_{S''}$,
$P_{S''} = S''(S''^H S'')^{-1} S''^H$, and
$S'' = \mathrm{vec}(P_B^\perp [0|S] P_A^\perp)$. The operator
vec(·) takes a matrix and converts it to a vector representation by
stacking the columns. Finally, replacing the exact quantities in (9)
by their above approximations (12) and (14), and after some
simplification, we obtain

$$\gamma = \frac{\mathrm{num}}{\mathrm{den}} \approx 1 + \frac{\| P_{S''}\, z'' \|_F^2}{\| P_{S''}^\perp\, z'' \|_F^2} \qquad (15)$$
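The simplification leading to (15) can be spelled out in one step. The lines below are a reading of the derivation from (12) and (14), not text from the original; they use only the fact that $P_{S''}$ and $P_{S''}^\perp$ are complementary orthogonal projectors.

```latex
\begin{align*}
\mathrm{num} &= \| P_B^\perp Z P_A^\perp \|_F^2 = \| z'' \|^2
              = \| P_{S''} z'' \|^2 + \| P_{S''}^\perp z'' \|^2, \\
\gamma &= \frac{\mathrm{num}}{\mathrm{den}}
        \approx \frac{\| P_{S''} z'' \|^2 + \| P_{S''}^\perp z'' \|^2}
                     {\| P_{S''}^\perp z'' \|^2}
        = 1 + \frac{\| P_{S''} z'' \|^2}{\| P_{S''}^\perp z'' \|^2}.
\end{align*}
```

The first equality holds because the Frobenius norm of a matrix equals the Euclidean norm of its vectorization.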
ABSTRACT

This  work  introduces  a  novel  auto-focusing  phase  aberra¬ 
tion  correction  algorithm  for  coherent  imaging  systems  such 
as  medical  ultrasound  and  synthetic  aperture  radar  (SAR). 
The  algorithm  follows  directly  from  the  analytical  expres¬ 
sions  obtained  from  a  detailed  theoretical  analysis  of  the 
scattering  of  wave  packets  in  a  random  scattering  medium 
consisting  of  many  scatterers  in  each  resolution  cell.  The  al¬ 
gorithm  selects  the  resolution  cell  in  a  range-azimuth  region 
with  the  smallest  ratio  of  the  variance  of  the  focused  chan¬ 
nel/element  amplitudes  to  the  mean  square  of  the  focused 
channel  amplitudes.  The  aberrant  phase  can  be  estimated 
directly  from  this  selected  cell  using  well  known  phase  gra¬ 
dient  techniques.  Monte  Carlo  simulations  are  shown  to 
exhibit  excellent  algorithmic  performance. 

1.  INTRODUCTION 

In  this  work  we  present  a  novel  auto-focusing  phase  aberra¬ 
tion  correction  algorithm  for  coherent  imaging  systems  such 
as  medical  ultrasound  and  synthetic  aperture  radar  (SAR). 
The  algorithm  follows  a  detailed  theoretical  analysis  of  the 
scattering  of  wave  packets  in  a  random  scattering  medium 
consisting  of  many  scatterers  in  each  resolution  cell.  Phase 
aberration detection and correction algorithms have been
studied extensively in the literature [1-5]. Some degree of
success has been achieved for radar imaging applications;
however, the medical ultrasound algorithms have exhibited
limited success in actual in vivo ultrasound experiments.

The propagation velocities of the signals in medical ultrasound
and synthetic aperture radar differ by five orders
of magnitude. Nonetheless, both of these signals satisfy a
similar  wave  equation,  and  there  are  significant  commonal¬ 
ities  in  the  physics  based  signal  processing  of  the  coherent 
imaging  processes.  High  resolution  images  are  of  great  im¬ 
portance  in  both  these  modalities.  In  medical  diagnostic 
ultrasound,  one  desires,  for  example,  to  distinguish  a  can¬ 
cerous  lesion  from  a  cyst.  High  resolution  surveillance  SAR 
systems  are  used  to  automatically  characterize  targets.  In 
both  modalities  phase  errors  in  the  image  formation  cause 
blurring  which  in  turn  reduces  the  efficacy  of  the  character¬ 
ization  algorithms. 


This  work  was  supported  by  NSF  under  grant  #CCR- 
9817630 


A  phased  array  coherent  imaging  system  is  focused  at 
a  particular  resolution  cell  by  adjusting  the  phases  received 
at  all  the  elements  of  the  array  such  that  an  interference 
maximum  occurs  for  the  coherent  signals  scattered  from  the 
focused  cell.  As  we  scan  throughout  the  image  space,  the 
strength  of  the  scattering  from  the  focused  cells  will  form  a 
brightness  image  of  the  scattering  field.  In  a  digital  coherent 
imaging  system,  the  received  signals  are  sampled  at  each  of 
the  elements  of  the  spatially  sampled  coherent  aperture. 
The  focused  image  map  of  a  resolution  cell  is  obtained  by 
selecting  the  time  samples  at  all  the  array  elements  such 
that  the  time  of  flight,  and  hence  the  phase,  of  the  coherent 
scattered  signals  from  the  resolution  cell  of  interest  will  be 
the  same. 

The  phase  in  corrupt  data  signals  is  a  combination  of 
the  necessary  good  phase  that  contains  the  geometrical  and 
scattering  phase  information  that  would  occur  in  a  non  cor¬ 
rupted  scattered  signal,  and  the  aberrant  bad  phase  that 
serves to blur the image. The problem to be
solved is to develop an algorithm that provides some means
of differentiating the good phase from the bad phase. Once this
differentiation is established, the bad phase can be estimated
and selectively pruned out of the signals. The cleaned
signals  will  retain  enough  of  the  good  phase  so  the  net  effect 
of  the  cleaning  procedure  on  the  image  quality  is  positive. 

As stated above, focusing in digital medical ultrasound
systems is accomplished by choosing the delays according
to estimated propagation times based upon estimated
sound velocities. Human tissue has sub-cutaneous
layers  of  fatty  tissue  that  are  not  observable  from  the  sur¬ 
face.  The  sound  velocity  difference  between  the  fatty  and 
muscle  tissue  can  cause  a  significant  error  in  the  time  delay 
focusing  which  in  turn  causes  phase  errors  and  an  associated 
blurring  of  the  image. 

In  SAR  the  coherent  aperture  is  synthetically  gener¬ 
ated  as  a  T/R  physical  antenna  is  flown  over  an  extended 
flight  path.  Coherent  pulses  are  transmitted  and  the  scat¬ 
tered  signals  are  subsequently  received  at  points  in  the  flight 
path  separated  by  the  velocity  of  the  vehicle  divided  by  the 
coherent  pulse  repetition  frequency.  In  the  SAR  scenario 
we  are  dealing  with,  for  example,  X-band  radars  with  3 
cm wavelengths. Small, uncompensated deviations in the
flight  path  amounting  to  a  fraction  of  a  wavelength  over 
flight  paths  of  the  order  of  a  kilometer  can  cause  significant 
blurring.  Here  again,  we  can  apply  a  slight  variation  of  the 
physics based signal processing algorithm that we will
detail for ultrasound to blindly estimate and correct for the
SAR  phase  aberrations. 

The  enabling  mechanism  that  makes  this  algorithm  per¬ 
form  so  well  follows  directly  from  the  mathematical  analysis 
of  the  scattering  relations.  We  have  shown  that  the  variance 
of the time selected-focused elemental (channel) signals
depends strongly upon the azimuthal distribution of the
scatterers in the focused resolution/scan cell. If the resolution
cell  scatterers  are  narrowly  distributed  in  azimuth  around 
the  focused  angle,  the  variance  of  the  elemental  signal  am¬ 
plitudes  across  the  array  will  be  small.  The  algorithm  there¬ 
fore  searches  for  the  resolution  cell  with  the  smallest  vari¬ 
ance.  It  is  important  to  normalize  the  variance  by  divid¬ 
ing  by  the  square  of  the  mean  of  the  channel  amplitudes. 
This  normalization  scales  out  the  effects  of  overall  bright¬ 
ness  from  the  measure,  as  possible  errors  in  the  Minimum 
Variance  Scan  Cell  (MVSC)  assignment  could  be  made  for 
a  very  dim  speckle  in  the  absence  of  the  normalization.  As 
the  part  of  the  good  phase  that  is  element  dependent  will  be 
very  small  for  the  MVSC,  we  can  effectively  assume  that  the 
good  phase  has  been  stripped  off  the  channel  signals  asso¬ 
ciated  with  the  MVSC.  The  estimate  of  the  remaining  bad, 
channel-dependent-phase  is  made  using  well  known  phase 
gradient  techniques. 

In  the  subsequent  sections  of  this  paper  we  restrict  dis¬ 
cussion  to  the  ultrasound  application. 


is  neglected  over  a  range  segment  and  the  aberrant  phase 
estimated  for  a  single  range  cell  is  used  for  all  the  range 
cells  in  the  segment.  Our  examples  assume: 

$$\phi_{k\ell}(n) \approx \phi_k(n), \quad \text{in azimuthal segment.}$$

The transmit foci coordinates $\mathbf{R}_{ft}(\ell)$ are characterized by
a range, $R_{ft}$, and azimuthal bearing angle,

/«/«(<)  5  sin M e)=?n  RJtW.  (3) 

T n t 

The  origin  of  the  coordinate  system  is  set  at  the  center  of 
the  array.  The  sample  scan  angles  are  represented  by 

$$\mu_{ft}(\ell) = \mu_0 + [\ell - (N_\ell + 1)/2]\,\Delta\mu_{scan}. \qquad (4)$$

Here $N_\ell$ is the number of B scan lines in the B scan segment,
and $\mu_0$ is the central azimuth of the azimuthal scan.
The azimuthal partition, $\Delta\mu_{scan}$, is typically taken to be a
fraction, ~1/4 to 1/2, of the Rayleigh angular resolution of
the array, $\sin(\lambda_c/L)$. Here $\lambda_c$ is the central wavelength of
the transmitted ultrasound pulse, and $L$ is the length of the
coherent aperture of the ultrasonic probe.

The B scan receive foci have the vector coordinates,
$\mathbf{R}_{fr}(k,\ell)$, with magnitudes, $R_{fr}(k)$, and angle, $\mu_{fr}(\ell) = \mu_{ft}(\ell)$.
For notational simplicity we define


2.  MODEL  OF  SCATTERED  SIGNALS 

In an ultrasound polar B scan the transmitted signals associated
with a single firing of the elements of the array are
individually  time  delayed  to  produce  interference  maxima 
at  transmit  foci  located  at  specific  ranges  and  azimuthal 
scan  angles.  On  receive,  the  echo  time-sampled  coherent 
signals  are  first  accumulated  at  each  element.  The  focus- 
on-receive  signals  are  then  constructed  from  the  coherent 
sum  of  the  individual  elemental  time  samples  that  are  cho¬ 
sen  according  to  their  propagation  times  from  the  receive 
foci  to  the  individual  elements.  The  B  scan  receive  foci  are 
at  the  same  azimuthal  angles  as  the  transmit  foci,  but  at 
different  ranges.  The  received-sampled  B  scan  elemental 
signals have three indices: an element index, $n$, a range index,
$k$, and an azimuthal scan index, $\ell$. Here the number
of elements (channels) is given by $N_a$. The corrupted signal
can be generally represented by,

$$s_\epsilon(n,k,\ell) = e^{j\phi_{k\ell}(n)}\, s(n,k,\ell). \qquad (1)$$

The auto-focusing method seeks to obtain an estimate of the
aberrant phases, $\hat\phi_{k\ell}(n)$. Then, an estimate of the cleaned
signal is constructed by equalizing the effects of the aberrant
phase,

$$s_c(n,k,\ell) = e^{-j\hat\phi_{k\ell}(n)}\, s_\epsilon(n,k,\ell). \qquad (2)$$
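
The corruption and cleaning model of Eqs. (1) and (2) can be sketched numerically as follows. This is only an illustrative sketch: the function names `corrupt` and `clean`, and the array layout (element, range, azimuth), are assumptions and are not taken from the paper.

```python
import numpy as np

def corrupt(s, phi):
    """Eq. (1): multiply clean signals s[n, k, l] by exp(+j * phi[n, k, l])."""
    return np.exp(1j * phi) * s

def clean(s_eps, phi_hat):
    """Eq. (2): strip an estimated aberrant phase from the corrupt signals."""
    return np.exp(-1j * phi_hat) * s_eps
```

With a perfect estimate the cleaned signal equals the original, and in either case the channel amplitudes are unchanged, since only the phase is modified.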

The  model  can  be  simplified  by  dividing  the  image  map 
up  into  azimuthal  and/or  range  segments.  One  can  assume 
that  the  aberrant  phases  are  weakly  dependent  upon  ei¬ 
ther  the  range  or  scan  angle  indices  within  the  sector  do¬ 
mains.  In  the  examples  described  herein  we  retain  a  range 
dependence  on  the  aberrant  phase  as  we  will  process  all  the 
range  cells  individually.  In  an  alternative  procedure  which 
is  computationally  less  intensive,  the  range  cell  dependence 


$$\mu_f(\ell) \equiv \mu_{fr}(\ell) = \mu_{ft}(\ell).$$

The  range  cell  partition  of  the  receive  foci  is  represented  as 

$$R_{fr}(k) = R_{ft} + \Delta(k); \quad \Delta(k) \stackrel{\mathrm{def}}{=} [(k - 1) - (k_{max} - 1)/2]\,\Delta_R; \qquad (5)$$

for $k = 1, 2, \ldots, k_{max}$. Typical values for the range partition,
$\Delta_R$, are taken to be ~1/4 to 1/2 of the range resolution
associated with the bandwidth, $c_s/(2B)$. Here $c_s$ is the
speed of sound, and $B$ is the bandwidth of the ultrasonic
pulse.
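
The sampling grids of Eqs. (4) and (5) can be sketched as below. The function and argument names are hypothetical, and the indices are taken to start at 1 as in the text; the sketch simply shows that both grids are centred on the central azimuth and on the transmit focus range respectively.

```python
import numpy as np

def scan_angles(mu0, n_lines, d_mu):
    """Eq. (4): scan angles mu0 + [l - (N_l + 1)/2] * d_mu for l = 1..N_l."""
    l = np.arange(1, n_lines + 1)
    return mu0 + (l - (n_lines + 1) / 2.0) * d_mu

def receive_ranges(r_ft, k_max, d_r):
    """Eq. (5): receive ranges R_ft + [(k - 1) - (k_max - 1)/2] * d_r for k = 1..k_max."""
    k = np.arange(1, k_max + 1)
    return r_ft + ((k - 1) - (k_max - 1) / 2.0) * d_r
```

For example, `scan_angles(0.0, 5, 0.1)` gives five angles symmetric about zero, and `receive_ranges(10.0, 3, 1.0)` gives ranges 9, 10 and 11.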

A general formulation of the scattering of ultrasound
can be very complex if one tries to model multiple scattering
effects. The scattering equations become tractable
when the Born approximation is invoked. The Born model
considers the effects of single scattering events only. Here
the received signal, just prior to sampling at the receiving
element at $\mathbf{r}_n$, that is scattered from a point scatterer at $\mathbf{r}_s$
after being initially transmitted from a transducer centered
at $\mathbf{r}_m$ is modeled as


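$$s(\mathbf{r}_n,\mathbf{r}_s,\mathbf{r}_m,t) = \frac{A_s\, a(\mathbf{r}_s,\mathbf{r}_m)\, a(\mathbf{r}_s,\mathbf{r}_n)}{(4\pi)^2\, |\mathbf{r}_s - \mathbf{r}_m|\, |\mathbf{r}_s - \mathbf{r}_n|} \int e^{j\omega\left(t - [|\mathbf{r}_s - \mathbf{r}_m| + |\mathbf{r}_s - \mathbf{r}_n|]/c_s\right)}\, K_0^2(\omega)\, \frac{d\omega}{2\pi}. \qquad (6)$$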


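Here $K_0(\omega)$ is the Fourier transform of the impulse response
of the transmitter/receiver elements, and $a(\mathbf{r}_s,\mathbf{r}_n)$ represents
the transducer element factor. For the considerations
presented herein we use a simple cosine obliquity factor for
the element factor,

$$a(\mathbf{r}_s,\mathbf{r}_n) \stackrel{\mathrm{def}}{=} \frac{(\mathbf{r}_s - \mathbf{r}_n) \cdot \hat{\mathbf{n}}_\perp}{|\mathbf{r}_s - \mathbf{r}_n|}. \qquad (7)$$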
The mathematical formulation of the propagation and
scattering of ultrasonic signals of the type used in medical
diagnostic applications can be simplified using a spherical
Gaussian model for the propagating wavepackets. A
Gaussian waveform is particularly simple to work with
analytically as the Fourier transform of a Gaussian is also a
Gaussian. Consider a normalized Gaussian pulse $K_0(t)$ that
is centered at the transducer at the time $t = 0$. The pulse
and its Fourier transform $K_0(\omega)$ are represented by,


Ko(t)  = 


Kq(l>)  =  e  (55*) 


(8) 


Here the bandwidth $B = 1/\tau$. The fractional bandwidths¹
of interest fall into a broad range from ~20 to 80%. Using
a Gaussian pulse form, the Fourier integral on the
right-hand side of Eq. (6) can be readily evaluated,


$$\int K_0^2(\omega)\, e^{j\omega T_{ret}(n,s,m)}\, \frac{d\omega}{2\pi} \propto e^{j\omega_c T_{ret}(n,s,m)}\, e^{-B^2 T_{ret}^2(n,s,m)/4}. \qquad (9)$$
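
The Gaussian integral in Eq. (9) can be checked numerically. The sketch below assumes a squared spectrum of the form $K_0^2(\omega) = e^{-(\omega-\omega_c)^2/B^2}$; this normalization is an assumption chosen so that the result has the exponent $-B^2T^2/4$ quoted in Eq. (9), and the function names are hypothetical. Under that assumption the integral equals $\frac{B}{2\sqrt{\pi}}\, e^{j\omega_c T} e^{-B^2 T^2/4}$.

```python
import numpy as np

def gaussian_spectrum_integral(T, omega_c, B, n=20001, span=12.0):
    """Numerically integrate exp(-((w - w_c)/B)^2) * exp(j*w*T) dw / (2*pi)."""
    omega = np.linspace(omega_c - span * B, omega_c + span * B, n)
    d_omega = omega[1] - omega[0]
    integrand = np.exp(-((omega - omega_c) / B) ** 2) * np.exp(1j * omega * T)
    # A plain Riemann sum is adequate: the integrand is negligible at both ends.
    return np.sum(integrand) * d_omega / (2.0 * np.pi)

def closed_form(T, omega_c, B):
    """Closed-form value of the same integral under the assumed spectrum."""
    return B / (2.0 * np.sqrt(np.pi)) * np.exp(1j * omega_c * T) * np.exp(-(B * T) ** 2 / 4.0)
```

The carrier term $e^{j\omega_c T}$ carries the phase used for focusing, while the Gaussian envelope limits contributions to retarded times of order $2/B$.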


We have used an abbreviated notation for the retarded time,

$$T_{ret}(n,s,m) = t - t_{0m} - [|\mathbf{r}_s - \mathbf{r}_n| + |\mathbf{r}_s - \mathbf{r}_m|]/c_s, \qquad (10)$$

where $t_{0m}$ is the initiation time of the pulse at the $m$th
transmitting element. Summing over all scatterers, we have


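$$s(\mathbf{r}_n,\mathbf{r}_m,t) = C \sum_s A_s\, \frac{a(\mathbf{r}_s,\mathbf{r}_n)\, a(\mathbf{r}_s,\mathbf{r}_m)}{|\mathbf{r}_s - \mathbf{r}_n|\, |\mathbf{r}_s - \mathbf{r}_m|}\, e^{j\omega_c T_{ret}(n,s,m)}\, e^{-B^2 T_{ret}^2(n,s,m)/4}. \qquad (11)$$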


Here $C$ is a generic constant that captures all the constant
coefficients and factors of $\pi$ that arise in the process.

In time delay focusing the time delays of the transmitted
signals in a single firing of the array cause all the coherent
pulses to arrive at the transmit focus at the same time,
represented by $t_{ft}(\ell)$,

$$t_{ft}(\ell) = t_{0m} + |\mathbf{R}_{ft}(\ell) - \mathbf{r}_m|/c_s, \quad \text{for all } m. \qquad (12)$$


The samples of the scattered received signals are then chosen
so that their sample delay times correspond to an
interference maximum from a scatterer located at the receive
focus $\mathbf{R}_{fr}(k,\ell)$. This corresponds to the receive time samples,

$$t_n(k,\ell) = t_{ft}(\ell) + [|\mathbf{R}_{fr}(k,\ell) - \mathbf{r}_n| + |\mathbf{R}_{fr}(k,\ell) - \mathbf{R}_{ft}(\ell)|]/c_s, \qquad (13)$$

at the $n$th receiver. The retarded times corresponding to
the B scan scattered signals that are focused on both transmit
and receive are,

$$T_{fret}(n,s,m,k,\ell) = \frac{1}{c_s}\Big[\Delta(k) + |\mathbf{R}_{ft}(\ell) - \mathbf{r}_m| + |\mathbf{R}_{fr}(k,\ell) - \mathbf{r}_n| - |\mathbf{r}_s - \mathbf{r}_n| - |\mathbf{r}_s - \mathbf{r}_m|\Big]. \qquad (14)$$


1For  a  Gaussian  pulse  the  bandwidth  is  defined  here  as  twice 
the  3dB  width  of  Kq(u).  The  fractional  bandwidth  F,  and  band¬ 
width  B  have  the  functional  relationship,  B  =  ^==Fwc 


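Here the $\{\mathbf{r}_s\}$ are the coordinates of the scatterers. The
received signals that are focused at $\mathbf{R}_{fr}(k,\ell)$ are

$$s(n,k,\ell) = \sum_s A_s\, \gamma(n,s,k,\ell)\, \frac{a(\mathbf{r}_n,\mathbf{r}_s)}{|\mathbf{r}_n - \mathbf{r}_s|} \times e^{j\omega_c[|\mathbf{R}_{fr}(k,\ell) - \mathbf{r}_n| - |\mathbf{r}_s - \mathbf{r}_n|]/c_s}. \qquad (15)$$

The function $\gamma(n,s,k,\ell)$,

$$\gamma(n,s,k,\ell) = C\, e^{j\omega_c \Delta(k)/c_s} \times \sum_m \frac{a(\mathbf{r}_m,\mathbf{r}_s)}{|\mathbf{r}_m - \mathbf{r}_s|}\, e^{-B^2 T_{fret}^2(n,s,m,k,\ell)/4}\, e^{j\omega_c[|\mathbf{R}_{ft}(\ell) - \mathbf{r}_m| - |\mathbf{r}_s - \mathbf{r}_m|]/c_s}, \qquad (16)$$

will be a relatively slowly varying function of the receiver
indices $\{n\}$. From Eqs. (15,16) we see that the major contributions
to the scattered signals will come from scatterers
that are in close proximity to the received focus, corresponding
to values of $T_{fret}^2(n,s,m,k,\ell) < 4/B^2$. A dominant
scatterer will manifest itself as a maximum in the B scan
at a bearing angle in the neighborhood of the central bearing
angle of the scattering cluster.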


3.  MVSC  ALGORITHM 


The first step in the algorithmic procedure is an amplitude
equalization  to  mitigate  possible  distortion  effects  caused  by 
transducer  obliquity  factors  and  the  1  /R  dependence  of  the 
received  signals  in  the  near  field. 


Step  1.  Amplitude  equalization  -  the  received  signal  am¬ 
plitudes  are  equalized  as  follows: 


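$$\tilde{s}_\epsilon(n,k,\ell) \stackrel{\mathrm{def}}{=} \frac{|\mathbf{R}_{fr}(k,\ell) - \mathbf{r}_n|}{a(\mathbf{R}_{fr}(k,\ell),\mathbf{r}_n)}\, s_\epsilon(n,k,\ell). \qquad (17)$$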


Here $s_\epsilon(n,k,\ell)$ are the corrupt data modeled by Eq. (1).
Defining $\tilde\gamma$ as the amplitude equalized form of $\gamma$, the individual
amplitude equalized corrupt signals are of the form,


St{n,k,£)  =  ^7  (n,s,k,£)e3u,c[{Rf' 


(18) 

We want to estimate the aberrant phase, $\phi_k(n)$, given the
measured corrupt equalized signal $\tilde{s}_\epsilon(n,k,\ell)$.


Step  2.  The  second  step  accumulates  and  stores  the  ratio 
of  the  variance  to  square  of  the  mean  of  the  equalized  el¬ 
emental  channel  signals  associated  with  each  of  the  range- 
azimuth  cells  in  the  scan.  This  parameter  represents  the 
normalized  cell  variance  (NV)  metric.  The  NV  can  be 
mathematically  represented  by 


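$$G(k,\mu_f(\ell)) = \frac{N_a \sum_{n=1}^{N_a} |\tilde{s}_\epsilon(n,k,\ell)|^2}{\left(\sum_{n=1}^{N_a} |\tilde{s}_\epsilon(n,k,\ell)|\right)^2} - 1. \qquad (19)$$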
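
The normalized variance of Eq. (19) and the subsequent search for the smallest value can be sketched as follows. The function names and the (element, range, azimuth) array layout are assumptions made for illustration; the sketch also assumes that no cell has all-zero channel amplitudes, which would cause a division by zero.

```python
import numpy as np

def normalized_variance(s_eq):
    """Eq. (19): variance of channel amplitudes over squared mean, per cell.

    s_eq is a complex array of shape (N_a, K, L); the result has shape (K, L).
    """
    amp = np.abs(s_eq)
    n_a = amp.shape[0]
    return n_a * np.sum(amp ** 2, axis=0) / np.sum(amp, axis=0) ** 2 - 1.0

def find_mvsc(s_eq):
    """Return (k, l) indices of the cell with the smallest NV, plus the NV map."""
    nv = normalized_variance(s_eq)
    k_min, l_min = np.unravel_index(np.argmin(nv), nv.shape)
    return k_min, l_min, nv
```

A cell whose channel amplitudes are identical across the array has an NV of zero regardless of its brightness, which is the effect of the normalization described in the text.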


The next processing stage sorts the NVs and identifies the
specific cell with the smallest NV within the sector scan.
This cell is the minimum variance scan cell (MVSC). The
algorithm then performs the computations that estimate
the aberrant phase from the complex channel signals associated
with the MVSC.

The  estimation  of  the  phase  can  be  simply  understood 
by  examining  the  approximate  functional  form  of  the  signals 
in  the  Fresnel  zone  of  the  array.  Here  we  have, 

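$$\tilde{s}_\epsilon(n,k,\ell) = e^{j\phi_k(n)} \sum_s \Gamma(s,k,\ell) \times e^{-j\omega_c r_n (\mu_f(\ell) - \mu_s)[1 + r_n(\mu_f(\ell) + \mu_s)/(2R_{fr}(k))]/c_s}. \qquad (20)$$

Therefore the channel amplitude will have the approximate
form

$$|\tilde{s}_\epsilon(n,k,\ell)| \approx \Big|\sum_s \Gamma(s,k,\ell)\, e^{-j\omega_c r_n (\mu_f(\ell) - \mu_s)[1 + r_n(\mu_f(\ell) + \mu_s)/(2R_{fr}(k))]/c_s}\Big|. \qquad (21)$$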

The variance of the channel signals for the focused pixel
depends strongly upon the azimuthal distribution of the
scatterers in the focused pixel. If the pixel scatterers are
narrowly distributed in azimuth around $\mu_f(\ell)$, the variance
of the channel signal amplitudes across the array will be
small. This metric further normalizes the variance by dividing
by the square of the mean of the channel amplitudes.
This normalization scales out the effects of overall brightness
from the measure, as errors in the MVSC assignment
could occur for an inappropriate, very dim speckle in
the absence of the normalization. As the part of the good
phase that is channel dependent will be very small for the
MVSC, we can effectively assume that the good phase has
been stripped off the channel signals associated with the
MVSC, i.e.,

$$e^{j\omega_c r_n (\mu_f(\ell) - \mu_s)[1 + r_n(\mu_f(\ell) + \mu_s)/(2R_{fr}(k))]/c_s} \approx 1. \qquad (22)$$

The  estimate  of  the  remaining  bad,  channel  dependent, 
phase  is  made  using  a  phase  gradient  technique  that  is  well 
known  in  the  art. 

Step  3.  Estimation  of  aberrant  phase  -  Now  that  the  esti¬ 
mate  of  the  good  phase  has  been  stripped  off  the  signal,  the 
remaining  bad  channel  dependent  phase  can  be  estimated 
using  a  phase  gradient  technique, 

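$$\Delta\hat\phi_k(n) = \angle\big(\tilde{s}_\epsilon^*(n-1,k,\ell_{min})\, \tilde{s}_\epsilon(n,k,\ell_{min})\big). \qquad (23)$$

The estimate of the phase aberration across the aperture is
obtained by integrating the estimated gradient,

$$\hat\phi_k(n) = \sum_{q=2}^{n} \Delta\hat\phi_k(q). \qquad (24)$$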
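
Eqs. (23) and (24) amount to taking the angle of the product of adjacent channels and accumulating the result. A minimal sketch is given below; the function name is hypothetical, and the input is assumed to be the vector of equalized channel signals for the MVSC at one range index. The estimate is referenced to the first element, so a constant phase offset is not recovered.

```python
import numpy as np

def estimate_aberrant_phase(s_mvsc):
    """Phase gradient estimate of the aberrant phase across the aperture.

    s_mvsc: complex vector of length N_a (channel signals of the MVSC).
    Returns phi_hat with phi_hat[0] = 0, following Eqs. (23) and (24).
    """
    # Eq. (23): element-to-element phase differences, each in (-pi, pi].
    grad = np.angle(np.conj(s_mvsc[:-1]) * s_mvsc[1:])
    # Eq. (24): integrate (cumulatively sum) the gradient across the array.
    return np.concatenate(([0.0], np.cumsum(grad)))
```

The estimate is unambiguous only while the true phase changes by less than $\pi$ between adjacent elements, which is consistent with the requirement that the aberration has a finite element-to-element correlation length.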


4.  SIMULATIONS 


Simulation model parameters

$f_c = \omega_c/2\pi = 5 \times 10^6$ Hz          pulse central frequency
$c_s = 1.54 \times 10^3$ m/sec                    sound velocity
$\lambda_c = c_s/f_c = 3.08 \times 10^{-4}$ m     pulse central wavelength
$\Delta f_c = 0.6 f_c$ Hz                         modulation bandwidth
$d = \lambda_c/2$                                 trans/receiver element spacing
$B = 0.850\pi\,\Delta f_c$ radians/sec            1/2 mod bandwidth @ 3 dB
$N_a = 64$                                        # T/R elements
$L = (N_a - 1)\lambda_c/2$                        array length
$R_{ft} = 1.5L$                                   transmit focus distance
$f\# = 1.5$                                       f # of scattering cell
Vt = 1                                            scat. strength


We simulate the B scan scattering pattern for a randomly
generated distribution of point scatterers with approximately
50 scatterers in each of the range/azimuthal
resolution cells. This is a sufficient number to generate
a fully developed speckle pattern. We have also included
a void region of azimuthal width corresponding to ~1.5
Rayleigh azimuthal resolution units. Two different random
distributions of scatterers with voids are illustrated in
Figs. 1,2. The dotted rectangles in the figures illustrate the
position of the resolution cell associated with the MVSC for these
phantoms.

We now impose a random aberration phase, constructed
as follows. Take a sequence of $N_r$ uniformly distributed
random numbers over the interval (-0.5, 0.5). Take the FFT
and low pass filter by keeping the first $k_c$ values and setting
the rest of the transform coefficients equal to zero. Then
take the IFFT and scale the results so that the resultant
phase amplitudes lie within $\pm\pi$. This procedure introduces
an element to element correlation length of the order of
$(N_r/k_c)\lambda/2$ into the random aberrant phase.
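
The construction of the random aberration phase described above can be sketched as follows. The function name and arguments are hypothetical. Two details are assumptions not stated in the text: the real part of the inverse FFT is taken (keeping only the first $k_c$ coefficients makes the inverse transform complex), and the scaling sets the largest absolute phase to exactly $\pi$.

```python
import numpy as np

def random_aberrant_phase(n_r, k_c, max_phase=np.pi, seed=None):
    """Low-pass filtered uniform noise, scaled to lie within +/- max_phase."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-0.5, 0.5, n_r)        # uniform random sequence
    spectrum = np.fft.fft(x)
    spectrum[k_c:] = 0.0                   # keep only the first k_c coefficients
    phase = np.fft.ifft(spectrum).real     # assumption: use the real part
    return max_phase * phase / np.max(np.abs(phase))
```

Smaller values of `k_c` give smoother phase screens, i.e. longer element to element correlation lengths.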

Figs.  3,4  illustrate  the  results  of  the  first, second,  and 
third  iterations  of  our  methodology  of  blindly  estimating 
and  cleaning  the  results  of  the  aberrant  phase  for  the  respec¬ 
tive  distributions  shown  in  Figs.  1,2.  In  each  of  the  exam¬ 
ples  20  independent  random  trials  of  the  phase  aberration 
and  subsequent  cleaning  process  are  illustrated.  The  results 
are  shown  for  aberrant  phase  with  a  correlation  length  of 
1.38 mm. Simulation results over a broad range of aberrant
phase correlation lengths demonstrate that our auto-focusing
algorithm is effective down to correlation lengths of
~4 wavelengths for phase aberrations varying between $\pm\pi$
over the length of the array.
Step 4. Construct estimate of corrected data - Go back to
the original corrupt data set and strip off the estimate of
the aberrant phase,

$$s_c(n,k,\ell) = e^{-j\hat\phi_k(n)}\, s_\epsilon(n,k,\ell). \qquad (25)$$

Step 5. Subsequent iterations - The estimation of the aberrant
phase can be further improved by going back to Step 1 for
additional iterations of the algorithm.
[Figure panels omitted. Recoverable panel titles: "Distribution of scatterers - azimuthal B mode scan" and "Random aberrant phases"; horizontal axes: angle (degrees).]

Figure 1: Distribution of scatterers used in simulations

Figure 2: Distribution of scatterers used in simulations

Figure 3: Simulation results for the cleaned images using
NVC metric for phantom in Fig. 1.

Figure 4: Simulation results for the cleaned images using
NVC metric for phantom in Fig. 2.
ABSTRACT

There  is  current  debate  in  the  radar  community  whether 
sea  clutter  is  stochastic  or  chaotic.  In  this  paper,  a 
stochastic  k-distributed  surrogate  is  generated  for  a 
typical  sea  clutter  data  set.  The  k-distributed  set  was 
then  analysed  using  the  methods  recently  applied  to  sea 
clutter  by  Haykin  et  al.  The  k-distributed  set  is  shown 
to  have  DML  and  FNN  values  in  the  same  range  as 
reported  by  Haykin  et  al.  and  with  positive  and  neg¬ 
ative  Lyapunov  exponents.  In  addition,  various  white 
and  correlated  noise  distributed  sets  are  analysed  in  the 
same way and found to produce similar artefacts. It is
concluded that these chaotic invariants cannot be used
to distinguish between chaotic and stochastic time series
and are redundant in an application, such as radar
sea  clutter,  where  the  time  series  is  unknown  and  could 
be  of  a  stochastic  nature. 

1.  INTRODUCTION 

There  is  current  debate  in  the  radar  community  whether 
sea clutter is stochastic or chaotic. Conventionally, high
resolution radar sea clutter has been modelled by a
stochastic compound k-distribution [1]. Recently, Haykin
et al. [2][3] have performed a nonlinear analysis on sea
clutter data sets and claim that sea clutter is a chaotic
process. This nonlinear analysis hinges around two
main chaotic invariants. These are the 'maximum likelihood
estimation of the correlation dimension' (DML
value) [4] and 'false nearest neighbours' (FNN) [5]. The
Lyapunov exponents are also measured, where the number
of exponents to be measured is determined from the
FNN calculation.

This work was supported by BAE Systems, DERA Malvern,
EPSRC and the Royal Society. Sea clutter data was provided
courtesy of DERA, Malvern.

In  [3]  it  was  reported  that  sea  clutter  had  fractional 
DML values in the range 4.1-4.5 and FNN global dimension
in the range 5-6. It could be inferred from
these  results  that  the  system  is  low  dimensional  and 
fractal  which  is  symptomatic  of  chaos.  From  the  FNN 
result  reported  in  [3],  5-6  Lyapunov  exponents  were 
measured  which  gave  positive  and  negative  values,  where 
one  positive  and  negative  exponent  signifies  chaos.  In 
this  paper  we  wish  to  test  the  robustness  of  the  above 
mentioned  chaotic  invariants  to  stochastic  time  series 
and  in  particular  a  time  series  drawn  from  a  k-distribution. 
If  the  chaotic  invariants  are  robust  they  will  be  able  to 
distinguish  a  stochastic  time  series  from  a  chaotic  one. 

The  paper  is  structured  as  follows. 

•  DML  and  FNN  analysis  for  white  stochastic  time 
series; 

•  DML  and  FNN  analysis  for  correlated  Gaussian 
noise; 

•  Mutual  information,  autocorrelation,  DML,  FNN 
&  Lyapunov  exponents  for  a  k-distributed  surrog¬ 
ate. 

2.  WHITE  NOISE  SIGNALS 

White  noise  signals  are  essentially  high  dimensional 
in  the  sense  that  high  DML  values  and  a  high  FNN 
global  dimension  are  to  be  expected  for  such  a  series. 
Four  white  stochastic  systems  were  generated.  The  sig¬ 
nals  generated  were  gamma,  uniform,  Gaussian  and 
k-distributed.  Each  consisted  of  50,000  data  points 
which  is  equivalent  to  the  length  of  the  data  record 
in [3]. The correlation dimension (D2) and 'maximum
likelihood estimate of the correlation dimension' (DML
value), which is a noise robust version of D2, were estimated
using a method by Schouten et al. [4] which was
employed in [3]. The results are shown in Table 1.


Data set            D2      DML
Gamma (white)       1.38    1.38
Uniform (white)     4.20    4.20
Gaussian (white)    4.22    4.42
K (white)           1.94    1.94

Table 1


The results are somewhat alarming. In the case of the
original method of Grassberger and Procaccia [6] the
D2 value would become infinite for white noise, since
the noise fills high dimensional space, resulting in a
very large D2. Schouten's method does not demonstrate
this divergence; instead it suggests the data has
a low fractal dimension, which might be interpreted as
the presence of chaos. Even more worrying is that for
the uniform and Gaussian white noise signals the D2
and DML values are in the same range as those
measured for sea clutter in [3]. The FNN results for the
parameter  Rtol  =  10,  as  in  [3],  are  shown  in  Figure 
1.  The  algorithm  designed  by  Kennel  [7]  was  used  to 
measure  the  FNN.  An  embedding  delay  of  unity  was 
used  for  the  white  stochastic  sets. 

Figure 1: False nearest neighbour analysis (for white noise sets).

The FNN for the gamma distribution is the expected
result for a typical white noise system (i.e., monotonically
decreasing). The white K distribution actually
saturates at a global dimension of 11, which
may also be acceptable for a noise system of 50,000
points. However, the FNN for the uniform and Gaussian
white noise systems drop down to a saturation
level at a global dimension of 5, which is the same as
reported in [3] for sea clutter. From Abarbanel [5] this
result could be misconstrued as a low dimensional attractor
with observational noise. Evidently the FNNs
also generate artifact for white stochastic time series.

3.  CORRELATED  GAUSSIAN  NOISE 

What now follows is the same analysis applied to correlated
stochastic time series. For this analysis, 50,000
point data records of correlated Gaussian noise for correlation
coefficient ρ = 0.1, 0.3, 0.5, 0.7, 0.9, 0.99 and
0.999 were generated. The D2 and DML values together
with the vector length (m) are shown in Table
2.


ρ        D2      DML     Vector length (m)
0.1      4.72    5.42    5
0.3      4.72    5.42    5
0.5      4.82    6.27    6
0.7      5.37    7.38    8
0.9      4.98    8.49    14
0.99     3.56    6.59    46
0.999    2.28    3.49    142

Table 2


As the correlation coefficient, ρ, increases so do the
D2 and DML values. This continues to a ceiling at
ρ = 0.7-0.9, and the values then decrease at high correlation.
The vector length (m) is also presented. Essentially,
from Schouten's work [4], random pairs of vectors
of a particular vector length (m) are chosen, and the
maximum norm distance of 10,000 pairs of vectors is
plotted as a cumulative histogram from which D2 and
DML are estimated. It can be seen that the vector
length increases with the degree of correlation of
the time series. It is crucial that m is estimated correctly.
The vector length is inversely proportional to
and derived from the number of crossings of the mean
of the time series (i.e. the larger the number of crossings
the smaller m will be). An assumption has been made
here, which is that there is some structure/repetition
in the motion of the system (i.e. there exist orbits
of an attractor). In white stochastic systems, where
there is no structure, artifact must occur. A mean level
will still exist and be crossed very frequently. However,
these crossings are not structured, as in the orbits of an
attractor, but are purely random. Therefore, a low m
will be determined and a spurious estimate of D2 and
DML will be made. As the correlation of the noise is increased
more apparent structure appears, since points in
the time series become more dependent on the sample
behind. Fewer crossings of the mean will occur and a
larger m will be measured. This is evident in Table
2. Therefore, for correlated noise signals the D2 and
DML values of [5] suggest low fractal dimensionality,
which could be mistaken as evidence for chaos. In order
for the method of Schouten et al. to work for stochastic
systems the vector length (m) must be calculated in another
manner which is able to distinguish in some way whether
the system has true orbits occurring or not. The FNN
results for the same correlated Gaussian time series are
shown in Figure 2 for an Rtol = 10 and embedding
delay of unity.
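
The dependence of the number of mean crossings on the correlation coefficient can be illustrated with a short sketch. The paper does not state how the correlated Gaussian noise was generated, so a first-order autoregressive process is assumed here purely for illustration; the function names are hypothetical, and the exact rule used to derive m from the crossing count is not reproduced.

```python
import numpy as np

def correlated_gaussian(n, rho, seed=None):
    """Unit-variance Gaussian AR(1) series with lag-one correlation rho (assumed model)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = w[0]
    scale = np.sqrt(1.0 - rho ** 2)        # keeps the variance at one
    for i in range(1, n):
        x[i] = rho * x[i - 1] + scale * w[i]
    return x

def mean_crossings(x):
    """Count sign changes of the series about its mean."""
    centered = x - np.mean(x)
    return int(np.sum(np.signbit(centered[:-1]) != np.signbit(centered[1:])))
```

Under this assumed model the crossing count falls steadily as ρ grows, so a vector length that is inversely related to the crossing count will grow with ρ, in line with the trend in Table 2, even though no attractor is present.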


Figure 2: False nearest neighbour analysis (for correlated Gaussian noise sets).

At small correlation values of ρ the system resembles
a low order 5 system with observational noise on top.
As  the  correlation  gets  larger  it  appears  as  if  the  sys¬ 
tem  has  real  dynamics.  Therefore,  the  FNN  seems  to 
produce  more  artifact  for  heavily  correlated  stochastic 
time  series. 


The radar parameters of the actual sea clutter set are
given in Table 3. The shape and scale parameters of
the sea clutter were 23 and 661 respectively. A 3 × 10^6
point compound K-distributed surrogate data set was
generated. This was achieved using Tough and Ward's
method [8]. This technique enables the generation of
surrogate data sets which are matched in both point
probability density function and in autocorrelation
sequence to the observed sea clutter. Figure 3 shows
the time series for both the original sea clutter set and
the generated compound K-distributed set.


Figure 3: Time series of the sea clutter set and the
generated compound K-distributed set.

The autocorrelation function (ACF) and mutual information
(MI) [5] plots of the surrogate data were measured
using 50,000 data points and are shown in Figure 4.


4.  A  K-DISTRIBUTED  TIME  SERIES 


What follows is a time series analysis, using the techniques
and parameters described in [3], for a surrogate
compound K-distributed time series.


Parameter              Description/value
Frequency              3 GHz
Pulse compression      not used
Pulse width            1 µs
Resolution             150 m
Wind speed             12.8 m/s (VV)
Sea state              "strong breeze" (VV) (Beaufort scale: 6)
Polarisation/channel   VV (agility not used) / Q
PRF                    20 kHz
Grazing angle          0.12°
Beamwidth              6°

Table 3

Figure 4: Mutual information and autocorrelation.

The MI and ACF curves were found to be smooth. A
zero crossing of the ACF (at lag 46) and the first minimum
of the MI (at lag 49) were found to occur at roughly the
same position. The D2 and DML values were measured for
the 50,000 point set, the 3 × 10^6 point set and also for 100
sets of 30,000 points in order to obtain a range. As
specified by Schouten, 10,000 vectors were used for the
50,000 and 30,000 point data sets in order to estimate the
D2 and DML values. The number of vectors was scaled
proportionally for the 3 million point set, to 600,000
vectors. The results are shown in Table 4.


Data set                   D2                  DML
50,000 pts.                3.27                4.62
3 million pts.             3.78                4.55
100 sets of 30,000 pts.    3.11 < D2 < 4.03    4.20 < DML < 5.1

Table 4


Hence, the ranges of D2 and DML values are similar to
the ranges that were reported for a number of clutter sets
in [3].

The FNN results are shown in Figure 5 for both
Rtol = 10 and the Rtol = 25 that is used in [3].
An embedding delay equal to the first MI minimum (49)
was chosen. The same results as reported in [3] were
found (i.e. no noise floor is present and the global
dimension is 5).

Figure 5: False nearest neighbours analysis (embedding
delay = MI = 49) for the T/W generated 50,000 point
K-distributed set (shape = 23, scale = 661).


Finally the Lyapunov exponents were measured, using
the result of the FNN calculation, as performed in [3],
to determine the number of Lyapunov exponents to
use. The algorithm by Ushaw [9] was used to measure
the exponents. The algorithm is an extension of
the Darbyshire and Broomhead [10] model. The Ushaw
model reverts to the Darbyshire and Broomhead model
when the 'number of B vectors in the average' is set to
unity. At higher values of this parameter local noise
reduction takes place. Parameters for the Lyapunov
calculation were then determined. They were found to
be: SVD window length = 120, calculation period = 1000,
global embedding dimension = 5, local embedding
dimension = 5, number of steps between
re-initialisations = 40, number of B vectors = 60. Two
calculations were made. The first, with the number of B
vectors in the average = 1 (i.e. the Darbyshire and
Broomhead model), is shown in Table 5.


Lya. Exp.    nats/sample
L1           +0.001112
L2           [illegible]
L3           -0.005450
L4           -0.010183
L5           -0.025041

Table 5


The second calculation was made with noise reduction
(number of B vectors in the average = 7) and is shown
below in Table 6.


Lya. Exp.    nats/sample
L1           +0.002018
L2           -0.004457
L3           -0.005818
L4           -0.009299
L5           -0.025878

Table 6


For both calculations the Lyapunov exponents are very
small, like those reported in [3], and both positive and
negative exponents were found, which is the hallmark of a
chaotic system.

5.  CONCLUSION 

In conclusion, both white and correlated stochastic time
series have been analysed using Schouten's correlation
dimension estimate and the FNN test. In both cases,
the tests resulted in the misclassification of a stochastic
time series as a chaotic process. Low fractal D2 and
DML values, which are signatures of chaos, were generated
by Schouten's method. It is believed that this
is due to the method of calculation of the vector length
(m). The FNN test also provides low dimensional estimates
which can be misinterpreted as an attractor. Compound
K-distributed surrogate data was generated and
passed through the same nonlinear analysis as was
performed in [3] for sea clutter. Smooth autocorrelation
and mutual information curves were measured. The
D2, DML and FNN values were found to be similar to
those reported in [3]. Finally the Lyapunov exponents
were measured, and positive and negative exponents
were found which suggest evidence of chaos. Clearly,
the generated K-distributed data is not chaotic, but the
nonlinear tools used in [3] in the analysis of sea clutter
suggest otherwise. This counter example demonstrates
that it is not possible to distinguish a chaotic
time series from a stochastic one using these invariants.
Hence, it is suggested that these tools be used for
deterministic time series only. Thus, we conclude that
these chaotic invariants are redundant in an application,
such as radar sea clutter, where the nature of the time
series is unknown and could be stochastic. This
reopens the question as to what the true nature of sea
clutter actually is.

ABSTRACT

Blind source separation has many important applications
in communications and array signal processing.
Many widely used methods require prior knowledge of
the sign of the kurtosis of the sources and may fail if
the mixtures contain both sub- and super-Gaussian
signals. In this paper we present an adaptive algorithm for
separating arbitrarily kurtotic sources. The blind
separation problem is modeled using a state-space
formulation. The resulting separation algorithm uses a
subspace tracker and a predictor-corrector filter structure
related to the well-known Kalman filter. It lends itself
easily to real-time implementation. The zero-memory
nonlinearities needed for finding independent sources
are selected online by monitoring the statistics of each
estimated source signal. Consequently, separation may
be achieved even if a change in the sign of the kurtosis
occurs. Simulation examples illustrating the ability
to adapt to time-varying mixing systems and source
distributions of unknown kurtosis are presented using
communications and biomedical signals.

1.  INTRODUCTION 

Blind Source Separation (BSS) has important applications
in biomedical signal analysis, communications
and array signal processing. Adaptive separation methods
are often required because the mixing system and the
signal or noise statistics may be time-varying. Moreover,
real-time computation is desirable in many key
application areas.

Typically, blind separation algorithms based on higher
order statistics assume that the sign of the kurtosis is
known and the same for all sources. Consequently, the
zero-memory nonlinearity employed is fixed in advance.
The choice of nonlinearity is critical for achieving the
separation. One way to overcome this problem is to
approximate the nonlinear function by a linear combination
of sigmoids with adjustable slope and bias, thus
avoiding the estimation of the kurtosis [10]. However, a
large number of parameters then have to be adapted, which
leads to high computational complexity. Moreover, deriving
the recursive algorithms needed in real-time operation may
be tedious.

In this paper, we propose an algorithm for online
separation of source signals with arbitrary kurtosis.
Separation is performed by employing a subspace tracker
and a recursive estimator with a predictor-corrector
form. Instead of making restrictive assumptions on the
source pdfs and consequently fixing the nonlinearity
in advance, the output statistics of each channel are
monitored and the nonlinearity is selected appropriately.
Consequently, a fully adaptive algorithm for
blind separation is obtained that easily lends itself to
real-time implementation. The performance of the
proposed method is studied in simulations where sources
with different kurtosis parameters are used and the
mixing system is time-varying.

This paper is organized as follows. The BSS problem
is presented first. Then the recursive estimator for
blind separation is briefly described and the on-line
selection of appropriate nonlinearities is discussed.
In section 4, examples of separating mixtures of both
sub- and super-Gaussian sources are given.

2.  BLIND  SEPARATION 

Over the last few years BSS has received a lot of attention
in the signal processing, communications and
neural network research communities (see [2], [3] and
references therein). The observed noisy mixtures and
the unobserved source signals are related by

z(k) = A s(k) + v(k)  (1)

where A is an n x m matrix of unknown mixing coefficients,
n > m, s is a column vector of m source signals,
z is a column vector of n mixtures, v is an additive
noise vector and k is the time index. The mixing is
assumed to be instantaneous and matrix A is assumed
to be of full rank. Source signals are typically assumed
zero mean and stationary.

The separation task at hand is to estimate a separating
matrix W or mixing matrix H so that the
original sources are recovered from the noisy mixtures.
Prior to separation, the observed signals are typically
spatially whitened and the signal powers are normalized
to unity. By projecting the input data z into an
m-dimensional signal subspace yielding y, the problem
becomes easier to solve because n = m and the m x m
separating matrix will be orthogonal, i.e., $W = H^{-1} = H^T$.
Moreover, some noise is also attenuated. An estimate
x of the unknown sources s may then be given by

$\hat{s} = x = H^T y$  (2)

where y is the whitened data. The estimate can be
obtained only up to a permutation and scaling of s.


3.  ADAPTIVE  SEPARATION  OF 
ARBITRARILY  KURTOTIC  SOURCES 

The goal of the adaptive blind algorithm presented in
this section is to separate sources with arbitrary kurtosis
parameters. In this algorithm a zero-memory nonlinearity
is used. The type of nonlinearity is selected
adaptively based on the statistics of the output. The
actual adaptive algorithm consists of two parts: a signal
subspace tracker and a recursive predictor-corrector
filter structure. This type of structure allows for
real-time implementation.


3.1.  Signal  Subspace  Tracking 

In order to make the separation problem easier, adaptive
signal subspace tracking is employed. The n-dimensional
observations z(k) are projected along the eigenvectors
corresponding to the m largest eigenvalues. Signal
subspace eigenvectors U and eigenvalues $\Lambda$ are tracked
on-line using the adaptive algorithm introduced in [6].
Estimates of the signal subspace eigenvectors and noise
variance are updated at the arrival of each new
observation vector. Thus, at each step k we obtain a
whitened data vector y(k) by applying the transformation
$R(k) = \Lambda^{-1/2} U^T$ to the observation vector z(k).
If an abrupt change in the mixture covariance structure
occurs, the subspace tracker is reinitialized so that
recent observations are trusted more [8].
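The whitening transformation can be sketched in batch form as follows. This is only an illustration: it replaces the adaptive tracker of [6] with a one-shot eigendecomposition of the sample covariance, and the function name `whiten` and the way the noise variance is estimated (mean of the discarded eigenvalues) are assumptions, not the authors' procedure.

```python
import numpy as np

def whiten(Z, m):
    """Project zero-mean mixtures Z (n x T) onto the m principal
    eigenvectors and scale to unit variance: Y = Lambda^{-1/2} U^T Z."""
    n, T = Z.shape
    R = Z @ Z.T / T                            # sample covariance (zero mean assumed)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
    U = eigvecs[:, -m:]                        # signal subspace eigenvectors
    lam = eigvals[-m:]                         # m largest eigenvalues
    # Assumed noise variance estimate: mean of the n - m smallest eigenvalues.
    noise_var = float(eigvals[:-m].mean()) if n > m else 0.0
    transform = np.diag(lam ** -0.5) @ U.T     # m x n whitening transformation
    return transform @ Z, transform, noise_var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S = rng.uniform(-1.0, 1.0, size=(3, 2000))        # m = 3 sources
    A = rng.standard_normal((4, 3))                   # n = 4 mixtures
    Z = A @ S + 0.1 * rng.standard_normal((4, 2000))
    Y, Rm, nv = whiten(Z, 3)
    print(np.round(Y @ Y.T / Y.shape[1], 3))          # close to the identity
```

After this step the remaining mixing is, up to noise, an orthogonal m x m matrix, which is what allows the separating matrix to be taken as the transpose of the mixing matrix.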


3.2.  Separation  algorithm 

The actual separation algorithm can be considered a
modified Kalman filter presented in a predictor-corrector
form. This is achieved by describing the blind source
separation problem using a state-space model:

$x(k) = F(k|k-1)\,x(k-1) + G(k)\,w(k-1)$  (3)
$y(k) = H(k)\,x(k) + v(k)$  (4)

where x is the state vector to be estimated and y is the
whitened observation vector. The noise sequences w
and v are Gaussian white, mutually uncorrelated with
covariance matrices Q(k) and R(k). The measurement
noise variance is estimated using the subspace tracking
algorithm. Given the state-space model, the predicted
state estimate $\hat{x}(k|k-1)$ is given by:

$\hat{x}(k|k-1) = F(k|k-1)\,\hat{x}(k-1|k-1)$  (5)

The correction equations update the predicted state
estimate to a filtered state estimate based on the new
information conveyed by the measurements. The
estimated source signals are given by:

$\hat{x}(k|k) = \hat{x}(k|k-1) + K(k)\,[y(k) - H\,\hat{x}(k|k-1)]$  (6)

where K(k) is the Kalman gain. The prediction and
correction error covariance matrices $P(k|k-1)$ and
$P(k|k)$ are updated as well (see [5]).
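One predictor-corrector step of the form (5)-(6) can be sketched as below. The gain and covariance updates are the textbook Kalman filter expressions with known F, H, Q and R, which is an assumption made for illustration: the paper's modified filter estimates H and F on-line and its covariance updates (see [5], [8]) may differ. The term G Q G^T is folded into a single matrix `Q` here.

```python
import numpy as np

def predict(x_est, P_est, F, Q):
    """Prediction step, cf. (5): propagate state estimate and covariance."""
    x_pred = F @ x_est
    P_pred = F @ P_est @ F.T + Q
    return x_pred, P_pred

def correct(x_pred, P_pred, y, H, R):
    """Correction step, cf. (6): update the prediction with measurement y."""
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # standard Kalman gain
    x_est = x_pred + K @ (y - H @ x_pred)
    P_est = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_est, P_est, K

if __name__ == "__main__":
    m = 3
    x, P = np.zeros(m), np.eye(m)
    F, H = np.eye(m), np.eye(m)
    Q, R = 0.01 * np.eye(m), 1e-6 * np.eye(m)
    y = np.array([1.0, -2.0, 0.5])
    x_pred, P_pred = predict(x, P, F, Q)
    x, P, K = correct(x_pred, P_pred, y, H, R)
    print(x)                                   # close to y: the measurement is trusted
```

With a very small measurement noise covariance the corrected estimate follows the measurement; with a large one it stays near the prediction, which is the trade-off the gain K(k) controls.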

In the BSS problem, matrices H and F are not
known and have to be estimated simultaneously. Matrix
H describes the mixing system whereas matrix F
models how the sources evolve over time. The structure of
the F matrix depends on the application. For example,
in some cases the source signals may exhibit an
autoregressive structure. In this case, in order to make
the prediction more accurate, matrix F may be augmented
to contain a low order AR model. This also
allows for noise attenuation. The update equations for
the prediction and correction error covariance matrices
are changed accordingly [8].

In estimating the mixing matrix H, separate expressions
for the innovation and the gain for correcting the
elements of H are required. The “innovation” in estimating
H is

$y_H(k) = y(k) - H(k-1)\,x_H(k)$  (7)

where $x_H(k) = g(u(k))$, $u(k) = H^T y(k)$,
$g(u(k)) = [g_1(u_1(k)) \ldots g_m(u_m(k))]^T$, and $g_i(\cdot)$
are nonlinear contrast functions. In the subspace tracking
stage, the components of y are normalized to have unit
variance. The gain $K_H$ used in estimating the mixing
matrix is

$K_H = \frac{P\,x_H}{x_H^T P\,x_H + 1}$  (8)

Finally, the update for H is given by:

$H(k) = H(k-1) + (y - H(k-1)\,x_H)\,K_H^T$  (9)

The adaptation rate for H should be relatively slow
compared to the adaptation rate for the actual state
variables to have better stability. At each time step k
we have an estimate of the source signals and of the
mixing matrix.

3.3.  Selecting  appropriate  nonlinearity 

Typically, prior information on the sign of the kurtosis is
assumed to be available and the nonlinearities $g_i(\cdot)$ are
selected accordingly. This assumption is often unreasonable.
In order to obtain a truly blind algorithm, the
statistics of each output of the separation system are
recursively tracked and an appropriate nonlinearity for
each channel is selected from two alternatives, depending
on whether the source is deemed to have negative
or positive kurtosis. The selection principle employed
here was introduced in [4] and stems from the stability
analysis presented in [1]. In order to choose the
nonlinear functions for each channel, at each time step k
we recursively estimate the following statistics:


functions in the case of positive kurtotic sources and
hyperbolic tangents in the case of negative kurtotic sources.
In this paper $g_{i,1}(u) = \tanh(\alpha_1 u)$ and
$g_{i,2}(u) = \alpha_2 u^3$ are employed, where $\alpha_1$ and
$\alpha_2$ are constants. Thus, at each time k, the component
$g_i$ of $g(u(k))$ is either $g_{i,1}(\cdot)$ or $g_{i,2}(\cdot)$
based on criterion (14).
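The on-line selection rule can be sketched for a single channel as follows. The sketch assumes the recursive statistics of (10) with forgetting constant mu, the two contrast functions tanh(alpha_1 u) and alpha_2 u^3 with alpha_1 = 1 and alpha_2 = 1/3, and the rule that tanh is chosen when K_1 - K_2 < 0. The class name and the zero initialisation of the statistics are illustrative assumptions.

```python
import math

ALPHA1, ALPHA2 = 1.0, 1.0 / 3.0

def g1(u):            # hyperbolic tangent, used for sub-Gaussian sources
    return math.tanh(ALPHA1 * u)

def dg1(u):           # derivative of g1
    return ALPHA1 * (1.0 - math.tanh(ALPHA1 * u) ** 2)

def g2(u):            # odd polynomial, used for super-Gaussian sources
    return ALPHA2 * u ** 3

def dg2(u):           # derivative of g2
    return 3.0 * ALPHA2 * u ** 2

class NonlinearitySelector:
    """Tracks sigma^2, kappa_r and rho_r for one channel, cf. (10)-(13)."""

    def __init__(self, mu=0.01):
        self.mu = mu
        self.sigma2 = 0.0
        self.kappa = [0.0, 0.0]
        self.rho = [0.0, 0.0]

    def update(self, x):
        mu = self.mu
        self.sigma2 = (1.0 - mu) * self.sigma2 + mu * x * x
        for r, (g, dg) in enumerate(((g1, dg1), (g2, dg2))):
            self.kappa[r] = (1.0 - mu) * self.kappa[r] + mu * dg(x)
            self.rho[r] = (1.0 - mu) * self.rho[r] + mu * x * g(x)

    def criterion(self):
        """K_1 - K_2 with K_r = sigma^2 * kappa_r - rho_r."""
        k = [self.sigma2 * self.kappa[r] - self.rho[r] for r in range(2)]
        return k[0] - k[1]

    def select(self):
        """Criterion (14): g1 if K_1 - K_2 < 0, otherwise g2."""
        return g1 if self.criterion() < 0 else g2

if __name__ == "__main__":
    sel = NonlinearitySelector()
    for k in range(3000):                 # binary +/-1 sequence (sub-Gaussian)
        sel.update(1.0 if k % 2 == 0 else -1.0)
    print(sel.criterion(), sel.select().__name__)
```

For a unit-variance binary sequence the criterion settles at a negative value and tanh is selected; for a sparse, spiky sequence it becomes positive and the cubic is selected, matching the sign convention stated for Fig. 3.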

4.  EXAMPLES 

In this section, the separation performance of the
proposed BSS algorithm is studied in simulations. In order
to demonstrate the practical applicability of the
algorithm we use ECG signals. In this example, 2000
samples of m = 3 source signals and n = 4 mixtures are
used. The initial sources are two positive kurtotic ECG
signals representing maternal and fetal heart beats at
frequencies slightly above 1 Hz and slightly below 3 Hz
respectively and an interfering sinusoid of 50 Hz (negative
kurtotic). In practice this problem may be encountered
if the electrocardiograph is disturbed by some
unwanted interference due to poor grounding or muscle
contraction during the measurements. The mixing
coefficient matrix A is randomly generated and the
observed mixtures are contaminated with zero mean
additive Gaussian noise with $\sigma = 0.1$. A low order (p=2)


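$\hat{\sigma}_i^2(k) = (1-\mu)\,\hat{\sigma}_i^2(k-1) + \mu\,|x_i(k)|^2$  (10)
$\hat{\kappa}_{i,r}(k) = (1-\mu)\,\hat{\kappa}_{i,r}(k-1) + \mu\,g_r'(x_i(k))$
$\hat{\rho}_{i,r}(k) = (1-\mu)\,\hat{\rho}_{i,r}(k-1) + \mu\,x_i(k)\,g_r(x_i(k))$

where $1 \le i \le m$, $r \in \{1, 2\}$ refers to the type of
nonlinearity and $\mu$ is a positive constant such that
$0 < \mu < 1$.

$\sigma_i^2(k)$, $\kappa_{i,r}(k)$ and $\rho_{i,r}(k)$ can be defined as:

$\sigma_i^2(k) = E\{x_i^2(k)\}$  (11)
$\kappa_{i,r}(k) = E\{g_r'(x_i(k))\}$  (12)
$\rho_{i,r}(k) = E\{x_i(k)\,g_r(x_i(k))\}$  (13)

where $x_i(k)$ is the source signal estimate on channel i
at time k. Let us denote
$\mathcal{K}_1 = \sigma_i^2(k)\,\kappa_{i,1}(k) - \rho_{i,1}(k)$ and
$\mathcal{K}_2 = \sigma_i^2(k)\,\kappa_{i,2}(k) - \rho_{i,2}(k)$.
The nonlinear function for the ith component at time k is
selected as follows:

$g_{i,k}(x) = \begin{cases} g_1(x), & \text{if } \mathcal{K}_1 - \mathcal{K}_2 < 0 \\ g_2(x), & \text{otherwise} \end{cases}$  (14)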

AR model is employed in the state prediction matrix
F. The constants used in the contrast functions are
$\alpha_1 = 1$ and $\alpha_2 = 1/3$. The type of nonlinearity needed
in separation is selected on-line using the criterion given
in (14). At each time step k we update the statistics
per (10), with $\mu = 0.01$. The results of the separation
are presented in Fig. 1. In order to qualitatively
illustrate the recovery of the shape of the ECG signals with
high fidelity, original noise free sources and separated
sources are plotted in Fig. 2.

The selection of the appropriate zero-memory
nonlinearity is simulated next. The track of the sign of
$\mathcal{K}_1 - \mathcal{K}_2$ for each channel is presented in Fig. 3.
If $\mathcal{K}_1 - \mathcal{K}_2$ is positive, the respective channel
carries a super-Gaussian signal; otherwise it carries a
sub-Gaussian signal. Typically, about 300 samples are
needed to achieve separation.

A non-stationary scenario may be simulated as follows:
given the stationary input source signals, we perform
the mixing by using a slowly time-varying mixing
matrix. This can be obtained by applying a rotation
matrix $T(\theta(k))$ to the random mixing matrix A, where


The functions $g_1$ and $g_2$ are the nonlinearities used
when the sources are sub- and super-Gaussian, respectively.
There are many different contrast functions that may
be used in order to perform the separation (see [3]). It
has been proven [7] that for the nonlinear PCA class of
algorithms suitable nonlinearities are odd polynomial


$T(\theta(k))$ is given by:

$T(\theta(k)) = \begin{bmatrix} \cos(\theta(k)) & \sin(\theta(k)) & 0 & 0 \\ -\sin(\theta(k)) & \cos(\theta(k)) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$

Figure 1: An example of blind separation from noisy
mixtures, (a) noise free source signals, (b) noisy mixtures
with additive Gaussian noise with zero mean and
$\sigma = 0.1$, (c) separation results obtained using on-line
eigenpair tracking and predictor-corrector structure by
adaptively selecting the appropriate nonlinearity for
each channel.

The subspace tracking algorithm requires that the number
of sensors is greater than the number of sources
(n > m). In this example the same ECG and sinusoidal
m = 3 source signals of 2000 samples and n = 4
mixtures are used. To provide an initial estimate of
the mixing matrix and predictor-corrector parameters,
a static random mixing matrix A is used for the first 1000
samples. For the next 1000 samples the angle $\theta(k)$ is
linearly changed from the initial value $\theta(1000) = 0$ to
the final value $\theta(2000) = \pi/3$. The separation result is
presented in Fig. 4.
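The time-varying mixing described above can be sketched as follows, assuming that "applying" the rotation means left-multiplying the 4 x 3 mixing matrix, i.e. A(k) = T(theta(k)) A. The helper names are hypothetical and plain nested lists are used to keep the example self-contained.

```python
import math

def theta(k, k_start=1000, k_end=2000, theta_end=math.pi / 3):
    """Rotation angle: 0 up to sample k_start, then linear up to theta_end."""
    if k <= k_start:
        return 0.0
    if k >= k_end:
        return theta_end
    return theta_end * (k - k_start) / (k_end - k_start)

def rotation(th):
    """4 x 4 rotation acting on the first two mixture coordinates."""
    c, s = math.cos(th), math.sin(th)
    return [[c, s, 0.0, 0.0],
            [-s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul(P, Q):
    """Product of two matrices stored as nested lists."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def mixing_matrix(A, k):
    """Time-varying mixing matrix A(k) = T(theta(k)) A."""
    return matmul(rotation(theta(k)), A)

if __name__ == "__main__":
    A = [[0.8, -0.3, 0.5], [0.1, 0.9, -0.4], [0.6, 0.2, 0.7], [-0.5, 0.4, 0.3]]
    print(mixing_matrix(A, 1500)[0])      # first row at the mid-point angle pi/6
```

Because T is orthogonal, the rotation changes the signal subspace seen by the tracker without changing the power of the mixtures.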

Good estimates of the source signals are possible
due to the ability of the subspace algorithm to track
the eigenvalues and eigenvectors in a non-stationary
environment. The convergence of the tracking method


Figure 2: Recovering the original source with high
fidelity: points 1430 - 1780 from the original signals
(continuous line) and the recovered sources (dash-dot
line).


Figure 3: Trace of the decision criterion
$\mathcal{K}_1 - \mathcal{K}_2$ for each channel. A positive value
stands for super-Gaussian, a negative value for
sub-Gaussian.


Figure 4: BSS for a slowly time-varying mixing
structure.


is illustrated by plotting the canonical angles between
the basis vectors in the estimated and theoretical signal
subspaces. Let $\hat{u}$ and $u$ denote the eigenvectors
of the estimated and true signal subspaces, and
compute the singular value decomposition of $\hat{u}^T u$.
Let $\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_m$ be the singular values of
$\hat{u}^T u$. The canonical angles between the basis vectors
are obtained by $\angle(\hat{u}, u) = \cos^{-1}\gamma_i$. If the maximum
canonical angle is small the subspaces are close to each
other. The maximum canonical angle for the mixtures
used in Fig. 1 is shown in Fig. 5. The results indicate
that the subspace tracker converges relatively fast
to the true signal subspace.
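The canonical-angle computation described above can be sketched with NumPy as follows. The function name is hypothetical, and the QR step is an added safeguard (an assumption, not part of the text) that orthonormalises the bases in case the supplied eigenvector estimates are not exactly orthonormal.

```python
import numpy as np

def canonical_angles(U_est, U_true):
    """Canonical angles (radians, ascending) between the column spaces
    of U_est and U_true, both of shape (n, m)."""
    Q1, _ = np.linalg.qr(U_est)            # orthonormal basis of estimated subspace
    Q2, _ = np.linalg.qr(U_true)           # orthonormal basis of true subspace
    gamma = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    gamma = np.clip(gamma, -1.0, 1.0)      # guard against rounding slightly above 1
    return np.arccos(gamma)

if __name__ == "__main__":
    a = 0.3
    U_true = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    U_est = np.array([[1.0, 0.0], [0.0, np.cos(a)], [0.0, np.sin(a)]])
    print(canonical_angles(U_est, U_true).max())   # approximately 0.3
```

The largest angle is the quantity plotted in Fig. 5: it is zero when the two subspaces coincide and pi/2 when some direction of one subspace is orthogonal to the other.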


Figure  5:  Maximum  canonical  angle  between  estimated 
and  theoretical  signal  subspace  basis  vectors. 

The performance of the proposed method is also
investigated using communications signals. 4-QAM,
16-QAM (sub-Gaussian) and Laplacian distributed jamming
signals (super-Gaussian) are randomly mixed and
contaminated with Gaussian noise (SNR = 32 dB). The
Laplacian p.d.f. is given by $p(s) = 0.5\,e^{-|s|}$. The total
number of samples is 2000 and the number of receivers
is four and is not changed during the simulation. In
this case the state prediction matrix is F = I. From
the signal space diagram of the mixed signals no QAM
constellation can be distinguished. The result of the
separation is presented in Fig. 6.


Figure  6:  Separation  result  in  the  case  of  communica¬ 
tions  signals.  Only  the  last  300  samples  are  shown. 


5.  CONCLUSION 

In many BSS problems we do not have prior information
on the type of pdf and the sign of the kurtosis.
Furthermore, signal statistics may be time varying. We
introduced an algorithm that does not make restrictive
assumptions on the form of the pdf, can adapt to
changes in the mixing system and signal statistics, and
lends itself to real-time computation. The algorithm
performs signal subspace tracking and employs a
recursive estimator to produce estimates of the source
signals at the arrival of each new mixture observation.
A zero-memory nonlinearity is employed in separation.
The type of nonlinearity is adaptively determined
based on the statistics computed from the output.

ABSTRACT

In this paper, we present an efficient solution to the
blind multi-channel deconvolution problem that consists
of recovering independent source signals from their
convolutive mixtures. In the case of instantaneous
mixtures, a robust solution referred to as Second Order
Blind Identification (SOBI) has been proposed previously.
It is based on the joint diagonalization of
spatio-temporal correlation matrices. Herein, we extend this
technique to the convolutive mixture case. In contrast
to existing deconvolution techniques, this new
approach is able to deal with an overestimated source
number. The proposed method has been successfully
applied to the deconvolution of speech signals.

1.  INTRODUCTION 

If we consider a set of received signals that are linear
convolutive mixtures of decorrelated source signals, the
objective of blind deconvolution is to recover the source
signals from the set of received signals without any
knowledge of the linear mixtures or the Linear Time
Invariant (LTI) systems. For instantaneous mixtures,
a Second Order Blind Identification (SOBI) algorithm
has been presented [1] and shown to be very robust for
temporally correlated sources. There are two ways to
achieve blind deconvolution. One way is to first identify
the channel system from the output mixtures and then
to design an equalizer accordingly [2]. The other way
consists of directly designing an equalizer from the
output mixtures. This approach bypasses the problem of
blind system identification and is less costly in
computation. Using the second approach, we extend the SOBI
technique to the convolutive mixture case. It is based
on the joint diagonalization of spatio-temporal
correlation matrices. The proposed method has been
successfully applied to the deconvolution of speech signals
and shown to be robust with respect to additive noise.
Furthermore, this new approach is able to deal with an
overestimated source number. In the next section, we
present the data model, the underlying hypotheses
and the identifiability conditions. The proposed
algorithm is described in section 3. Finally, some
simulation results are provided in section 4.

The Authors would like to thank STEP Alger, distributor of
Motorola in Algeria, for its support for the presentation of this
work.

2.  PROBLEM  FORMULATION 
2.1.  Data  Model 

Consider a discrete time multiple input multiple output
(MIMO) linear time invariant model given by,

$x_i(n) = \sum_{j=1}^{M} \sum_{l=0}^{L-1} h_{ij}(l)\,s_j(n-l) + n_i(n)$, for $i = 1, \ldots, N$  (1)

where $s_j(n)$, $j = 1, \ldots, M$, are the M source signals
(model inputs), $x_i(n)$, $i = 1, \ldots, N$, are the N sensor
signals (model outputs) with N > M, $h_{ij}$ is the transfer
function between the j-th source and the i-th sensor
with an overall extent L, and $n_i(n)$, $i = 1, \ldots, N$, are
additive white noises.
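The convolutive data model (1) can be sketched directly from its double sum. In this illustration the sources are taken to be zero before n = 0 (an assumption the text does not state), the noise term is optional, and the function name and argument layout are hypothetical.

```python
def mimo_fir_mix(sources, h, noise=None):
    """Convolutive mixing, cf. model (1).

    sources: list of M source sequences, each of length T
    h:       h[i][j] is the list of L filter taps from source j to sensor i
    noise:   optional list of N noise sequences of length T
    Returns the list of N sensor sequences x_i(n).
    """
    M, T, N = len(sources), len(sources[0]), len(h)
    x = [[0.0] * T for _ in range(N)]
    for i in range(N):
        for j in range(M):
            for l, tap in enumerate(h[i][j]):
                for n in range(l, T):          # s_j(n - l) is taken as 0 for n < l
                    x[i][n] += tap * sources[j][n - l]
        if noise is not None:
            for n in range(T):
                x[i][n] += noise[i][n]
    return x

if __name__ == "__main__":
    s = [[1.0, 2.0, 3.0, 4.0]]                 # M = 1 source
    h = [[[1.0, 0.0]], [[0.0, 1.0]]]           # N = 2 sensors, L = 2 taps
    print(mimo_fir_mix(s, h))                  # sensor 2 sees the source delayed by one sample
```

With L = 1 the model reduces to the instantaneous mixture handled by SOBI, which is why the convolutive algorithm can be viewed as an extension of that case.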

The assumptions made about the data model are
as follows:

A1) The source signals $s_j(n)$, $j = 1, \ldots, M$, are mutually
decorrelated and each source signal is temporally
coherent.

A2) The noise processes $n_i(n)$, $i = 1, \ldots, N$, are
zero-mean stationary processes independent of the source
signals.

The purpose of blind multi-channel deconvolution
is to recover the source signals based only on the sensor
signals. This leads to finding a set of weights $\{w_{ji}(l)\}$ such
that,

$\hat{s}_j(n) = \sum_{i=1}^{N} \sum_{l=0}^{L-1} w_{ji}(l)\,x_i(n-l)$, for $j = 1, \ldots, M$  (2)

where $\hat{s}_j(n)$ are the recovered source signals.

We can rewrite equation (2) in the following matrix
form,

$\hat{s}(n) = W x(n)$  (3)

where

$\hat{s}(n) = [\hat{s}_1(n), \ldots, \hat{s}_M(n)]^T$

$x(n) = [x_1(n), \ldots, x_1(n-L+1), \ldots, x_N(n-L+1)]^T$

$W = \begin{bmatrix} w_{11}(0) & \cdots & w_{11}(L-1) & \cdots & w_{1N}(L-1) \\ w_{21}(0) & \cdots & w_{21}(L-1) & \cdots & w_{2N}(L-1) \\ \vdots & & \vdots & & \vdots \\ w_{M1}(0) & \cdots & w_{M1}(L-1) & \cdots & w_{MN}(L-1) \end{bmatrix}$  (4)

2.2. Identifiability

Using the z-transform, the design of an equalizer W(z)
that recovers the original source signals only from the
observations x(n) can be formulated as follows:

$\hat{s}(n) = W(z)[H(z)s(n) + n(n)]$  (5)

$\hat{s}(n) = W(z)H(z)s(n) + W(z)n(n)$  (6)

Let us write,

$G(z) = W(z)H(z)$  (7)

As shown in [3], the LTI system represented by its
transfer function matrix G(z) is said to be transparent
or decoupled if G(z) has a single nonzero monomial
entry in each row and each column.

In other words, an LTI system is transparent if and
only if G(z) can be decomposed into:

$G(z) = \Lambda(z) D P$  (8)

where $\Lambda(z)$ is a diagonal matrix with diagonal entries:

$\lambda_{ii} = z^{l_i}$  (9)

where $l_i$ is a non-negative integer, D is a constant
diagonal matrix, and P a permutation matrix.

Then, a channel system H(z) is said to be deconvolvable
if there exists an equalizer W(z) so that the composite
system G(z) is transparent.

Furthermore, a necessary and sufficient condition
for H(z) to be deconvolvable is that the greatest common
divisor of all the minors of order M in H(z) is a
nonzero monomial (see [4] for details).


3.  THE  PROPOSED  ALGORITHM 
(SOMOD) 

The  problem  of  blind  multi  channel  deconvolution  is  to 
find  W  an  [M  x  NL]  matrix  such  that  s(n)  =  s (n). 
We  can  define  the  source  correlation  matrices  at  time 


lag  k  as: 

Rt{k)  =  E[s(n)s(n  -  &)*] 

(10) 

Where  *  denotes  the  transpose  conjugate  of 
Under  relation  (3),  the  above  equation  can 
the  following  form: 

a  vector, 
be  put  in 

Rs(k)  =  WR,(I)Ww 

(11) 

where, 

Rx(k)  =  E[x(n)x(n  -  Ar)*] 

(12) 

are  the  data  correlation  matrices  at  time  lag  k. 

Let  us  consider  the  following  decomposition  of  W, 

W  =  UffB 

(13) 

where  U  is  an  [M  x  M]  unitary  matrix,  H  denotes  the 
transpose  conjugate  of  a  matrix  and  B  is  an  [M  x  NL] 
matrix. 

Substituting (13) into (11) and assuming, without loss of generality, that the source signals are of unit variance¹, one can write

$R_s(0) = B R_x(0) B^H = I$  (14)

According to equation (14), B is nothing other than a whitening matrix that can be obtained from an eigendecomposition of $R_x(0)$.
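The whitening step of equation (14) can be sketched numerically as follows. This is only an illustrative sketch, not the authors' implementation: the helper name `whitening_matrix`, the toy 6 x 2 mixing matrix and the noise-free correlation matrix are assumptions made for the example.

```python
import numpy as np

def whitening_matrix(R0, m):
    """Return an (m x n) matrix B with B @ R0 @ B^H = I on the m dominant
    eigen-directions of the Hermitian matrix R0 (hypothetical helper)."""
    eigvals, eigvecs = np.linalg.eigh(R0)        # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:m]          # indices of the m largest
    lam, E = eigvals[idx], eigvecs[:, idx]
    return (E / np.sqrt(lam)).conj().T           # B = diag(lam^-1/2) E^H

# Toy check: 2 unit-variance sources seen through a random 6 x 2 mixing matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) + 1j * rng.standard_normal((6, 2))
R0 = A @ A.conj().T                              # noise-free zero-lag correlation
B = whitening_matrix(R0, 2)
print(np.round(B @ R0 @ B.conj().T, 6))          # approximately the 2 x 2 identity
```

With noisy data the small eigenvalues are no longer zero, so the choice of how many eigen-directions to keep matters; the text returns to this point under "Implementation issue".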

For time lag k, $k \neq 0$, we have

$R_s(k) = U^H B R_x(k) B^H U = A_k$  (15)

where $A_k$ is a diagonal matrix according to assumption A1). By denoting

$\underline{R}_x(k) = B R_x(k) B^H$  (16)

where $\{\underline{R}_x(k), k = 1, \ldots, K\}$ is a set of K whitened data correlation matrices at different time lags, we obtain the following key relation

$A_k = U^H \underline{R}_x(k) U$  (17)

Since the matrix U is unitary and $A_k$ is diagonal, expression (17) shows that any whitened data correlation matrix is diagonal in the basis of the columns of the matrix U (the eigenvalues of $\underline{R}_x(k)$ being the diagonal entries of $A_k$).

¹ Because of the well-known ambiguity of blind identification.

If, for the time lag k, the diagonal elements of $A_k$ are all distinct, the missing unitary matrix U may be 'uniquely' (i.e. up to permutation and phase shifts) retrieved by computing the eigendecomposition of the whitened matrix $\underline{R}_x(k)$.

Indeterminacy occurs in the case of degenerate eigenvalues. It does not seem possible to determine a priori a value of the delay k such that the diagonal entries of $A_k$ are all distinct. Of course, if the source signals have different spectral shapes, such eigenvalue degeneracy is unlikely, but it is to be expected that when some eigenvalues of the whitened matrix $\underline{R}_x(k)$ come close to degeneracy, the robustness of determining U from the eigendecomposition of a single whitened data correlation matrix is seriously impaired.

The situation is more favorable when considering simultaneous diagonalization of a set $\{\underline{R}_x(k)\}$ of K whitened data correlation matrices. This set is simultaneously diagonalizable by the unitary matrix U as in (17).

The matrix U is unique (up to a permutation matrix and phase factors) if, and only if, for any pair (i, j) of sources, there exists a time lag k such that $E[s_i(n) s_i(n-k)^*] \neq E[s_j(n) s_j(n-k)^*]$. Of course, the simultaneous diagonalization holds only for the exact statistics; empirical statistics may only be approximately simultaneously diagonalized under the same unitary transform. This calls for the definition of the approximate simultaneous diagonalization.

Joint  diagonalization:  The  joint  diagonalization  (JD) 
[1]  can  be  explained  by  first  noting  that  the  problem  of 
the  diagonalization  of  a  single  n  x  n  normal  matrix  M 
is  equivalent  to  the  maximization  of  the  criterion  [8] 

$C(M, V) \stackrel{\mathrm{def}}{=} \sum_i |v_i^* M v_i|^2$  (18)

over the set of unitary matrices $V = [v_1, \ldots, v_n]$. Hence, the joint diagonalization of a set $\{M_k \mid k = 1, \ldots, K\}$ of K arbitrary n x n matrices is defined as the maximization of the following JD criterion:

$C(V) \stackrel{\mathrm{def}}{=} \sum_k C(M_k, V) = \sum_{k,i} |v_i^* M_k v_i|^2$  (19)

under  the  same  unitary  constraint.  An  efficient  joint 
approximate  diagonalization  algorithm  exists  in  [1]  and 
it  is  a  generalization  of  the  Jacobi  technique  [8]  for  the 
exact  diagonalization  of  a  single  normal  matrix. 
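To make the JD criterion (19) concrete, the short sketch below evaluates it for a set of matrices that share an exact unitary diagonalizer and compares the value at that diagonalizer with the value at an unrelated unitary matrix. It only evaluates the criterion; it does not implement the Jacobi-like algorithm of [1]. The function name `jd_criterion` and the random test matrices are assumptions made for illustration.

```python
import numpy as np

def jd_criterion(mats, V):
    """Sum over k and i of |v_i^* M_k v_i|^2, as in criterion (19)."""
    return sum(np.sum(np.abs(np.diag(V.conj().T @ M @ V)) ** 2) for M in mats)

rng = np.random.default_rng(1)
n, K = 3, 4
# Random unitary U from a QR factorisation, and K matrices M_k = U D_k U^H.
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
mats = [U @ np.diag(rng.standard_normal(n)) @ U.conj().T for _ in range(K)]

# An unrelated unitary matrix for comparison.
V, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

# Upper bound of the criterion: the total squared Frobenius norm of the set.
total = sum(np.linalg.norm(M, "fro") ** 2 for M in mats)
print(jd_criterion(mats, U), jd_criterion(mats, V), total)
```

Because a unitary change of basis preserves the Frobenius norm, the criterion can never exceed `total`, and it reaches that bound exactly when every transformed matrix is diagonal.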


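Finally, the unitary matrix U in (13) is obtained by the joint diagonalization of the set $\{\underline{R}_x(k)\}$ of whitened data correlation matrices, which corresponds to the maximization:

$U = \arg\max_U \sum_{k=0}^{L-1} \sum_{i=1}^{M} |u_i^* \underline{R}_x(k) u_i|^2$  (20)

with $U = [u_1, \ldots, u_M]$.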

Implementation  issue 

The eigendecomposition of $R_x(0)$ for the determination of matrix B will provide us with M x L eigenvectors that span the extended subspace. Only M of them span the original source subspace. Hence, it is impossible, without any knowledge of the original sources, to select the M eigenvectors among the M x L obtained. We propose to use all the M x L vectors in order to determine B, which will change the dimension of B from [M x NL] to [ML x NL]. Then, we maximize the JD criterion using a unitary matrix U' of dimension ML x ML:

$U' = \arg\max_{U'} \sum_{k=0}^{L-1} \sum_{i=1}^{ML} |{u'_i}^{*} \underline{R}_x(k) u'_i|^2$  (21)

with $U' = [u'_1, \ldots, u'_{ML}]$.

One can easily show that the maximization (21) leads to the maximization (20) and that, among the M x L eigenvectors of U', M of them correspond to the desired sources. The desired M eigenvectors are selected from the M x L ones by choosing those which lead to the smallest correlation coefficients of the recovered signals. The proposed approach has been shown to be robust with respect to an overestimated source number.

4.  SIMULATIONS 

Example  1 

We consider an array of 2 sensors receiving signals from 2 sources in the presence of white Gaussian noise. The channel length is L = 4. The signal to noise ratio (SNR) is set at 40 dB. Figure 1 shows the temporal representation of the original sources, their convolutive mixtures and the signals recovered by the SOMOD algorithm. For the same experiment, Figure 2 shows the time-frequency representation of the original sources, their convolutive mixtures and the signals recovered by the SOMOD algorithm. The kernel used for the computation of the time-frequency distributions (TFDs) is the Choi-Williams kernel [9]. This example illustrates the success of the proposed algorithm in separating two sources.

Figure 1: Separation example (time representation): SNR = 40 dB.

Figure 2: Separation example (time-frequency representation; first panel: TFD of the original source 1): SNR = 40 dB.


Example  2 

We present here a simulation to illustrate the effectiveness of our algorithm in deconvolving speech signals. The parameter settings are:

•  M = 2, N = 2 and L = 3.

•  The two speech signals are sampled at 16 kHz.

•  Signal to Noise Ratio (SNR) = 10 dB.

•  The transfer function matrix of the simulated multichannel system is given by,


$H(z) = \begin{pmatrix} -0.40 + 0.82z^{-1} + 1.29z^{-2} & 1.19 - 0.02z^{-1} - 1.60z^{-2} \\ 0.69 + 0.71z^{-1} + 0.67z^{-2} & -1.20 - 0.16z^{-1} + 0.26z^{-2} \end{pmatrix}$

Figure 3 shows the original speech signals, their convolutive mixtures and the speech signals recovered by the SOMOD algorithm.

Figure 3: Speech signal separation: SNR = 10 dB.


5.  CONCLUSION 

In this contribution, we considered the blind deconvolution of MIMO FIR systems driven by mutually decorrelated source signals. We proposed a solution based on the joint diagonalization of spatio-temporal correlation matrices. This technique has been proposed previously in the case of instantaneous mixtures [1]. An extension of this method to convolutive mixtures has been presented in this paper. It was shown to be robust with respect to additive noise. Moreover, it is able to deal with an overestimated source number, since the method provides an M x L recovered source subspace instead of the original M source subspace. A source selection criterion has been defined to select the M recovered sources among the M x L obtained. This method is well suited to the deconvolution of speech signals, which is of great importance in practical applications [7].

1.  ABSTRACT

A  novel  direction  of  arrival  (DOA)  technique  is  presented 
which  constructs  estimates  of  the  relative  delay  mixing  pa¬ 
rameters  associated  with  each  signal  by  taking  the  ratio  of 
time-frequency  representations  of  two  mixtures.  The  tech¬ 
nique  is  based  on  the  Degenerate  Unmixing  and  Estimation 
Technique  (DUET)[1].  If  the  sources  are  W-disjoint  orthog¬ 
onal,  meaning  that  only  one  signal  is  active  in  the  time- 
frequency  plane  at  a  given  time-frequency,  then  the  ratio 
only  depends  on  the  mixing  parameters  of  one  source.  The 
ratio  can  thus  be  used  to  generate  estimates  of  the  mixing 
parameters  and  these  estimates  can  be  clustered  to  deter¬ 
mine  both  the  number  of  sources  present  in  the  mixtures 
and  their  associated  mixing  parameters.  The  method  al¬ 
lows  for  the  estimation  of  the  DOA  for  many  sources  using 
only  two  receive  antennas,  whereas  traditional  techniques 
require  N  antennas  to  estimate  N  —  1  angles  of  arrival. 
Simulation  results  are  presented  and  compared  to  MUSIC, 
ESPRIT,  and  other  DOA  estimation  techniques. 

2.  INTRODUCTION 

The  goal  of  accurately  estimating  the  arrival  angle  of  a  sig¬ 
nal  on  an  antenna  array  is  long  standing  in  the  field  of  signal 
processing.  Direction  of  arrival  estimation  is  important  for
such  tasks  as  tracking  the  signal  emitter  and  smart  antenna 
array  processing  for  interference  reduction  in  mobile  wire¬ 
less  systems. 

Most  DOA  techniques  require  N  antennas  to  estimate 
N  —  1  angles  of  arrival.  A  notable  exception  to  the  N  —  1 
angles  of  arrival  rule  uses  fourth-order  cumulants  to  estimate
three  time  delays  from  two  mixtures[2].  One  advantage  of 
the  technique  presented  here  is  that  it  requires  only  two  an¬ 
tenna  elements  to  estimate  the  arrival  angle  of  an  arbitrary 
number  of  sources.  This  reduction  in  the  required  number 
of  antenna  elements  is  made  by  assuming  the  sources  are 
W-disjoint  orthogonal. 

This  paper  applies  the  work  on  the  Degenerate  Unmix¬ 
ing  and  Estimation  Technique(DUET)  on  W-disjoint  or¬ 
thogonal  signals  originally  proposed  in  [3]  to  wireless  sig¬ 
nals.  W-disjoint  orthogonal  signals  have  disjoint  support  for 
their  time-frequency  representation.  For  example,  multiple 
M-ary  frequency  shift  keyed  signals  are  W-disjoint  orthogo¬ 
nal,  except  for  the  occasional  hit  when  two  or  more  signals 
transmit  at  the  same  frequency  at  the  same  time.  Another 


(perhaps  surprising)  example  of  W-disjoint  orthogonal  sig¬ 
nals  is  speech.  Tests  show  that  voice  data  satisfies  the  W- 
disjoint  orthogonality  constraint  closely  enough  to  allow  ac¬ 
curate  angle  of  arrival  estimation  and  blind  separation[3,  4]. 

In  essence,  the  W-disjoint  orthogonal  assumption  as¬ 
sumes  that  all  signals  are  instantaneously  separated  in  the 
frequency  domain.  Thus  the  technique  presented  herein 
would  not  work,  for  example,  when  the  signals  are  sinusoids 
modulated  at  exactly  the  same  frequency  as  the  signals  in 
that  case  would  not  be  W-disjoint  orthogonal. 

Note  that  one  could  employ  a  bank  of  narrow  band¬ 
pass  filters  to  create  a  number  of  narrowband  signal  chan¬ 
nels  and  use  the  DOA  estimation  schemes  described  in  the 
above  overview  literature.  In  this  case,  with  N  =  2,  the 
standard  DOA  estimation  technique  would  be  able  to  es¬ 
timate  the  angle  of  arrival  of  one  source  per  channel.  If 
the  source  to  bandpass  channel  mapping  changes,  as  would 
happen  rapidly  with  frequency  hopped  or  voice  signals,  the 
multitude  of  estimates  from  different  channels  for  different 
times must be combined in some fashion [5, 6]. One advantage of this technique is that the estimates from different channels are combined inherently as part of the clustering.

Section  3  describes  the  mixture  model,  defines  W-disjoint 
orthogonality,  and  proposes  a  number  of  possible  angle  of 
arrival  estimators  based  on  the  model  and  assumptions. 
Section  4  presents  results  of  the  DOA  estimator  perfor¬ 
mance,  comparing  results  with  ESPRIT,  MUSIC,  and  other 
standard  DOA  techniques. 


3.  MIXING  PARAMETER  ESTIMATION 


3.1.  Signal  Mixing 

Consider the measurements of a pair of antenna elements where only the direct path is present. In this case, each mixture $x_i(t)$ is the sum of delayed, attenuated source signals. We can absorb the attenuation factor and time delay associated with each source to the first antenna element into the definition of the sources and represent mixing in the frequency domain as,

$\begin{bmatrix} X_1(\omega) \\ X_2(\omega) \end{bmatrix} = \begin{bmatrix} 1 & \cdots & 1 \\ a_1 e^{-i\omega\delta_1} & \cdots & a_N e^{-i\omega\delta_N} \end{bmatrix} \begin{bmatrix} S_1(\omega) \\ \vdots \\ S_N(\omega) \end{bmatrix}$  (1)

where $\delta_i$ is the arrival delay between adjacent array elements resulting from the angle of arrival for source i and $a_i$ is the relative attenuation factor for each source between array elements. We denote the maximum possible time delay between array elements as $\Delta$ and thus $|\delta_i| < \Delta$, $\forall i$.

3.2.  Source  Assumptions 

Given a windowing function W(t), we call two functions $s_i(t)$ and $s_j(t)$ W-disjoint orthogonal if the supports of the windowed Fourier transforms of $s_i(t)$ and $s_j(t)$ are disjoint. The windowed Fourier transform of $s_i(t)$ is defined,

$F^W(s_i(\cdot))(\omega, \tau) = \int_{-\infty}^{\infty} W(t - \tau) s_i(t) e^{-i\omega t} \, dt$,  (2)

which we will refer to as $S_i^W(\omega, \tau)$ when appropriate. The W-disjoint orthogonality assumption can be stated concisely,

$S_i^W(\omega, \tau) S_j^W(\omega, \tau) = 0$, $\forall i \neq j$, $\forall \omega, \tau$.  (3)

We  will  assume,  as  is  common  in  array  processing  litera¬ 
ture,  the  physical  separation  of  the  sensors  is  small  enough 
relative  to  the  carrier  and  bandwidth  of  the  signal  such  that 
the  relative  delay  between  the  sensors  can  be  expressed  as 
a  phase  shift  of  the  signal[7].  This  assumption  is  known  as 
the  narrowband  assumption  in  array  processing  and  can 
be  expressed  for  our  purposes  as, 

$F^W(s_i(\cdot - \delta))(\omega, \tau) = e^{-i\omega\delta} F^W(s_i(\cdot))(\omega, \tau)$, $\forall |\delta| < \Delta$.  (4)


3.3.  Amplitude-Delay  Estimation 

For  W-disjoint  orthogonal  sources  under  the  narrowband 
assumption,  we  note  that  mixing  can  be  expressed  in  the 
time-frequency  domain  as, 

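$\begin{bmatrix} X_1^W(\omega, \tau) \\ X_2^W(\omega, \tau) \end{bmatrix} = \begin{bmatrix} 1 \\ a_i e^{-i\omega\delta_i} \end{bmatrix} S_i^W(\omega, \tau)$, for some i.  (5)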

Due to the sources being W-disjoint orthogonal, mixing for a given $(\omega, \tau)$ is a function of at most one source. Thus, the mixing parameters can be approximated for a given $(\omega, \tau)$ using,

$(a_i, \delta_i) = \left( \left\| \frac{X_2^W(\omega, \tau)}{X_1^W(\omega, \tau)} \right\|, \; \Im\left( \log \frac{X_1^W(\omega, \tau)}{X_2^W(\omega, \tau)} \right) / \omega \right)$  (6)

for some i, where $\Im$ denotes taking the imaginary part. Equation 6 has been shown to yield accurate mixing parameter estimates for appropriate W(t) under a variety of noise (independent additive white Gaussian noise) and multipath conditions [3]. Note that for baseband representations of wireless signals, we must divide by $\omega - \omega_c$ instead of $\omega$ in Equation 6, where $\omega_c$ is the carrier frequency.
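The per-point ratio estimator can be sketched as below for a single source in the noise-free case. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the function name `duet_point_estimates`, the frequency grid and the synthetic time-frequency values are hypothetical, and the delay is kept small enough that the phase $\omega\delta$ does not wrap.

```python
import numpy as np

def duet_point_estimates(X1, X2, omega):
    """Per-point (amplitude, delay) estimates from two mixtures.

    X1, X2 : complex time-frequency values of the two mixtures
    omega  : angular frequency of each point (same shape, nonzero)
    """
    ratio = X2 / X1
    a_hat = np.abs(ratio)                                # relative amplitude
    delta_hat = np.imag(np.log(1.0 / ratio)) / omega     # Im(log(X1/X2)) / omega
    return a_hat, delta_hat

rng = np.random.default_rng(2)
omega = np.linspace(0.2, 2.0, 50)        # rad/sample, hypothetical grid
a_true, delta_true = 0.8, 0.7            # |omega * delta| < pi avoids phase wrap
S = rng.standard_normal(50) + 1j * rng.standard_normal(50)
X1 = S                                   # first mixture: the source itself
X2 = a_true * np.exp(-1j * omega * delta_true) * S
a_hat, delta_hat = duet_point_estimates(X1, X2, omega)
print(a_hat[:3], delta_hat[:3])
```

With several W-disjoint orthogonal sources, each time-frequency point would return the parameters of whichever source is active there, which is what produces the clusters discussed next.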

Using  Equation  6,  every  $(\omega, \tau)$  yields  an  estimate  pair
for  the  relative  amplitude-delay  parameter  associated  with 
one  source.  For  W-disjoint  orthogonal  signals,  if  we  were  to 
calculate  amplitude-delay  estimates  from  a  number  of  time- 
frequency  points,  we  would  expect  to  see  clusters  around 
the  true  delay  mixing  parameters  for  each  source.  Figure  1 
shows  the  estimate  clusters  for  a  ten  source  mixing  simu¬ 
lation.  If  we  were  to  use  a  standard  clustering  technique 


Figure 1: Two-dimensional histogram of the number of DUET estimates for delay/amplitude mixing parameters for ten sources obtained using two mixtures. The sources were M-ary FSK wireless signals asynchronously arriving at the two antenna elements. The ten peaks correspond to the correct relative amplitude and delay mixing parameters. The actual relative amplitudes used in the mixing were (1, 1.2, .8, 1, 1.1, .9, 1, 1.2, 1.1, .9) and the corresponding angles of arrival were (30°, 42°, 54°, 66°, 78°, 90°, 102°, 114°, 126°, 138°). The relative delay is in units of (fractional) samples.


on  the  amplitude-delay  estimates,  the  number  of  clusters 
found  would  be  the  estimate  of  the  number  of  sources,  and 
the  cluster  centers  would  be  the  amplitude-delay  estimates 
associated with each source. The estimated delay can, of course, be translated into an angle of arrival, $\alpha$, via,

$\alpha = \arcsin(\delta_i / \Delta)$.  (7)
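As a small worked example of Equation 7, the sketch below converts a relative delay into an angle in degrees. The function name `delay_to_angle_deg` is hypothetical, and the clipping of the ratio to [-1, 1] is an added safeguard against noisy estimates that slightly exceed the maximum delay; it is not part of the original formula.

```python
import numpy as np

def delay_to_angle_deg(delta, delta_max):
    """Angle of arrival in degrees from a relative delay, following Equation 7."""
    ratio = np.clip(delta / delta_max, -1.0, 1.0)   # assumption: guard against |delta| > delta_max
    return np.degrees(np.arcsin(ratio))

# A delay of half the maximum delay corresponds to arcsin(0.5), about 30 degrees.
print(delay_to_angle_deg(0.5, 1.0))
```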


3.3.1.  Equal  weight  combining 


Multiple delay estimates can be combined into an overall delay estimate in order to mitigate the effects of noise and the inaccuracies of the narrowband assumption via,

$\delta_i = \frac{1}{|\Omega_i|} \sum_{(\omega, \tau) \in \Omega_i} \Im\left( \log \frac{X_1^W(\omega, \tau)}{X_2^W(\omega, \tau)} \right) / \omega$  (8)

and multiple relative amplitude estimates associated with the same source can be combined into an overall estimate,

$a_i = \frac{1}{|\Omega_i|} \sum_{(\omega, \tau) \in \Omega_i} \left\| \frac{X_2^W(\omega, \tau)}{X_1^W(\omega, \tau)} \right\|$  (9)

where $\Omega_i$ is a set of $(\omega, \tau)$ points (determined to be associated with the i'th cluster) and $|\Omega_i|$ is the number of points in the set. The above estimators, Equations 8 and 9, will be referred to as the equal weight estimators because the estimate from each time-frequency point being considered gets equal weight in the overall estimate.


3.3.2.  Power  weight  combining 

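Rather than weighting each estimate equally in the clustering algorithm, we could weight each estimate by the instantaneous power of the mixture for the time-frequency pair generating the estimate. Thus, power weighted estimates are,

$\delta_i = \frac{\sum_{(\omega, \tau) \in \Omega_i} \|X_1^W(\omega, \tau)\|^2 \, \Im\left( \log \frac{X_1^W(\omega, \tau)}{X_2^W(\omega, \tau)} \right) / \omega}{\sum_{(\omega, \tau) \in \Omega_i} \|X_1^W(\omega, \tau)\|^2}$  (10)

$a_i = \frac{\sum_{(\omega, \tau) \in \Omega_i} \|X_1^W(\omega, \tau)\|^2 \left\| \frac{X_2^W(\omega, \tau)}{X_1^W(\omega, \tau)} \right\|}{\sum_{(\omega, \tau) \in \Omega_i} \|X_1^W(\omega, \tau)\|^2}$  (11)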
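The difference between equal weight and power weight combining can be sketched as follows. This is an illustrative sketch only: the function name `combine_delays` is hypothetical, the weights are taken to be the instantaneous power of the first mixture (an assumption, since the text only says "the mixture"), and the noise model in which the per-point error grows as the mixture gets weaker is invented for the example.

```python
import numpy as np

def combine_delays(delta_pts, X1, power_weighted=True):
    """Combine the per-point delay estimates of one cluster into a single value."""
    # Assumption: weights are |X1|^2, the instantaneous power of the first mixture.
    w = np.abs(X1) ** 2 if power_weighted else np.ones(len(delta_pts))
    return np.sum(w * delta_pts) / np.sum(w)

rng = np.random.default_rng(3)
n = 2000
delta_true = 0.5
X1 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
# Hypothetical noise model: per-point error inversely proportional to |X1|.
delta_pts = delta_true + 0.1 * rng.standard_normal(n) / np.abs(X1)

print(combine_delays(delta_pts, X1, power_weighted=False))  # equal weight
print(combine_delays(delta_pts, X1, power_weighted=True))   # power weight
```

Under this noise model the power weighted average discounts the weak, unreliable points, which is the motivation the text gives for using it in the simulations with noise.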

3.3.3.  DOA  Product  Estimator 
An  alternative  delay  estimator  can  be  formed  noting, 

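$X_1^W(\omega, \tau) \overline{X_2^W(\omega, \tau)} = a_i e^{i\delta_i\omega} \|S_i^W(\omega, \tau)\|^2$.  (12)

Thus, we can estimate the delay parameter via,

$\delta_i = \Im\left( \log\left( X_1^W(\omega, \tau) \overline{X_2^W(\omega, \tau)} \right) \right) / \omega$.  (13)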

It  is  not  possible  to  estimate  the  relative  amplitude  param¬ 
eter  using  the  product. 


4.  SIMULATION  RESULTS 

A  realistic  scenario  was  defined  and  simulations  performed 
to  test  the  performance  of  the  DUET  algorithm.  Simula¬ 
tions  were  done  in  MATLAB[8]  and  detailed  comparisons 
of  the  DOA  results  made  to  ESPRIT  and  MUSIC[9,  10]. 
For  all  simulations,  the  two  subarrays  in  the  uniform  linear 
array  for  ESPRIT  are  displaced  by  one  antenna. 

For  the  simulations,  a  bit  stream  with  20  kbit/s  data 
rate  was  transmitted  using  M-ary  frequency  shift  keying 
(FSK) with a carrier frequency of 1 GHz. M-ary FSK transmits information via shifting the carrier frequency of a modulated waveform to one of M values every $T_s$ seconds. In M-ary FSK, the signal set is defined as,

$s_i(t) = \cos((\omega_c + (i - 1)\Delta\omega)t)$, $0 \le t \le T_s$, $i = 1, 2, \ldots, M$,  (14)

where $\Delta\omega = \pi/T_s$. The M orthogonal signals are of equal duration and power and are separated by at least $1/(2T_s)$
Hz.  Multiple  M-ary  signals  generated  from  independent 
bit  streams  are  nearly  W-disjoint  orthogonal,  provided  the 
probability  that  two  users  transmit  in  the  same  frequency 
bin  at  the  same  time  is  small.  The  M-ary  FSK  system  had 
60  frequency  bins  with  a  spacing  of  160  kHz.  Parameters 
and  signalling  method  were  chosen  to  model  narrowband 
signalling,  as  M-ary  FSK  is  a  narrowband  technique,  and 
also  serve  as  an  abstraction  for  frequency  hopped  spread 
spectrum  in  antenna  array  systems.  The  signal  sources  were 
asynchronous.  Therefore  each  signal  was  delayed  randomly 
simulating  asynchronously  arriving  bits.  The  sources  were 
delayed  by  choosing  random  angles  of  arrival  and  mixed 
synthetically  assuming  the  antennas  in  the  uniform  linear 
array  were  equally  spaced  with  half  wavelength  separation. 
For  simplicity,  all  amplitude  parameters  were  set  to  unity. 
All  simulations  were  done  in  the  complex  baseband  repre¬ 
sentation  of  the  received  signals. 

Figure 2 shows the histogram of estimates for one experiment with one source at −20° for the power weighted estimator. The power weighted histogram was constructed by summing up the instantaneous power for all the estimates which fall in a given bin. As expected, for lower signal to noise ratios the peak widens and DOA estimates with larger variance are obtained.

Figure 2: Histogram of angle estimates with the DUET power weighted estimator for one source at −20° at different noise levels. The estimated variance increases with the noise level.

Figure 3: Histogram of estimation errors: one source, no noise, 800 experiments with random angle of arrival, two antennas, comparison of DUET, ESPRIT and MUSIC. DUET has better performance than ESPRIT and MUSIC.

We  compared  the  accuracy  of  the  estimates  of  the  DUET 
algorithm  to  both  ESPRIT  and  MUSIC.  For  the  DUET  al¬ 
gorithm,  the  ratio  form  of  the  estimator  was  used.  The 
performance  of  the  product  form  of  the  estimator  was  sim¬ 
ilar  to  the  ratio  form.  In  the  simulations  with  no  noise,  the 
equally  weighted  estimator  was  used.  In  the  simulations 
with  noise,  the  power  weighted  estimator  was  used.  The 
performance  of  both  estimators  was  similar. 

In  the  first  set  of  comparison  simulations,  one  source 
was  randomly  placed  at  800  different  angles  and  the  DOA 
estimated  for  each  case.  This  is  a  fair  comparison  as  all 
three  algorithms  can  perform  the  estimation  with  just  two 
antennas.  The  histogram  of  absolute  errors  (Figure  3)  for 
the  no  noise  case  shows  that  DUET  outperforms  ESPRIT 
and  MUSIC. 

In the second set of comparison simulations, 10 sources were used. With 10 sources, DUET requires only two antennas, whereas MUSIC and ESPRIT require at least 11 antennas. The directions of arrival for the sources were randomly chosen between 30° and 150°, which is the typical range of angles when deploying an antenna array in a sectored cellular communication system. Simulations were performed with 10 sources for no noise, 20 dB, 15 dB, and 10 dB signal to noise ratios. Two antennas were used for DUET, 15 for ESPRIT and MUSIC. The histograms of absolute errors are shown in Figure 4 for the 20 dB case and the tables contain results for all the noise levels and also contain results from the well-known ML and Min-Norm (MN) methods [8]. The results show that DUET has better performance than ESPRIT and MUSIC for the ∞ dB, 20 dB, and 15 dB SNR cases. However, the performance of ESPRIT and MUSIC is relatively invariant to noise while DUET's performance decreases with higher noise levels.

Figure 4: Histogram of estimation errors: ten sources, 20 dB SNR, 300 experiments (random angle of arrival for each source), two antennas for DUET, 15 antennas for MUSIC and ESPRIT. Despite using only two antennas (compared with 15 for the other techniques), DUET estimates the DOA with the highest accuracy.

SNR      DUET    ESPRIT   MUSIC    ML      MN
∞ dB     0.013   1.628    1.092    0.050   1.378
20 dB    0.093   1.680    1.088    0.337   1.408
15 dB    0.144   1.681    1.082    0.852   1.408
10 dB    1.484   1.682    1.087    1.445   2.407

Table 1: Maximum absolute error in degrees for 90% of estimates. Two antennas were used for DUET, 15 antennas for other algorithms. For example, 90% of DUET's estimates have an error of less than or equal to 0.1444 degree for an SNR of 15 dB. The table shows that DUET has the best performance for ∞ dB, 20 dB, and 15 dB SNR. At 10 dB SNR, DUET has slightly worse performance than MUSIC and ML.


5.  SUMMARY 

A  method  for  DOA  estimation  for  an  arbitrary  number  of 
W-disjoint  orthogonal  sources  from  two  receive  antennas 
has  been  presented.  Simulations  confirm  that  the  technique, 
DUET,  can  be  used  to  estimate  the  DOAs  of  multiple  wire¬ 
less  signals  with  higher  accuracy  than  MUSIC  and  ESPRIT 
for  a  range  of  noise  levels.  The  results  were  obtained  for  10 
sources  using  two  antennas  for  the  DUET  algorithm  and  15 
antennas  for  the  competing  methods. 


SNR      DUET     ESPRIT   MUSIC    ML       MN
∞ dB     100 %    50.3 %   79.5 %   98.1 %   72.5 %
20 dB    99.4 %   49.5 %   78.9 %   92.4 %   71.7 %
15 dB    97.3 %   49.7 %   79.0 %   85.3 %   71.7 %
10 dB    75.8 %   49.6 %   79.0 %   68.8 %   71.6 %

Table 2: Percentage of estimates with an absolute error less than 0.5 degree. Two antennas were used for DUET, 15 antennas for other algorithms. For example, 97.3% of DUET's estimates were within 0.5 degree of the true angle of arrival in the 15 dB case. The table shows that DUET has the best performance for ∞ dB, 20 dB, and 15 dB SNR. At 10 dB SNR, DUET has slightly worse performance than MUSIC.

ABSTRACT

Blind Source Separation is now a well known problem. When a priori information about the propagation or the geometry of the array is not available, the model can be generalized to a blind source separation model. It assumes the statistical independence of the sources and their non-Gaussianity. In this paper, we focus on an algorithm, called Canonical Correlation Analysis, based on the use of second order statistics.

1.  CANONICAL  CORRELATION 
ANALYSIS 

Canonical Correlation Analysis is a processing method that allows the correlation between two sets of data to be studied.

One set of data, k, can be a function of the observed signals:

$k = g(x)$  (1)

In blind source separation, and in particular when we are interested in anti-jamming processing, we divide the received signals into source signals and noise signals. The second set of data k is obtained from the observed signals x of the antenna. This processing is selected to keep the signal of interest:

$x = x_{interest} + x_{noise}$
$k = k_{interest} + k_{noise}$

$R_{xk} = E[x k^H] = E[x_{interest} k_{interest}^H]$  (2)

The Canonical Correlation Analysis can be divided into several steps.

The first step is to write the two whitened sets of data:

$\Xi_x = R_x^{-1/2} x$  (3)

$\Xi_k = R_k^{-1/2} k$

with: $E[\Xi_x \Xi_k^H] = R_x^{-1/2} R_{xk} R_k^{-1/2}$

We can find the singular value decomposition of this matrix:

$E[\Xi_x \Xi_k^H] = U \Sigma V^H$  (4)

If we define two new vectors:

$\alpha = U^H \Xi_x$  (5)

and

$\beta = V^H \Xi_k$  (6)

then we can say that $\alpha$ has all of the information on $\Xi_k$ which can be obtained from $\Xi_x$ and, mutually, $\beta$ has all of the information on $\Xi_x$ which can be obtained from $\Xi_k$.

So we solve $E[\alpha\beta^H] = \Sigma$ under: $E[\alpha\alpha^H] = I = E[\beta\beta^H]$.
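The whitening and SVD steps (3) to (6) can be sketched with sample statistics as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `inv_sqrt` and `cca`, the use of the Hermitian inverse square root for the whitening, and the toy real-valued data sets sharing one common component are all choices made for illustration.

```python
import numpy as np

def inv_sqrt(R):
    """Hermitian inverse square root R^{-1/2} via eigendecomposition."""
    w, E = np.linalg.eigh(R)
    return (E / np.sqrt(w)) @ E.conj().T

def cca(x, k):
    """x, k: (channels, samples) arrays. Returns W_x, W_k and the singular values."""
    n = x.shape[1]
    Rx, Rk, Rxk = x @ x.conj().T / n, k @ k.conj().T / n, x @ k.conj().T / n
    Rx_i, Rk_i = inv_sqrt(Rx), inv_sqrt(Rk)
    U, sig, Vh = np.linalg.svd(Rx_i @ Rxk @ Rk_i)   # E[Xi_x Xi_k^H] = U Sigma V^H
    # alpha = W_x^H x and beta = W_k^H k reproduce steps (5) and (6).
    return Rx_i @ U, Rk_i @ Vh.conj().T, sig

rng = np.random.default_rng(4)
common = rng.standard_normal((1, 5000))             # component shared by both sets
x = np.vstack([common, rng.standard_normal((1, 5000))])
k = np.vstack([rng.standard_normal((1, 5000)),
               common + 0.1 * rng.standard_normal((1, 5000))])
Wx, Wk, sig = cca(x, k)
print(np.round(sig, 3))   # one value near 1 (shared component), one near 0
```

The large singular value flags the component common to both data sets, which is the property exploited later when k is chosen as a delayed or conjugated copy of x.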

Suppose that we have two sets of data x and k available.

The Canonical Correlation Analysis consists in defining two matrices $W_x$ and $W_k$ such that $W_x^H x$ and $W_k^H k$ are as correlated as possible.

So we have:

$\alpha = W_x^H x$  (7)

$\beta = W_k^H k$

The Canonical Correlation Analysis minimizes the criterion:

$\Phi(W_k, W_x) = E[|W_k^H k - W_x^H x|^2]$  (8)

under:

$W_k^H R_k W_k = I$  (9)

and

$W_x^H R_x W_x = I$  (10)

The minimization can be written:


$\Phi(W_k, W_x) = \mathrm{trace}[W_k^H R_k W_k + W_x^H R_x W_x - W_k^H R_{kx} W_x - W_x^H R_{xk} W_k]$  (11)

To minimize $\Phi(W_k, W_x)$, we differentiate with respect to each component of $W_k$ and $W_x$ and use Lagrange multipliers $\Lambda_1$ and $\Lambda_2$; the details are not discussed in this paper.

We now have two equations:

$R_{xk} W_k = R_x W_x \Lambda_1$  (12)

$R_{kx} W_x = R_k W_k \Lambda_2$  (13)

Modifying equation (12) (multiplication by $R_{kx}^{-1} R_{kx}$), we can see:

$R_{xk} W_k = R_x R_{kx}^{-1} R_{kx} W_x \Lambda_1$  (14)

$= R_x R_{kx}^{-1} R_k W_k \Lambda_2 \Lambda_1$

If we are interested in $W_x$, we can have the dual equation.

If we call $\Gamma = \Lambda_2 \Lambda_1$, then:

$R_k^{-1} R_{kx} R_x^{-1} R_{xk} W_k = W_k \Gamma$  (15)

If we multiply by $R_k^{1/2}$, we have the equation:

$R_k^{-1/2} R_{kx} R_x^{-1/2} R_x^{-1/2} R_{xk} R_k^{-1/2} R_k^{1/2} W_k = R_k^{1/2} W_k \Gamma$  (16)

With $R_k^{-1/2} R_{kx} R_x^{-1/2} = D$, then:

$D D^H \tilde{W}_k = \tilde{W}_k \Gamma$  (17)

with $\tilde{W}_k = R_k^{1/2} W_k$.

So we can find the eigenvalues and eigenvectors of $D D^H$. We choose the L eigenvectors $U_1$ corresponding to the L largest eigenvalues.

The matrix $W_k$ is now:

$W_k = R_k^{-1/2} U_1$  (18)

The argument is the same if we are interested in the matrix $W_x$.


2.  APPLICATIONS

We take the model without noise: x = As. The matrix A has the SVD: $A = U \Sigma V$.


2.1.  SOBI

If the second set of data can be deduced from the first one with the addition of a delay on the signal:

$k = A s(t - \tau)$  (19)

and

$x = A s(t)$  (20)

We can write: $R_x = R_k = U \Sigma^2 U^H$.

We can have:

$\Xi_x = R_x^{-1/2} x = V s(t)$  (21)

$\Xi_k = R_k^{-1/2} k = V s(t - \tau)$

with: $R_x^{-1/2} = R_k^{-1/2} = \Sigma^{-1} U^H$

We also have:

$E[\Xi_x \Xi_k^H] = V R_s(\tau) V^H$  (22)

To specify V, Belouchrani in SOBI [1] chooses to perform a joint diagonalization of a set of matrices using second order statistics. This approach can be compared with the Cardoso and Souloumiac method for the JADE algorithm [2].

The estimated V makes it possible to form the estimated mixing matrix $\hat{A}$:

$\hat{A} = R_x^{1/2} V$  (23)

and the estimated outputs are:

$\hat{s}(t) = V^H R_x^{-1/2} x$  (24)


2.2.  Non-circular Source Separation

In our case, the signals (BPSK) are non-circular and the noise signals are circular. If the interference signals are j(t) and the BPSK signals are s(t), we can see that:

$E[s(t)^2] \neq 0$  (25)

and

$E[j(t)^2] = 0$  (26)

Since the signals (BPSK) are non-circular, if we want to eliminate the circular interferences, we can use for k(t) the conjugate of x(t):

$k(t) = x(t)^*$  (27)

The model is always x = As:

$k(t) = A^* s(t)^*$  (28)

We look at the matrix A, which can be decomposed by SVD, and we can write the conjugate of A, denoted $A^*$:

$A^* = U^* \Sigma V^*$  (29)

The correlation matrix of the observed signals is:


$R_x = E[x(t) x(t)^H] = U \Sigma^2 U^H$  (30)

Now the correlation matrix of the set of data k(t) can be written:

$R_k = E[k(t) k(t)^H] = E[x(t)^* x(t)^T] = U^* \Sigma^2 U^T$  (31)

If we have the whitened sets of data:

$\Xi_x = R_x^{-1/2} x = \Sigma^{-1} U^H x = V s(t)$  (32)

$\Xi_k = R_k^{-1/2} k = \Sigma^{-1} U^T x^* = V^* s^*(t)$

with:

$R_x^{-1/2} = \Sigma^{-1} U^H$  (33)

and

$R_k^{-1/2} = \Sigma^{-1} U^T$  (34)

We have:

$E[\Xi_x \Xi_k^H] = V E[s s^T] V^T$  (35)

The  matrix  £[ssT]  only  contains  the  informations 
on  non-circular  signals.  The  SVD  factoring  of  £jssT] 
allows  to  estimate  V  and  to  find  only  the  non-circular 
signals  of  the  mixing. 

Th  estimation  of  V  allows  to  have  the  estimated 
mixing  matrix  A  : 

A  =  RlJ2V  (36) 

This search for the mixing matrix can be described as
a blind separation because no information on the antenna,
on the propagation or on the signals is necessary to obtain
the 'filter'. The noise signals must be circular to be rejected
by this algorithm [3].
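The idea of keeping only the non-circular part of a mixture can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes NumPy, a noiseless square mixture, one unit-power BPSK source and two unit-power circular Gaussian jammers, and the function name `noncircular_component` is hypothetical. It whitens x with the sample correlation matrix, forms the cross-correlation between the whitened x and the whitened conjugate data (which reduces to the pseudo-covariance of the whitened data), and takes its SVD.

```python
import numpy as np

def noncircular_component(x):
    """x: (sensors, samples) complex observations.
    Returns the singular values of the whitened pseudo-covariance and the
    whitened data projected on its dominant (non-circular) direction."""
    n = x.shape[1]
    r_x = x @ x.conj().T / n                          # sample R_x = E[x x^H]
    eigval, eigvec = np.linalg.eigh(r_x)
    whitener = eigvec @ np.diag(eigval ** -0.5) @ eigvec.conj().T
    x_w = whitener @ x                                # whitened observations
    # With k = x^*, the whitened k is the conjugate of x_w, so
    # E[x_w k_w^H] is the pseudo-covariance E[x_w x_w^T].
    c = x_w @ x_w.T / n
    u, sv, _ = np.linalg.svd(c)
    y = u[:, 0].conj() @ x_w                          # dominant non-circular output
    phase = 0.5 * np.angle(np.mean(y ** 2))           # remove the residual phase
    return sv, y * np.exp(-1j * phase)

rng = np.random.default_rng(0)
n_samples = 20000
bpsk = rng.choice([-1.0, 1.0], size=n_samples)
jammers = (rng.standard_normal((2, n_samples))
           + 1j * rng.standard_normal((2, n_samples))) / np.sqrt(2)
mixing = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
sv, y = noncircular_component(mixing @ np.vstack([bpsk, jammers]))
```

Under these assumptions only one singular value is significantly different from zero, and the returned output matches the BPSK sequence up to a sign; the circular jammers contribute nothing to the pseudo-covariance.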

One of the applications of this algorithm is the
subject of a patent registered with Thomson-CSF.

3.  RESULTS 
3.1.  Adaptive  Antenna 

The antenna is an MSLC (Multiple Sidelobe Canceller)
antenna, which means that we have one main antenna
and some auxiliary elements. For the supervision of some
particular areas of space, this kind of antenna focuses the
information on the source signal on the main antenna while
supervising areas likely to contain jamming signals.

If we consider the sectional elevation, we can recognize
in Figure 1 the main antenna and an auxiliary element.
The main antenna has a constant value (3-4 dB) between
0° and 1° (angle of sight) and the auxiliary element has
some variations between these angles of sight.

Figure 1: MSLC Antenna

The source signal and the jamming sources are:

-  1 source signal located at 0° (angle of sight),
0° (yaw angle) and with power 20 dB.

-  2 Gaussian jammers, one located at 0° (angle of
sight) and 1.5° (yaw angle) with power 20 dB and the
other at 1.7° (angle of sight) and 0° (yaw angle)
with power 20 dB.

-  Gaussian noise with power 0 dB.

This kind of situation is good for the classical treatment
(anti-jamming with MSLC). The source signal is
located in the sidelobe of the main antenna and the
two jammers are located in the middle of the auxiliary
elements. Now we can compare the performance of
the different algorithms.

3.2.  Performances 

It is necessary to study the performance of the two
algorithms (MSLC and Canonical Correlation Analysis)
when one of the jammers moves and its power changes.

When one of the jammers moves, the performance
of the different algorithms can be evaluated:

-  keeping the two others fixed.

-  moving the last jammer, initially located at
0° (yaw angle), along the decreasing angles
of sight (variation from 1.7° to 0°).

The power of the moving jammer starts at 20 dB
and is increased up to 50 dB.

-  MSLC treatment

Figure 2 allows us to verify that:

-  the Signal to Noise and Interference Ratio (SINR)
becomes weak when the jammer comes near the source signal.

-  the SINR becomes weak when the power of
the jammer decreases.

These two observations are easily explained.

The first one is inevitable whichever algorithm we use.
This loss of SINR is predictable: if we look at Figure 1,
we can see the jammer entering the main antenna from
1° (angle of sight), and the SINR variation follows the
auxiliary element.

The second observation is the result of the self-jamming
of the source signal. When the power of the jammer
signal is weak, the MSLC algorithm takes the source
signal for a jammer and tries to eliminate it. That is
why we call this self-jamming.


Figure  2:  SINR  with  MSLC  Algorithm 

-  Canonical  Correlation  Analysis 

If we repeat the same experiment with the Canonical
Correlation Analysis, we can see in Figure 3 that the
Signal to Noise and Interference Ratio does not depend
on the power of the jammer. The self-jamming of
the source signal has disappeared. Whatever the jammer
power, the SINR only depends on the auxiliary
element.


Figure  3:  SINR  with  Canonical  Correlation  Analysis 

4.  CONCLUSION 

If specific information on the source signals is available,
it is better to use methods based only on second-order
statistics. These methods are computationally less
expensive than those based on higher-order statistics.

The BPSK signals are non-circular while the jammer
sources are Gaussian and circular. For source separation,
the best technique is the Canonical Correlation Analysis;
the results show that this method is effective in avoiding
the self-jamming of the source signal.

In the case of an MSLC antenna, the performance
with Canonical Correlation Analysis is the same as
that of the JADE algorithm [2] using higher-order
statistics [3].
ABSTRACT

In this work, we are interested in the separation of N
propagating source signals recorded simultaneously by
a set of receivers. To solve this "cocktail-party problem",
we propose to use a set of spatially close microphones.
On each sensor, signals are received with
the same attenuation but with different time delays.
The conventional linear memoryless model for source
separation is then no longer suitable. However, when
time delays are small in comparison with the coherence
time of each source, we show that this problem
can be simplified by building a particular set of
instantaneous mixtures involving derivatives of the sources
with respect to time. The sources can then be extracted
using second-order methods. The limitations of the method
are discussed: we explain what a small delay is and we
show that the number of sources cannot exceed 3. The
validity of the proposed approach is confirmed by computer
simulations. Finally, we apply our method to
an experiment where two source signals are extracted
from their mixtures observed with two omnidirectional
microphones in a normal room.


of  observations  than  the  number  of  sources. 

Because of the proximity of the sensors, we assume that
the contributions of each source received by the sensors
are the same except for a relative propagation delay from
one sensor to the other.

Denoting $y_s = [y_{s1}(t), y_{s2}(t), \dots, y_{sN}(t)]^T$ the
observation vector and $x = [x_1(t), x_2(t), \dots, x_N(t)]^T$ the
sources contribution vector, we can write:

$y_{s1}(t) = x_1(t) + x_2(t) + \dots + x_N(t)$

$y_{si}(t) = \sum_{j=1}^{N} x_j(t - \tau_{i,j}), \quad i = 2, \dots, N,$  (1)

where $\tau_{i,j}$ represents the relative delay of source $x_j(t)$
observed on the ith sensor versus the first observation
$y_{s1}(t)$.

Let us consider a contribution $x_j(t - \tau_{i,j})$ from system
(1). Its Fourier Transform (FT) is:

$FT[x_j(t - \tau_{i,j})] = X_j(\nu) e^{-2i\pi\nu\tau_{i,j}},$

where $\nu$ represents the frequency variable.


1.  BASIC  ASSUMPTIONS  AND  MODEL


Let us consider N sources assumed to be statistically
independent, localized and differently colored, propagating
in an echo-free environment. These sources are
recorded by a set of M sensors that are spatially close to
one another. We also assume that the sources are far from
the sensors, so the propagation model can be approximated
by a far-field model and the power of the contribution
of one source on each sensor is the same.

In this work, the presence of additive noise on the
observations will not be treated. For simplicity, we will
present only the case where there is the same number


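The Taylor expansion of $e^{-2i\pi\nu\tau_{i,j}}$ is $\sum_{k=0}^{\infty} \frac{(-2i\pi\nu\tau_{i,j})^k}{k!}$.

In a physical context, the sensors have a given bandwidth
$[\nu_{min}, \nu_{max}]$. Let us denote by $\nu_m$ ($\nu_m < \nu_{max}$) the
maximum frequency such that $X_j(\nu) \neq 0$ for all j.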

In  case  of  compact  sensor  array,  we  assume  that  the 
delays  are  slight  such  as: 
Then, in the previous Taylor expansion we neglect
terms higher than order one.

So, we can give the following approximation for the
observations $y_{si}(t)$, $i = 2 \dots N$:

$y_{si}(t) \approx x_1(t) - \tau_{i,1}\dot{x}_1(t) + x_2(t) - \tau_{i,2}\dot{x}_2(t) + \dots + x_N(t) - \tau_{i,N}\dot{x}_N(t).$  (2)

where $\dot{x}_i(t)$ expresses the first derivative of $x_i(t)$.
Introducing $y_1(t) = \dot{y}_{s1}(t)$, the first derivative of the first
observation, and denoting $y_i(t) = y_{s1}(t) - y_{si}(t)$, with
$i = 2 \dots N$, we obtain the following system:

$y_1(t) = \dot{x}_1(t) + \dots + \dot{x}_N(t)$

$y_i(t) \approx \tau_{i,1}\dot{x}_1(t) + \dots + \tau_{i,N}\dot{x}_N(t), \quad i = 2 \dots N$  (3)

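which can be rewritten in vector and matrix notation as:

$y(t) = M\dot{x}(t),$  (4)

with

$M = \begin{pmatrix} 1 & \cdots & 1 & \cdots & 1 \\ \tau_{2,1} & \cdots & \tau_{2,j} & \cdots & \tau_{2,N} \\ \vdots & & \vdots & & \vdots \\ \tau_{N,1} & \cdots & \tau_{N,j} & \cdots & \tau_{N,N} \end{pmatrix}$

The slightly delayed mixture now appears as an
instantaneous mixture of the source derivatives.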
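How small a delay must be for the first-order model to hold can be checked on a single sinusoid. The sketch below is only an illustration under stated assumptions: the test signal x(t) = sin(2πft), the function name `first_order_delay_error` and the two frequencies are hypothetical choices, and only the 21 µs delay is borrowed from the simulations reported later in the text. It compares the exactly delayed signal with x(t) − τ·ẋ(t).

```python
import math

def first_order_delay_error(freq_hz, delay_s, n_points=1000):
    """Largest gap between x(t - tau) and x(t) - tau * x'(t)
    for x(t) = sin(2*pi*f*t), sampled over one period."""
    w = 2.0 * math.pi * freq_hz
    worst = 0.0
    for n in range(n_points):
        t = n / (n_points * freq_hz)
        exact = math.sin(w * (t - delay_s))
        approx = math.sin(w * t) - delay_s * w * math.cos(w * t)
        worst = max(worst, abs(exact - approx))
    return worst

small = first_order_delay_error(500.0, 21e-6)    # 2*pi*f*tau is about 0.066
large = first_order_delay_error(5000.0, 21e-6)   # 2*pi*f*tau is about 0.66
```

The error is bounded by (2πfτ)²/2, the size of the first neglected Taylor term, so the approximation degrades quickly as the product of frequency and delay grows.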

In (4), M is the unknown memoryless mixing matrix.
M is an N x N matrix assumed to be full column
rank (this assumption will be discussed later).

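Remark:

If we have one more sensor than sources, the derivation
of the reference observation can be avoided. In
this case the mixing matrix becomes:

$M = \begin{pmatrix} \tau_{2,1} & \cdots & \tau_{2,j} & \cdots & \tau_{2,N} \\ \vdots & & \vdots & & \vdots \\ \tau_{N+1,1} & \cdots & \tau_{N+1,j} & \cdots & \tau_{N+1,N} \end{pmatrix}$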


2.  IDENTIFICATION  OF 
INSTANTANEOUS  LINEAR  MIXTURES 

Because of the spectral differences between the sources,
the problem can be solved by any classical blind
identification method for instantaneous mixtures using
second-order statistics of the observations (see Tong's
AMUSE [1], [2], SOBI [3], IMISO [5] or [4]).

The blind identification problem consists in estimating
a separating matrix S such that $SM = DP$,
where D is a regular diagonal matrix and P is a
permutation matrix.

The product of S with the observations leads to:

$z(t) = DP\dot{x}(t),$

representing the source derivatives up to a permutation
and a scaling factor.

Most second-order methods are based on the
diagonalization of two differently linearly filtered
covariance matrices of the observations.

Consider the spatial covariance matrix of the observations
for any delay $\tau$: $R_{yy}(\tau) = E[y(t)y^T(t + \tau)]$.
From (4), we can write a relation between $R_{yy}(\tau)$ and
$R_{\dot{x}\dot{x}}(\tau)$, the spatial covariance matrix of the derivatives
of the sources:

$R_{yy}(\tau) = M R_{\dot{x}\dot{x}}(\tau) M^T$  (5)

Mutual independence of the sources implies that $R_{\dot{x}\dot{x}}(\tau)$
is a diagonal matrix.

Let us linearly filter each member of expression (5) with
the impulse response h(t):

$(h * R_{yy})(\tau) = (h * [M R_{\dot{x}\dot{x}} M^T])(\tau).$

Because the convolution product is linear, we obtain:

$R^h_{yy}(\tau) = M [D_h(\tau)] M^T,$  (6)

where $R^h_{yy}(\tau) = (h * R_{yy})(\tau)$ and $D_h(\tau) = (h * R_{\dot{x}\dot{x}})(\tau)$.

Because the matrix $R_{yy}(0)$ is regular, we can introduce
the following matrix:

$R = [R_{yy}(0)]^{-1} R^h_{yy}(0).$

Then from (5) and (6), we obtain:

$R = [M^T]^{-1} [R_{\dot{x}\dot{x}}(0)]^{-1} [D_h(0)] M^T.$

We can show that $(M^T)^{-1}$ can be estimated, up to
one diagonal matrix and one permutation matrix, from
the eigenvector matrix of R.
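The eigenvector argument can be verified numerically. The following is a sketch under stated assumptions rather than the authors' code: it uses NumPy, exact (not estimated) covariance matrices, distinct eigenvalues, and hypothetical numerical values for M and for the two diagonal matrices; the function name `separating_matrix` is also hypothetical.

```python
import numpy as np

def separating_matrix(r_yy0, r_h0):
    """Build R = R_yy(0)^{-1} R^h_yy(0) and return S whose rows are the
    eigenvectors of R (columns of (M^T)^{-1} up to scale and order)."""
    r = np.linalg.solve(r_yy0, r_h0)
    _, vectors = np.linalg.eig(r)
    return np.real(vectors).T

# Hypothetical mixing matrix and diagonal source statistics (illustration only).
m = np.array([[1.0, 1.0, 1.0],
              [0.2, -0.5, 0.9],
              [0.7, 0.1, -0.3]])
r_xx0 = np.diag([1.0, 2.0, 3.0])     # unfiltered source-derivative covariance at lag 0
d_h0 = np.diag([0.5, 3.0, -1.2])     # filtered version; the ratios must be distinct
s = separating_matrix(m @ r_xx0 @ m.T, m @ d_h0 @ m.T)
global_matrix = s @ m                # expected: one nonzero entry per row and column
```

With these inputs the product of S and M has exactly one significant entry in each row and each column, which is the scaled permutation DP mentioned in the text. With covariances estimated from data the off-diagonal terms would only be approximately zero.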
3.  LIMITATION  OF  THE  METHOD

In the previous section we assumed the matrix M to be
full column rank. Under the assumptions made on the
field of sources, the relative delays $\tau_{i,j}$ only depend on
the distance between the reference sensor and the ith sensor,
as illustrated in Figure 1:


Figure  1:  Source  (t j  plane  wave 

where $v_j$ is the unit vector of the plane wave direction
of the jth source and $u_i$ is the position vector
of the ith sensor relative to the reference sensor. The relative
delay $\tau_{i,j}$ is given by the scalar product:

$\tau_{i,j} = \frac{1}{c} u_i^T v_j,$  (7)

where c is the propagation velocity, supposed to be constant.

It follows that in the three-dimensional physical
space:

$M = \frac{1}{c}[u_i^T v_j], \quad i = 2, \dots, N; \; j = 1, \dots, N;$

is a non-regular matrix for N > 3. In other words, the
identification of such a mixture is not possible for more
than three sources.
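The geometric limitation comes from the fact that every delay is a scalar product of two three-dimensional vectors. A small numerical check, assuming NumPy, randomly drawn sensor offsets and source directions, and a hypothetical helper `delay_matrix`, shows that the block of relative delays never has rank above three:

```python
import numpy as np

def delay_matrix(sensor_offsets, directions, c=340.0):
    """tau[i, j] = u_i . v_j / c, with sensor offsets u_i (rows, in metres)
    and propagation directions v_j (rows, normalised to unit length)."""
    v = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    return sensor_offsets @ v.T / c

rng = np.random.default_rng(1)
offsets = 0.03 * rng.standard_normal((5, 3))    # five sensors around the reference
directions = rng.standard_normal((6, 3))        # six hypothetical sources
tau = delay_matrix(offsets, directions)
rank = np.linalg.matrix_rank(tau)
```

Whatever the number of sensors and sources, `tau` factors as a product of a matrix with three columns and a matrix with three rows, so adding a fourth source does not add an independent column to the delay block.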

4.  APPLICATIONS 

We present results obtained by choosing h(t) to be a
second-order differentiator filter [5].


4.1.  Numerical  Simulations 

Three unit-power synthetic signals are mixed with small
delays ($\tau_{max} \ll (\sqrt{2}\pi\nu_{max})^{-1}$, where $\nu_{max}$ is the
maximum frequency of the observations).


For the particular case presented here (see the top of
Figure 2), the synthetic data are obtained by bandpass
FIR filtering of white signals. From the spatial location
of the sensors and the directions of arrival of the sources we
deduce the relative delays using equation (7). The simulated
waves propagate in air with a velocity equal
to 340 m/s.

The observations are constructed from delayed sources
using spectral interpolation. The maximum frequency
generated is $\nu_{max}$ = 5 kHz for an 11 kHz sample rate,
and the maximum delay generated is $\tau_{max}$ = 21 µs
with a 3 cm inter-sensor distance.

On  the  bottom  of  Figure  2  we  plot  the  PSD  of  two  of 
the  three  observations  in  order  to  illustrate  that  the 
Power  Spectral  Densities  of  observations  are  identical. 

The  power  spectral  densities  of  estimated  sources 
are  plotted  in  Figure  3. 

The performance of our method is measured using
the criterion introduced by Shobben et al. in [7]. The
quality of separation of the jth separated output is
defined as:

$S_j = 10 \log \left( \frac{E[(z_{j,x_j})^2]}{E[(\sum_{i \neq j} z_{j,x_i})^2]} \right)$

where $z_{j,x_i}$ is the jth output when only $x_i$ is active.
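A separation-quality figure of this kind can be computed directly from recordings of one output taken with a single source active at a time. The sketch below assumes that the criterion is the ratio, in dB, of the power of the wanted contribution to the power of the summed unwanted contributions; that reading, the function name `separation_quality` and the toy data are assumptions made for illustration.

```python
import math

def separation_quality(outputs_by_source, j):
    """outputs_by_source[i] holds the j-th separator output recorded while
    only source i is active (equal-length lists). Returns S_j in dB."""
    n = len(outputs_by_source[j])
    wanted = sum(v * v for v in outputs_by_source[j]) / n
    leakage = [
        sum(outputs_by_source[i][t] for i in range(len(outputs_by_source)) if i != j)
        for t in range(n)
    ]
    unwanted = sum(v * v for v in leakage) / n
    return 10.0 * math.log10(wanted / unwanted)

# Toy data: the wanted source has amplitude 1, the leaking source amplitude 0.1.
quality_db = separation_quality([[1.0, -1.0, 1.0, -1.0],
                                 [0.1, 0.1, -0.1, -0.1]], 0)
```

In this toy case the power ratio is 100, so the quality is 20 dB.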

For our numerical experiments the performance measures
can be found in the following table:

Estimated Source #1    $S_1$ = 17 dB
Estimated Source #2    $S_2$ = 9 dB
Estimated Source #3    $S_3$ = 27 dB


Better results could be obtained using spatially closer
sensors, but this leads to infeasible configurations.

4.2.  Real  Data 

We test the method on real signals recorded by J.T. Ngo
et al. [6]. The signals are obtained by two omnidirectional
microphones mounted 1 cm apart, recording two
human speakers, each 1 m away from the sensors. A sample
rate of 22050 Hz was used for each signal. A comparison
with the results of Ngo et al. is plotted at the bottom
of Figure 4.

5.  EXTENSIONS  -  CONCLUSIONS 

We showed that second-order blind separation algorithms
can be used to extract propagating colored sources
recorded on a compact set of sensors. We have seen
that when the source contributions are recorded with
the same attenuation on each sensor, the method is
geometrically limited to the extraction of only three
sources. When the attenuations differ from one
source to each sensor, this limitation vanishes.

The case of noise-corrupted sensors requires more
sensors than sources and can be treated by a classical
method as described in [2].

The method can be extended to larger delays by using
a second-order Taylor series expansion. Some
modifications of the second-order identification method
are then necessary.


Figure  2:  Synthetic  Data:  sources  and  observations 



Figure  3:  Synthetic  Data:  estimations 


Figure  4:  Real  Data 
ABSTRACT

Robustness against deviations from nominal source pdf
assumptions is very desirable in blind source separation
(BSS) algorithms. In this paper, a new approach for robust
BSS is proposed. We modify the EASI (equivariant
adaptive separation by independence) algorithms
to use ranks of observed signals. Two different methods
for evaluation of ranks are introduced in
this paper. Our modified algorithm can be applied to
both real-valued data and complex-valued data. Design
guidelines are discussed for the nonlinear rank weighting
functions in the modified algorithm. Simulation results
and some examples are given, showing very good
performance.


nonlinearities was also studied in [4]. The approach using
ranks to improve the robustness of BSS algorithms
was first proposed in [5], where the EASI algorithms
were modified to use ranks of observed signals. Simulation
results in [4], [5] and this paper show that the
EASI algorithms fail in estimating the mixing channel
when there are deviations from nominal source pdf
assumptions.

In this paper, two ranking methods are introduced.
Our method using ranks to achieve the robustness of
the EASI algorithms can be applied to real-valued data
or complex-valued data. Simulation results show good
performance with ranks in BSS.

2.  BLIND  SOURCE  SEPARATION 


1.  INTRODUCTION 

Blind Source Separation is the process of recovering a
set of independent signals when only mixtures with unknown
coefficients are observed. It is usually assumed
that little is known about the original sources except
that they are mutually independent. Many important
theories and applications have been investigated in BSS
and more generally in Independent Component Analysis
(ICA) [1], [2], [3]. However, little has been done on
the robustness issue in BSS. Robustness against deviations
from nominal source probability density function
(pdf) assumptions is very desirable in BSS algorithms.
Some aspects of performance approximation, and the
robustness of the EASI (equivariant adaptive separation
by independence) algorithms were considered in [4]
and [5]. It was shown by Cardoso and Laheld in [2] that
the optimum nonlinear function in the EASI algorithms
depends on the pdf's of the original sources. Therefore,
the performance of the original algorithms is affected
by the accuracy of our knowledge of the source densities.
The robustness of these BSS algorithms was achieved
in [4] by using saturating nonlinear functions in the
original algorithms. The nature of optimum quantizer


The  block  diagram  in  Fig.  1  shows  a  general  adaptive 
BSS  scheme  for  the  standard  model  of  instantaneous 
additive  sources. 


[Block diagram: source signals s[n], mixing matrix A, received signals x[n], separating matrix, estimated signals y[n]]

Figure 1: Adaptive BSS of Instantaneous Additive
Mixtures

The received discrete-time signal model is that of an
m-dimensional time series $x[n] = (x_1[n] \cdots x_i[n] \cdots x_m[n])^T$
of the form:

$x[n] = A s[n]$  (1)

where

$s[n] = (s_1[n] \cdots s_i[n] \cdots s_k[n])^T$
The channel characteristic between s[n] and x[n] is
defined by the constant mixing matrix A of size m x k;
there are k sources and m receivers. Here we require
$m \geq k$, which means the number of receivers should be
no less than the number of sources.

The objective is to get a separating matrix such
that y[n] consists of individually scaled and possibly
permuted versions of s[n]:

$y[n] = B_n x[n] = B_n A s[n] = C_n s[n]$  (2)

where

$C_n \triangleq B_n A$  (3)

The matrix $B_n$ is called the separating matrix after
the n-th iteration. Ideally, for n large enough, $B_n$ has
converged to a matrix B, and C = BA is very close to
an identity matrix; more generally, C is a permutation
matrix with arbitrary scaling for each output.

The normalized Equivariant Adaptive Source Separation
(EASI) algorithm [2] that we base our modifications
on has the form:

$B_{n+1} = B_n - \lambda_n \left[ \frac{y[n]y^T[n] - I}{1 + \lambda_n y^T[n]y[n]} + \frac{g(y[n])y^T[n] - y[n]g^T(y[n])}{1 + \lambda_n |y^T[n]g(y[n])|} \right] B_n$  (4)


where $\lambda_n$ is the adaptation step size, and g(·) is a
component-wise nonlinear odd function for which design
guidelines are available. The term $(y[n]y^T[n] - I)$
in (4) has the effect of driving the diagonal elements of
$C_n$ to all ones. Meanwhile, the other term
$(g(y[n])y^T[n] - y[n]g^T(y[n]))$ in (4) drives the off-diagonal
elements of $C_n$ to zeros. Another version of the EASI algorithm
is called the original EASI algorithm, which does not
have the normalization factors $(1 + \lambda_n y^T[n]y[n])$ and
$(1 + \lambda_n |y^T[n]g(y[n])|)$. The two normalization factors
in the normalized EASI algorithm in (4) were introduced
to improve the stability of the algorithm. Our
simulations in section 6 show the performance of the
original/normalized EASI algorithms.
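One iteration of the normalized update can be written in a few lines. The sketch below assumes NumPy, real-valued data, and the commonly used form B ← B − λ[(yyᵀ − I)/(1 + λyᵀy) + (g(y)yᵀ − yg(y)ᵀ)/(1 + λ|yᵀg(y)|)]B; the function name `easi_step` and the cubic default nonlinearity are illustrative choices, not taken from the text.

```python
import numpy as np

def easi_step(b, x, lam, g=lambda y: y ** 3):
    """One normalized EASI-style update of the separating matrix b
    for a single real-valued observation vector x."""
    y = b @ x
    # Symmetric part: pushes the outputs towards unit variance (whitening).
    sym = (np.outer(y, y) - np.eye(len(y))) / (1.0 + lam * (y @ y))
    # Skew-symmetric part: pushes the outputs towards independence.
    gy = g(y)
    skew = (np.outer(gy, y) - np.outer(y, gy)) / (1.0 + lam * abs(y @ gy))
    return b - lam * (sym + skew) @ b

b_next = easi_step(np.eye(2), np.array([1.0, 0.0]), lam=0.1)
```

For this single sample the first output already has unit power and the second is zero, so only the second diagonal entry of the separating matrix grows (from 1 to 1 + 0.1/1.1). Dropping the two denominators gives the unnormalized variant.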

Assuming all the original sources are identically distributed
with a differentiable probability density function
f(s), the optimum g function in (4) is given in [2]
as

$g_{opt}(s) = \frac{g_{LO}(s)}{E[g_{LO}^2(s)] - 1}$  (5)

where

$g_{LO}(s) \triangleq -\frac{f'(s)}{f(s)}$  (6)

3.  DEFINITIONS  OF  RANKS 

Let us define the function $S_1$ as follows:

$S_1(v) = \begin{cases} 1 & v > 0 \\ 0 & v = 0 \\ -1 & v < 0 \end{cases}$  (7)

Let $R_1[n]$ denote the normalized marginal rank vector
of y[n] based on $\{y[n], \dots, y[n + L - 1]\}$, defined in [6],

$R_1[n] = \frac{1}{L-1} \sum_{i=n}^{n+L-1} S_1(y[n] - y[i])$  (8)

Here $S_1$ is applied component-wise, and L is the number
of samples chosen to compute the normalized marginal
ranks of y[n].

The normalized marginal ranks defined in (8) are
only for real-valued data. However, our second definition
of ranks can be applied to both real-valued and
complex-valued data. Let us first define:

$S_2(v) = \begin{cases} 1 & v \geq 0 \\ 0 & v < 0 \end{cases}$  (9)

and

$\mathrm{sign}(u) = \frac{u}{|u|}$  (10)

Define the normalized signed-rank vector $R_2[n]$ of y[n]
based on $\{y[n], \dots, y[n + L - 1]\}$ to be:

$R_2[n] = \mathrm{sign}(y[n]) \left\{ \frac{1}{L} \sum_{i=n}^{n+L-1} S_2(|y[n]| - |y[i]|) \right\}$  (11)

where $S_2$ and sign(·) are applied component-wise, and
L is the number of samples chosen to compute the rank
vector of y[n].

We note that the ranks defined in (8) are the normalized
and centered version of the traditional ranks,
which are in the range $\{1, \dots, L\}$. For simplicity, let
us assume we have a set of real values $\{x_1, \dots, x_L\}$.
Assuming $x_{j_1} < x_{j_2} < \dots < x_{j_L}$ with
$\{j_1, \dots, j_L\} = \{1, 2, \dots, L\}$, the traditional method ranks
$x_{j_1}, x_{j_2}, \dots, x_{j_L}$ with the integers $1, 2, \dots, L$ respectively.
Denote $r_i$ with $i = 1, \dots, L$ as the rank of $x_i$ in the
traditional definition; let $R_{1i}$ denote the normalized
marginal rank of $x_i$ defined in this paper. The explicit
definition of $r_i$ is given as follows:

$r_i = \sum_{j=1}^{L} S_2(x_i - x_j)$  (12)
where $S_2(\cdot)$ is defined in (9). It is easy to see that

$R_{1i} = \frac{2r_i - (L + 1)}{L - 1}$  (13)

with $r_i \in \{1, 2, \dots, L\}$ and $R_{1i} \in \{-1, -\frac{L-3}{L-1}, \dots, \frac{L-3}{L-1}, 1\}$.
Assuming that we have a set of complex numbers $\{z_1, \dots, z_L\}$,
let $z_i = a_i e^{\mathrm{j}\theta_i}$ with $i = 1, \dots, L$, where $a_i$ and $\theta_i$ are the
amplitude and phase of $z_i$ respectively. Let us denote
$r_{ai}$ with $i = 1, \dots, L$ as the rank of $a_i$ in the traditional
definition based on $a_1, \dots, a_L$; let $R_{2i}$ denote the
normalized signed-rank of $z_i$ based on $\{z_1, \dots, z_L\}$, as
defined in (11). Thus we can see that

$R_{2i} = \frac{1}{L} r_{ai}\,\mathrm{sign}(z_i)$  (14)

Therefore, in the above equation, the phase information
of each $z_i$ is retained in the term $\mathrm{sign}(z_i)$ along with the
traditional rank representation for its modulus. The
factor $1/L$ is used to obtain a normalized signed rank
representation.
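The two rank definitions can be tried on a few samples. The sketch below is a plain-Python illustration with hypothetical function names; it assumes the normalizations 1/(L − 1) for the marginal ranks and 1/L for the signed ranks, and it maps a zero sample to a zero signed rank, a convention the text does not specify.

```python
def traditional_ranks(values):
    """r_i = number of samples x_j with x_j <= x_i (ranks 1..L without ties)."""
    return [sum(1 for xj in values if xi - xj >= 0) for xi in values]

def normalized_marginal_ranks(values):
    """R_1i = (1 / (L - 1)) * sum_j S1(x_i - x_j); real-valued data only."""
    def s1(v):
        return (v > 0) - (v < 0)
    size = len(values)
    return [sum(s1(xi - xj) for xj in values) / (size - 1) for xi in values]

def normalized_signed_ranks(values):
    """R_2i = (r_ai / L) * z_i / |z_i|; real or complex samples."""
    size = len(values)
    moduli = [abs(z) for z in values]
    ranks = traditional_ranks(moduli)
    return [(r / size) * (z / m) if m > 0 else 0.0
            for z, r, m in zip(values, ranks, moduli)]

xs = [0.3, -1.2, 2.0, 0.9]
zs = [3 + 4j, -1, 0.5j]
r_trad = traditional_ranks(xs)
r_marg = normalized_marginal_ranks(xs)
r_sign = normalized_signed_ranks(zs)
```

For `xs` the traditional ranks are 2, 1, 4, 3 and the marginal ranks equal (2r − 5)/3, so they lie in [−1, 1]. For `zs` the moduli 5, 1, 0.5 receive ranks 3, 2, 1, and each signed rank keeps the phase of its sample while its modulus is the rank divided by L.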

4.  MODIFIED  ALGORITHM  USING 
RANKS 

Our modified algorithm using ranks is

$B_{n+1} = B_n - \left\{ \lambda_n \left[ y[n]y^T[n] - I \right] + \mu_n \left[ h(R_y[n])R_y^T[n] - R_y[n]h^T(R_y[n]) \right] \right\} B_n$  (15)

where $\lambda_n$ and $\mu_n$ are adaptation steps, and h(·) is
a component-wise nonlinear odd rank weighting function;
$R_y[n]$ is the rank vector of y[n] based on $\{y[n], \dots, y[n + L - 1]\}$.
If y[n] is real-valued, $R_y[n]$ could be
the normalized marginal rank vector defined in (8) or
the normalized signed-rank vector defined in (11). On
the other hand, if y[n] is complex-valued, $R_y[n]$ will
be the normalized signed-rank vector for y[n].

If the components of y[n] are mutually independent,
then the components of $R_y[n]$ for fixed n are also independent.
Let $R_{yi}[n]$ and $h(R_{yj}[n])$ be the ith and
the jth components of $R_y[n]$ and $h(R_y[n])$. Then, by
the independence of $R_{yi}[n]$ and $R_{yj}[n]$ with $i \neq j$ and
$i, j \in \{1, \dots, m\}$, we have
$E(R_{yi}[n]h(R_{yj}[n])) = E(R_{yi}[n])E(h(R_{yj}[n]))$.

Assuming $R_y[n]$ is the normalized marginal rank
vector, each $R_{yj}[n]$ with $j = 1, \dots, m$ equals
$(L + 1 - 2i)/(L - 1)$, $i = 1, \dots, L$, with probability $1/L$.
It is easy to see in this case that

$E(R_{yj}[n]) = \sum_{i=1}^{L} \frac{L + 1 - 2i}{L - 1} \cdot \frac{1}{L} = 0$  (16)


Assume $R_y[n]$ is the normalized signed-rank vector
of y[n] based on $\{y[n], \dots, y[n + L - 1]\}$. Let
$y_j[n] = |y_j[n]| e^{\mathrm{j}\theta_j}$ be the j-th component of y[n], where $\theta_j$ is
the phase of $y_j[n]$. If the real part and the imaginary
part of $y_j[n]$ are circularly symmetric in their joint pdf,
the magnitude $|y_j[n]|$ and the phase $\theta_j$ of $y_j[n]$ are independent
and $E(e^{\mathrm{j}\theta_j}) = 0$. Denote $r_{aj}$ as the traditional
rank of $|y_j[n]|$ based on $\{|y_j[n]|, \dots, |y_j[n + L - 1]|\}$. It
is easy to see that $r_{aj}$ is also independent of $\theta_j$. Thus
we have

$E(R_{yj}[n]) = \frac{1}{L} E(r_{aj} \cdot e^{\mathrm{j}\theta_j}) = \frac{1}{L} E(r_{aj}) E(e^{\mathrm{j}\theta_j}) = 0$  (17)

Therefore, in both cases where $R_{yj}[n]$ is either the normalized
marginal rank or the normalized signed rank,
$E(R_{yi}[n]h(R_{yj}[n])) = 0$ with $i \neq j$ and $i, j \in \{1, \dots, m\}$.
By forcing this condition together with the whitening
condition $E[y[n]y^T[n]] = I$ in the algorithm, it is possible
to drive the components of y[n] to be independent.

The normalized marginal and signed ranks defined
in (8) and (11) respectively can greatly increase the stability
and robustness of the algorithm. Therefore, the
modified algorithm in (15) does not need the normalization
factors, at least in the second term. Simulations
using this modified algorithm with no normalization at
all also show stable and robust performance.

The term $(h(R_y[n])R_y^T[n] - R_y[n]h^T(R_y[n]))$ in
(15) can drive the off-diagonal elements of $C_n$ defined
in (3) to all zeros. However, the convergence rate of
these off-diagonal elements may be slower using ranks.
Thus, without compromising stability in (15), $\mu_n$ may be
chosen to be greater than $\lambda_n$ to increase the convergence
rate of these off-diagonal elements compared to
that of the diagonal elements.

It is easy to see that the normalized signed ranks
defined in (11) are inside or on the unit circle. Thus
the amplitudes of the original data have been largely
compressed to avoid the "blow-up" of BSS algorithms.
However, the phase information has been retained by
our definition in (11).
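A single step of a rank-based update can be sketched as follows. This is an illustration only: it assumes NumPy, real-valued data, normalized marginal ranks computed over a window of L samples, and an update of the form B ← B − {λ(yyᵀ − I) + μ(h(R)Rᵀ − Rh(R)ᵀ)}B. The function names, the cubic weighting and the toy window are hypothetical.

```python
import numpy as np

def marginal_ranks(window):
    """window: (L, m) array holding y[n], ..., y[n+L-1] as rows.
    Returns the normalized marginal rank vector of the first row."""
    size = window.shape[0]
    return np.sign(window[0] - window).sum(axis=0) / (size - 1)

def rank_step(b, x_window, lam, mu, h=lambda r: r ** 3):
    """One rank-based update of b from L consecutive observations (rows)."""
    y_window = x_window @ b.T
    y = y_window[0]
    r = marginal_ranks(y_window)
    hr = h(r)
    update = (lam * (np.outer(y, y) - np.eye(len(y)))
              + mu * (np.outer(hr, r) - np.outer(r, hr)))
    return b - update @ b

x_window = np.array([[2.5, 0.0], [0.0, -1.0], [0.5, -0.5], [2.0, -3.0], [3.0, -2.0]])
b_next = rank_step(np.eye(2), x_window, lam=0.01, mu=0.1)
```

Because the ranks are bounded by one in magnitude, the second term stays bounded however large the samples are, which is why no normalization factor is applied to it here. In this toy window the rank vector is (0.5, 1).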

5.  NONLINEAR  RANK  WEIGHTING 
FUNCTIONS 

We can choose the rank weighting function $h(R) = g_{LO}(R)$
with $g_{LO}(\cdot)$ defined in (6) and the ranks R defined in
this paper. A more detailed discussion on the choice of
the rank weighting functions is given in [5].

Consider a unit-variance generalized Gaussian source
pdf of the form $f(s) = C(k)e^{-c(k)|s|^k}$ with k > 0, where
c(k) and C(k) are two constants when k is fixed. Thus
the optimum g function given in (5) will be
$g_{LO} = a(k)|s|^{k-1}\mathrm{sign}(s)$ where a(k) is a function of k. Since
a(k) is a constant for any fixed k, it can be accommodated
by the adaptation step size $\mu_n$ defined in
(15). Therefore, we choose $h(R) = |R|^{k-1}\mathrm{sign}(R)$ in
this case. However, our simulations show that multiple
choices of weighting functions are applicable and
all give generally good performance. More generally,
considering the class of generalized Gaussian signals,
we may choose $h(R) = \mathrm{sign}(R)|R|^m$ with $m > k - 1$
for k > 2, and $h(R) = \mathrm{sign}(R)|R|^m$ with $m < k - 1$ for
k < 2.
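The power-law shape of the weighting function follows from differentiating the generalized Gaussian density. As a short worked step (valid for $s \neq 0$, using only the definition $g_{LO}(s) = -f'(s)/f(s)$ and the density form quoted in the text):

```latex
f(s) = C(k)\, e^{-c(k)|s|^{k}}
\;\Rightarrow\;
f'(s) = -c(k)\, k\, |s|^{k-1}\, \operatorname{sign}(s)\, f(s),
\qquad
g_{LO}(s) = -\frac{f'(s)}{f(s)} = c(k)\, k\, |s|^{k-1}\, \operatorname{sign}(s).
```

For k = 2 (the Gaussian case) this is linear in s, while k < 2 gives a function that grows more slowly than linearly and k > 2 one that grows faster.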

Let us consider the case when the received signals
are complex-valued data. In [2], the nonlinear g functions
are restricted to be of the form $g(y) = y \cdot f(|y|^2)$
if y is complex, where f(·) is a real-valued function.
Since the normalized signed ranks for complex numbers
are still complex-valued, we propose the nonlinear
rank weighting functions in the complex case to
be $h(R) = g(R)$. Our simulation results in the next
section show good performance with our choice of nonlinear
rank weighting functions.

6.  SIMULATIONS 

We present here representative simulation results for
the algorithms discussed in this paper. Performance
comparisons between the EASI algorithms and our modified
algorithm are given in the following examples.
Both examples have two sources and two receivers. The
mixing matrix A is randomly generated for both examples.
In each example, the original/normalized EASI
algorithm and the modified algorithm are run for the
same number of iterations and for the same set of signals.
Fig. 3 and Fig. 5 show the elements of the $C_n$
defined in (3) as a function of the number of iterations.

Example  1 

In  our  simulation  for  Fig.  3,  the  two  sources  are  two 
16QAM  sequences.  The  original  source  symbols  are 
contaminated  by  heavy-tailed  non-Gaussian  noise.  Vec¬ 
torizing  the  samples  taken  at  the  receivers,  we  have 

x[n]  =  Ap[n] 

where  x[n]  is  the  vector  of  all  the  samples  taken  at  time 
index  n  at  the  receivers,  and 

p[n]  =  s[n]  +  v[n] 

where s[n] is the vector of all the original source symbols at time
index n, and v[n] is the vector of complex additive non-Gaussian
noise at time index n.

Let us denote $v_i[n] = a_i + j b_i$, with $i = 1, 2$, as the i-th
component of the vector v[n], where $a_i$ and $b_i$ are the real part
and the imaginary part of $v_i[n]$, respectively. In this example,
$a_i$ and $b_i$ are independent and identically distributed with the
pdf $0.9\,\mathcal{N}(0, 7/9) + 0.1\,\mathcal{N}(0, 3)$, where
$\mathcal{N}(\mu, \sigma^2)$ is a Gaussian pdf with mean $\mu$ and
variance $\sigma^2$. Fig. 2 shows the noise pdf's around each of the
16QAM symbols.
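The noise model above can be sampled as in the following sketch. The function name, the sample size and the random seed are assumptions chosen for illustration. Note that each part has unit variance, since $0.9 \cdot 7/9 + 0.1 \cdot 3 = 1$.

```python
import numpy as np

def sample_mixture_noise(n, rng):
    """Draw n complex samples whose real and imaginary parts are i.i.d.
    from the two-component mixture 0.9*N(0, 7/9) + 0.1*N(0, 3)."""
    def one_part():
        heavy = rng.random(n) < 0.1               # about 10% come from the wide component
        std = np.where(heavy, np.sqrt(3.0), np.sqrt(7.0 / 9.0))
        return std * rng.standard_normal(n)
    return one_part() + 1j * one_part()

rng = np.random.default_rng(0)                    # seed chosen arbitrarily
v = sample_mixture_noise(200_000, rng)

# Sample variance of the real part; the mixture variance is 0.9*7/9 + 0.1*3 = 1
var_real = v.real.var()
```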


Figure  2:  The  non-Gaussian  noise  pdf’s  for  16  QAM 
symbols 


(b)  Our  modified  algorithm  using  normalized  signed  ranks 

Figure  3:  Example  1:  Example  Performance  for 
16QAM  symbols  in  presence  of  heavy-tailed  non- 
Gaussian  noise 

The channel characteristic between p[n] and x[n] is defined by the
mixing matrix A with

$$A = \begin{pmatrix} -0.0783 - 0.0118i & 2.3093 + 0.0559i \\ 0.8892 + 0.9131i & 0.5246 - 1.1071i \end{pmatrix}$$

Fig. 3 shows the results of running the normalized EASI algorithm and
the modified algorithm. The adaptation step is chosen as
$\lambda_n = 0.005$ for the normalized EASI algorithm, while
$\lambda_n = 0.005$ and $\mu_n = 0.035$ for the modified algorithm in
(15) using normalized signed ranks. Simulations for Example 1 have
shown that our modified algorithm gives generally good results.
However, the original EASI algorithm appears to diverge in our
simulations. We also find that the normalized EASI algorithm exhibits
poor performance.

Example 2

In our simulation for Fig. 5, two 16QAM source sequences are
transmitted through a communication channel on different subcarrier
frequencies. Cochannel interference arises from the overlap of the
frequency bands. The channel model is shown in Fig. 4 with the
overlap ratio k/2W = 13%. Samples are taken at each receiver at the
symbol rate 1/T without ISI. Heavy-tailed non-Gaussian noise is added
at the receiver with SNR = 10 dB.

Fig. 5 shows the results of running the original EASI algorithm and
the modified algorithm. The adaptation step is chosen as
$\lambda_n = 0.005$ for the original EASI algorithm, while
$\lambda_n = 0.005$ and $\mu_n = 0.02$ for the modified algorithm
using normalized signed ranks. Our modified algorithm shows better
performance than the original EASI algorithm.


Figure  4:  FDMA  Channel  Model  with  Cochannel  Over¬ 
lap 


7.  CONCLUSION 

In this paper, we have proposed a new approach using ranks to improve
the robustness of the EASI algorithms. By choosing different ranking
methods, our modified algorithm can be applied to either real-valued
or complex-valued data. We also give some guidelines for designing
the nonlinear rank weighting functions used in our modified
algorithm. We have shown that our approach is more robust in
estimating the mixing channel for source separation than the original
algorithm. More studies are needed on the general optimum rank
weighting functions for both real-valued and complex-valued data.


(b)Our  modified  algorithm  using  normalized  signed  ranks 


Figure 5: Performance for 16QAM symbols in FDMA with cochannel
interference in the presence of non-Gaussian noise

ABSTRACT

We address the problem of separating a linear convolutive mixture of
2nd order white sources, given some side information about the
transmitted messages. The proposed technique exploits the special
structure of the observed data matrix after channel whitening: it is
the product of an orthogonal matrix and a generalized Toeplitz matrix,
in additive Gaussian noise. We implement the joint maximum likelihood
(ML) estimator of both the orthogonal mixing matrix and the user
signals, subject to the known algebraic and temporal constraints.
Preliminary computer simulations assess the promising performance of
the proposed method.

1.  INTRODUCTION 

Linear convolutive mixtures of sources occur in many scenarios of
interest. For example, in space division multiple access (SDMA)
wireless networks for mobile communications, the signal observed at
the base station receiver is a weighted linear superposition of the
emitted user signals plus echoes [1, 2, 3, 4, 5, 6]. This is due to
the multipath propagation effect (intersymbol interference
phenomenon), and the fact that several sources share the same carrier
frequency (co-channel sources) for bandwidth efficiency (SDMA
concept). The SDMA receiver must resolve the observed mixture, and
recover the transmitted signals. In this paper, we introduce a
maximum-likelihood (ML) technique to resolve linear convolutive
mixtures. The proposed technique is semi-blind because certain
fragments of the emitted messages are assumed known. The side
information is necessary because we do not restrict ourselves to
finite-alphabet sources. As a consequence, in the absence of the side
information, the factorization in the data model would not uniquely
determine both the channel and the user signals. The paper is
organized as follows. In section 2, we establish the data model and
define the problem formulation. In section 3, we develop the
iterative maximum-likelihood (IML) algorithm which estimates both the
orthogonal mixing matrix and the user signals. Certain algorithmic
issues are briefly discussed. In section 4, we present computer
simulation results assessing the performance of the proposed ML
technique. Section 5 concludes our paper.

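Notation. Matrices (capital) and vectors are in boldface type.
$\mathbb{R}^{n \times m}$ is the set of $n \times m$ matrices with
real entries. $(\cdot)^T$, $(\cdot)^+$, $\otimes$,
$\mathrm{tr}\{\cdot\}$, and $\mathrm{vec}(\cdot)$ stand for the
transpose, Moore-Penrose pseudo-inverse, Kronecker product, the
trace, and the vectorization operator, respectively. For a matrix
$A \in \mathbb{R}^{n \times m}$, $\|A\| = \sqrt{\mathrm{tr}\{A^T A\}}$
denotes its Frobenius norm. $I_n$ and $0_{n \times m}$ represent the
$n \times n$ identity matrix and the all-zero $n \times m$ matrix,
respectively (when the dimensions are clear from the context, the
subscripts are dropped). For a vector
$\theta = [\,\theta_1\ \theta_2\ \cdots\ \theta_l\,]^T \in \mathbb{R}^l$,
$\mathcal{T}_{n \times m}(\theta)$ denotes the $n \times m$ Toeplitz
matrix generated by $\theta$, i.e.,

$$\mathcal{T}_{n \times m}(\theta) = \begin{bmatrix} \theta_n & \theta_{n+1} & \theta_{n+2} & \cdots & \theta_l \\ \theta_{n-1} & \theta_n & \ddots & \ddots & \theta_{l-1} \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ \theta_1 & \theta_2 & \cdots & \cdots & \theta_{l-n+1} \end{bmatrix}$$

where $m + n - 1 = l$. $x \sim \mathcal{N}(\mu, C)$ means that the
random vector $x$ is Gaussian distributed with mean $\mu$ and
covariance matrix $C$.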
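As a small illustration of the Toeplitz operator just defined, the following sketch builds the matrix by direct indexing. The function name `toeplitz_nxm` and the example vector are assumptions made here for illustration only.

```python
import numpy as np

def toeplitz_nxm(theta, n, m):
    """Build the n x m Toeplitz matrix whose first row is theta_n..theta_l and
    whose last row is theta_1..theta_m (1-based indices, l = n + m - 1)."""
    theta = np.asarray(theta)
    if theta.size != n + m - 1:
        raise ValueError("theta must have n + m - 1 entries")
    rows = np.arange(n)[:, None]                  # i = 0..n-1
    cols = np.arange(m)[None, :]                  # j = 0..m-1
    # Entry (i, j) is theta_{n - i + j} in the 1-based notation of the text
    return theta[n - 1 - rows + cols]

T = toeplitz_nxm([1, 2, 3, 4], n=2, m=3)
```

Here the first row holds $\theta_2, \theta_3, \theta_4$ and the second row holds $\theta_1, \theta_2, \theta_3$, so each row is the row above shifted by one sample.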


2.  PROBLEM  STATEMENT 


Consider P co-channel sources, observed through a convolutive
finite-impulse response (FIR) multichannel system:

$$x(k) = \sum_{p=1}^{P} \sum_{l=0}^{L-1} h_p(l)\,\theta_p(k-l) + w(k); \tag{1}$$

$x(k) \in \mathbb{R}^N$ denotes the observations,
$h_p(l) \in \mathbb{R}^N$, for $l = 0, 1, \ldots, L-1$, is the FIR of
the pth multichannel filter (for simplicity, all FIR multichannels
have the same length L), $\theta_p(k) \in \mathbb{R}$ is the scalar
signal transmitted by the pth user, and $w(k) \in \mathbb{R}^N$
represents additive noise. Rewrite (1) as

$$x(k) = \underbrace{[\,H_1\ H_2\ \cdots\ H_P\,]}_{H}\ \underbrace{\begin{bmatrix} \boldsymbol{\theta}_1(k) \\ \boldsymbol{\theta}_2(k) \\ \vdots \\ \boldsymbol{\theta}_P(k) \end{bmatrix}}_{\boldsymbol{\theta}(k)} + w(k), \tag{2}$$
where $H_p = [\,h_p(0)\ h_p(1)\ \cdots\ h_p(L-1)\,] : N \times L$, and
$\boldsymbol{\theta}_p(k) = [\,\theta_p(k)\ \theta_p(k-1)\ \cdots\ \theta_p(k-L+1)\,]^T : L \times 1$.
$H : N \times LP$ is the unknown convolutive mixing matrix. Given a
finite set of observations $\mathcal{X} = \{x(k) : k = 1, 2, \ldots, K\}$,
we aim at estimating the user signals $\theta_p(k)$. The following
assumptions hold throughout the paper.

(A1) P is known, and H is full column-rank.

(A2) The sources are uncorrelated, zero-mean and white up to 2nd
order, i.e., $r_{p,q}(k, l) = E\{\theta_p(k)\,\theta_q(l)\} = \delta(p-q)\,\delta(k-l)$,
where $\delta(\cdot)$ is the Kronecker delta. No statistical
information about the sources, beyond the 2nd order, is assumed known.

(A3) $w(k)$ denotes zero-mean spatio-temporal white Gaussian noise
with known power $\sigma^2$.

As is well known, even for noiseless scenarios, the factorization
$H\boldsymbol{\theta}(k)$ in the data model (2) is not unique: H can
be solved only up to a residual $P \times P$ instantaneous mixing
matrix [4, 6]. To guarantee uniqueness of the factorization in (2),
we assume that

(A4) for each source p, a certain fragment
$T_p(i_p, j_p) = \{\theta_p(i_p), \theta_p(i_p+1), \ldots, \theta_p(j_p)\}$
is known. Here, $i_p$ and $j_p$ denote the beginning and end of the
known excerpt, respectively. It is not required that $i_p = i_q$ for
$q \neq p$ (nor $j_p = j_q$), i.e., we do not assume synchronization
among the sources (difficult to ensure in practice), but only between
each source and the base station receiver.

3.  THE  IML  ALGORITHM 

Channel whitening. We work with whitened data samples. Consider the
eigenvalue decomposition (EVD),

$$R_x = E\{x(k)x(k)^T\} = \underbrace{U \Sigma U^T}_{H H^T} + \sigma^2 I_N, \tag{3}$$

where $U = [\,U_1\ U_2\,] \in O(N)$, $U_1 : N \times LP$, and
$\Sigma = \mathrm{diag}(\Sigma_1, 0)$; here, and for future reference,

$$O(N) = \{\, W \in \mathbb{R}^{N \times N} : W^T W = I_N \,\}$$

denotes the group of $N \times N$ orthogonal matrices. It is readily
seen that

$$P = U_1 \Sigma_1^{1/2} = H Q^T, \tag{4}$$

where Q denotes an (unknown) residual orthogonal matrix. The whitened
data samples are given by

$$y(k) = P^+ x(k) = Q\,\boldsymbol{\theta}(k) + n(k), \quad k = 1, \ldots, K, \tag{5}$$

where $n(k) = P^+ w(k) \sim \mathcal{N}(0, C)$, $C = \sigma^2 \Sigma_1^{-1}$.
The principal advantage of the whitened data model is that the
channel matrix $Q : LP \times LP$ has fewer parameters to estimate
than the corresponding channel matrix $H : N \times LP$ in (2); also,
it is more structured (orthogonal). For the pth source, we collect
all the unknowns in (5) in
$\theta_p = [\,\theta_p(2-L)\ \cdots\ \theta_p(K)\,]^T$. Further, the
resulting P vectors are stacked in
$\theta = [\,\theta_1^T\ \cdots\ \theta_P^T\,]^T$. Also, we express the
knowledge of the fragment $T_p(i_p, j_p)$ in matrix terms as
$E_p^T \theta_p = \eta_p$, where $E_p$ selects the appropriate entries
of $\theta_p$, and $\eta_p$ contains the a priori known values, i.e.,
$E_p : (K + L - 1) \times (j_p - i_p + 1)$ contains the columns
$L - 1 + i_p$ to $L - 1 + j_p$ of $I_{K+L-1}$, and
$\eta_p = [\,\theta_p(i_p)\ \cdots\ \theta_p(j_p)\,]^T$. In terms of the
overall vector of unknowns $\theta$, we have $E^T \theta = \eta$,
where $E = \mathrm{diag}(E_1, \ldots, E_P)$ and
$\eta = [\,\eta_1^T\ \cdots\ \eta_P^T\,]^T$.
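The whitening step in (3)-(5) can be sketched numerically as below. This is a minimal illustration that assumes the exact covariance $R_x$ and the noise power are available; in practice $R_x$ would be estimated from data. The variable names, the seed and the value of `sigma2` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)                    # arbitrary seed
N, L, P, sigma2 = 10, 3, 2, 0.1                   # sigma2 is an assumed noise power
H = rng.standard_normal((N, L * P))               # random full-column-rank channel

# Exact covariance R_x = H H^T + sigma^2 I_N (unit-power white sources)
Rx = H @ H.T + sigma2 * np.eye(N)

# EVD with eigenvalues sorted in decreasing order
w, U = np.linalg.eigh(Rx)
order = np.argsort(w)[::-1]
w, U = w[order], U[:, order]

# Keep the LP dominant eigenpairs and remove the known noise floor
U1 = U[:, :L * P]
Sigma1 = w[:L * P] - sigma2
Pmat = U1 * np.sqrt(Sigma1)                       # P = U_1 Sigma_1^{1/2}

# Since P = H Q^T, the residual mixing Q = P^+ H should be orthogonal
Q = np.linalg.pinv(Pmat) @ H
```

The check that `Q` is orthogonal and that `Pmat @ Q` reproduces `H` mirrors the statement that whitening leaves only an orthogonal LP x LP factor to estimate.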

IML Algorithm. Notice that, by assumption (A2), the receiver ignores
the statistical description of the sources beyond the 2nd order.
Thus, $\theta$ is viewed, in the sequel, as a deterministic vector of
unknowns. We aim at finding the joint ML estimator of $(Q, \theta)$,
subject to the known constraints, i.e.,

$$(\hat{Q}, \hat{\theta}) = \arg\max_{Q \in O(LP),\ E^T\theta = \eta}\ l\,(y(1), \ldots, y(K) \mid Q, \theta),$$

where $l(\cdot \mid Q, \theta)$ stands for the (conditioned)
likelihood of the whitened observations. After some algebra, the
optimization task at hand can be formulated as

$$(Q_{ML}, \theta_{ML}) = \arg\min_{Q \in O(LP),\ E^T\theta = \eta}\ \phi(Q, \theta), \tag{6}$$

where

$$\phi(Q, \theta) = \frac{1}{K}\,\| Y - Q\,\mathcal{T}(\theta) \|^2_{C^{-1}}. \tag{7}$$

$Y = [\,y(1)\ \cdots\ y(K)\,]$ denotes the observed whitened data
matrix, $\mathcal{T}(\theta)$ is the generalized Toeplitz matrix
generated by $\theta$,
$\mathcal{T}(\theta) = [\,\mathcal{T}_{L \times K}(\theta_1)^T\ \cdots\ \mathcal{T}_{L \times K}(\theta_P)^T\,]^T$,
and $\|Z\|^2_{C^{-1}} = \mathrm{tr}\{Z^T C^{-1} Z\}$, for arbitrary
$Z \in \mathbb{R}^{LP \times K}$. We propose an iterative procedure
which minimizes $\phi(Q, \theta)$ with respect to $Q$ and $\theta$,
cyclically. Table 1 lists the resulting iterative maximum-likelihood
(IML) algorithm. Thus, the IML algorithm is a cyclic coordinate
descent method, and only locally convergent.

    Let $Q^{(0)} \in O(LP)$
    for n = 1, 2, ...
      i)  $\theta^{(n)} = \arg\min_{E^T\theta = \eta} \phi(Q^{(n-1)}, \theta)$
      ii) $Q^{(n)} = \arg\min_{Q \in O(LP)} \phi(Q, \theta^{(n)})$
    until $Q^{(n)} - Q^{(n-1)} = 0$

Table 1: IML Algorithm

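Solving for $\theta^{(n)}$. Write
$\mathcal{T}(\theta) = \sum_{p=1}^{P} \sum_{l=2-L}^{K} B_p(l)\,\theta_p(l)$,
the matrices $B_p(\cdot) : LP \times K$ being implicitly defined; thus,
$\mathrm{vec}(\mathcal{T}(\theta)) = B\theta$,
$B = [\,b_1(2-L)\ \cdots\ b_P(K)\,]$, $b_p(l) = \mathrm{vec}(B_p(l))$, and

$$\phi(Q, \theta) = \frac{1}{K}\,\| y - (I_K \otimes Q)\,B\theta \|^2_{I_K \otimes C^{-1}}, \tag{8}$$

with $y = \mathrm{vec}(Y)$. To minimize (8) with respect to $\theta$
(subject to $E^T\theta = \eta$), write $\theta = E\eta + F\lambda$;
here, $\theta_0 = E\eta$ and $F\lambda$ represent the known and
unknown parts, respectively. Notice that $[\,E\ F\,]$ is an identity
matrix (up to permutation of columns). Solving for $\lambda$ in (8)
gives $\lambda = (F^T A F)^{-1} F^T d$, where
$A = B^T (I_K \otimes Q^T C^{-1} Q)\,B$, and the ith entry of the
vector $d$ is $d_i = \mathrm{tr}\{Q^T C^{-1} \Delta_Y B_i^T\}$, where
$\Delta_Y = Y - Q\,\mathcal{T}(\theta_0)$; the matrices $B_i$ are
defined by

$$\mathrm{vec}(B_i) = b_i, \quad B = [\,b_1\ \cdots\ b_{P(K-1)+LP}\,]. \tag{9}$$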

Solving for $Q^{(n)}$. After trivial algebraic manipulations, we face
the problem of minimizing

$$\varphi(Q) = \mathrm{tr}\{Q^T C^{-1} Q R_\theta\} - 2\,\mathrm{tr}\{Q^T C^{-1} R_{y\theta}\}, \tag{10}$$

where $R_\theta = \frac{1}{K}\,\mathcal{T}(\theta)\,\mathcal{T}(\theta)^T$
and $R_{y\theta} = \frac{1}{K}\,Y\,\mathcal{T}(\theta)^T$, subject to
$Q \in O(LP)$, the group of $LP \times LP$ orthogonal matrices. It is
well known that $O(LP) \subset \mathbb{R}^{LP \times LP}$ is a
differentiable manifold (of dimension $LP(LP-1)/2$) [7]. In order to
exploit the curvature of this constraint surface, we employ a
geodesic descent algorithm (the generalization of the traditional
steepest gradient method in flat spaces) [9]. Table 2 describes the
idealized geodesic descent algorithm. In words, the algorithm
proceeds by solving successive one-parameter geodesic minimization
problems (as line search methods do in flat spaces [9]). We now focus
on the details of each of the sub-steps a), b) and c) of the geodesic
descent algorithm.

    1. Choose $Q^{(0)} \in O(LP)$
    2. for m = 1, 2, ...
       a) Let $D$ denote the projection of $-\nabla\varphi$ onto the
          tangent space of $O(LP)$ at $Q^{(m-1)}$
       b) Let $Q(t) \in O(LP)$, $t \geq 0$, denote the geodesic
          emanating from $Q(0) = Q^{(m-1)}$ in the direction
          $\dot{Q}(0) = D$
       c) Minimize $\varphi(Q(t))$ with respect to $t \geq 0$, to
          obtain $t_{\min}$ (a global minimizer); set
          $Q^{(m)} = Q(t_{\min})$
    3. until $Q^{(m)} - Q^{(m-1)} = 0$

Table 2: Geodesic descent algorithm

Sub-step a). To simplify notation, and for future reference, let
$Q_0 = Q^{(m-1)}$. By applying standard calculus rules, it can be seen
that the gradient (in matrix format) of $\varphi(\cdot)$ evaluated at
$Q_0$ is given by

$$\nabla\varphi(Q_0) = 2\,C^{-1}\,(Q_0 R_\theta - R_{y\theta}).$$

On the other hand, the tangent space to the manifold $O(LP)$ at the
point $Q_0 \in O(LP)$ is given by

$$T_{O(LP)}(Q_0) = \{\, Q_0 K : K \in \mathcal{K}(LP) \,\},$$

see [7]; here
$\mathcal{K}(LP) = \{\, K \in \mathbb{R}^{LP \times LP} : K = -K^T \,\}$
denotes the linear subspace of skew-symmetric matrices in
$\mathbb{R}^{LP \times LP}$. Thus, $D$, the orthogonal projection of
$V = -\nabla\varphi(Q_0)$ onto the tangent space of $O(LP)$ at $Q_0$,
is equal to $D = Q_0 K_0$, where $K_0$ can be found as

$$K_0 = \arg\min_{K \in \mathcal{K}(LP)} \| Q_0 K - V \| = \arg\min_{K \in \mathcal{K}(LP)} \| K - Q_0^T V \| = \frac{1}{2}\,(Q_0^T V - V^T Q_0),$$

the skew-symmetric component of the matrix $Q_0^T V$.
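The projection in sub-step a) can be sketched as follows. The function name, the matrix size and the random test matrices are assumptions for illustration; `V` merely stands in for the negative gradient.

```python
import numpy as np

def tangent_projection(Q0, V):
    """Project V onto the tangent space of the orthogonal group at Q0.
    Returns (D, K0) with D = Q0 @ K0 and K0 skew-symmetric."""
    K0 = 0.5 * (Q0.T @ V - V.T @ Q0)
    return Q0 @ K0, K0

rng = np.random.default_rng(2)                       # arbitrary seed
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # a random orthogonal matrix
V = rng.standard_normal((4, 4))                      # stands in for -grad
D, K0 = tangent_projection(Q0, V)

# The residual V - D should be orthogonal (in the trace inner product)
# to any tangent vector Q0 @ K with K skew-symmetric.
K_test = rng.standard_normal((4, 4))
K_test = K_test - K_test.T
residual_inner = np.trace((V - D).T @ (Q0 @ K_test))
```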

Sub-step b). We must find a geodesic $Q(t)$, $t \geq 0$, subject to
the initial conditions $Q(0) = Q_0$ and $\dot{Q}(0) = D = Q_0 K_0$. In
order to stay in the constraint surface $O(LP)$, the curve $Q(t)$ must
satisfy

$$\dot{Q}(t) = Q(t)\,K(t), \tag{11}$$

where $K(t) \in \mathcal{K}(LP)$. Also, it can be shown (see [9]) that,
for constant speed geodesics, the acceleration vector is orthogonal
to the manifold. Thus,

$$\ddot{Q}(t) = Q(t)\,S(t), \tag{12}$$

where $S(t) \in \mathcal{S}(LP) = \{\, S \in \mathbb{R}^{LP \times LP} : S = S^T \,\}$,
the linear subspace of symmetric matrices in
$\mathbb{R}^{LP \times LP}$ (remark that $\mathcal{S}(LP)$ is
orthogonal to $\mathcal{K}(LP)$). Using the representation (11) in
(12) gives

$$Q(t)\,K(t)^2 + Q(t)\,\dot{K}(t) = Q(t)\,S(t).$$

Eliminating $Q(t)$ and re-arranging terms, we have

$$\dot{K}(t) = S(t) - K(t)^2,$$

i.e., $\dot{K}(t) \in \mathcal{S}(LP)$ (notice that
$K(t)^2 \in \mathcal{S}(LP)$). On the other hand,
$\dot{K}(t) \in \mathcal{K}(LP)$, since $K(t) \in \mathcal{K}(LP)$.
Thus, $\dot{K}(t) = 0$, and $K(t) = K(0)$ is a constant matrix; in
fact, $K(t) = K_0$, by the restriction on $\dot{Q}(0)$. Now, by (11)
and the condition on $Q(0)$, we find $Q(t) = Q_0 e^{K_0 t}$.
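A point on the geodesic $Q(t) = Q_0 e^{K_0 t}$ can be evaluated as in the sketch below. Computing the matrix exponential through the eigendecomposition of the Hermitian matrix $-jK_0$ is a choice made here to stay within NumPy; the function names and test matrices are assumptions.

```python
import numpy as np

def expm_skew(K, t):
    """exp(K t) for a real skew-symmetric K, via the Hermitian matrix -1j*K."""
    w, W = np.linalg.eigh(-1j * K)                # -1j*K = W diag(w) W^H, w real
    return ((W * np.exp(1j * w * t)) @ W.conj().T).real

def geodesic(Q0, K0, t):
    """Point Q(t) = Q0 exp(K0 t) on the geodesic through Q0 with velocity Q0 K0."""
    return Q0 @ expm_skew(K0, t)

rng = np.random.default_rng(3)                       # arbitrary seed
Q0, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # starting point on O(4)
A = rng.standard_normal((4, 4))
K0 = 0.5 * (A - A.T)                                 # a skew-symmetric direction
Qt = geodesic(Q0, K0, 0.7)
```

Because $K_0$ is skew-symmetric, $e^{K_0 t}$ is orthogonal, so `Qt` stays on the orthogonal group for every `t` without any re-orthogonalization step.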

Sub-step c). We have to minimize $\gamma(t) = \varphi(Q(t))$ over
$t \geq 0$. Here, instead of performing an exact minimization, we
propose to locate the first (perhaps only local) minimizer of
$\gamma(t)$. This inexact (sub-optimal) scheme, often used in
practice, alleviates the computational burden and does not impair
convergence of the overall algorithm (the outer minimization loop),
if a sufficient degree of descent in $\gamma(t)$ is achieved. We
propose to locate (the first) point $t_0 > 0$ such that
$\dot{\gamma}(t_0) = 0$, by the bisection method. Notice that
$\dot{\gamma}(a^{(0)}) < 0$, $a^{(0)} = 0$. Let $\delta > 0$ be given,
and $l$ the lowest integer such that $\dot{\gamma}(l\delta) > 0$. Set
$b^{(0)} = l\delta$. We have the following procedure.

    1. for m = 1, 2, ...
         Let $c = \frac{1}{2}(a^{(m-1)} + b^{(m-1)})$
         if $\dot{\gamma}(c) = 0$ stop
         if $\dot{\gamma}(c) < 0$ then
           $a^{(m)} = c$, $b^{(m)} = b^{(m-1)}$
         else
           $a^{(m)} = a^{(m-1)}$, $b^{(m)} = c$
    2. until $|b^{(m)} - a^{(m)}| < \epsilon$
    3. Set $t_0 = \frac{1}{2}(a^{(m)} + b^{(m)})$
Using standard calculus rules and the fact that $K_0$ commutes with
$e^{K_0 t}$, we have

$$\dot{\gamma}(t) = \mathrm{tr}\{ e^{-K_0 t}\, C_0\, e^{K_0 t}\, S_0 \} + 2\,\mathrm{tr}\{ e^{-K_0 t}\, C_0\, Q_0^T R_{y\theta} K_0 \},$$

where $C_0 = Q_0^T C^{-1} Q_0$, and $S_0 = K_0 R_\theta - R_\theta K_0$.
The expression for $\ddot{\gamma}(t)$ can also be easily found, and
used to check if the obtained $t_0$ is, in fact, a local minimizer (if
not, the algorithm must be restarted with a smaller $\delta$, say,
$\delta/2$).

Initialization. The geodesic descent algorithm is only locally
convergent. A (possible first) initialization is given by

$$Q^{(0)} = \Pi_{O(LP)}\{ C^{-1} R_{y\theta} \}. \tag{13}$$

Here, $\Pi_{O(LP)}(Z)$ is the (nonlinear) projection of the matrix
$Z \in \mathbb{R}^{LP \times LP}$ onto the orthogonal group $O(LP)$.
It is computed as follows: let $Z = U \Sigma V^T$ denote a singular
value decomposition (SVD) of $Z$; then, $\Pi_{O(LP)}(Z) = U V^T$.
This remark is based on the fact that, near the global minimum,

$$R_\theta = \frac{1}{K} \sum_{k=1}^{K} \boldsymbol{\theta}(k)\,\boldsymbol{\theta}(k)^T \simeq E\{\boldsymbol{\theta}(k)\,\boldsymbol{\theta}(k)^T\} = I_{LP}.$$

Thus, the first term in (10) reduces to a constant, and
$\varphi(Q) \simeq -2\,\mathrm{tr}\{Q^T C^{-1} R_{y\theta}\}$, which is
minimized by (13) [8].
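The projection onto the orthogonal group used in (13) can be sketched as follows. The function name and the random test matrix are assumptions for illustration; in the algorithm, `Z` would be $C^{-1} R_{y\theta}$.

```python
import numpy as np

def project_to_orthogonal(Z):
    """Projection of a square matrix Z onto the orthogonal group:
    U V^T, where Z = U S V^T is an SVD of Z."""
    U, _, Vt = np.linalg.svd(Z)
    return U @ Vt

rng = np.random.default_rng(4)                    # arbitrary seed
Z = rng.standard_normal((6, 6))                   # stand-in for C^{-1} R_ytheta
Q_init = project_to_orthogonal(Z)

# Q_init maximizes tr(Q^T Z) over orthogonal Q; compare against Q = I
gain_over_identity = np.trace(Q_init.T @ Z) - np.trace(Z)
```

Maximizing $\mathrm{tr}\{Q^T Z\}$ is the same as minimizing $-2\,\mathrm{tr}\{Q^T Z\}$, which is why this projection serves as an initialization when $R_\theta$ is close to the identity.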

Binary sources. For noiseless data samples and binary sources, i.e.,
$\theta_p(k) = \pm 1$, the factorization implicit in (2) is
essentially unique: the sources are determined up to a permutation
and a sign ambiguity; see [1, 4]. Thus, prior knowledge of data
fragments can be discarded. To take into account the finite-alphabet
property of the sources, we can introduce in the IML algorithm the
extra step i)a): $\theta^{(n)} \leftarrow \mathrm{sign}(\theta^{(n)})$,
i.e., the entries of $\theta^{(n)}$ are projected onto the binary
alphabet $\mathcal{A} = \{\pm 1\}$. This modification makes the IML
algorithm more sensitive to initialization, as monotone convergence
of $\phi$ can no longer be assured. Notice that the (optimal)
approach of performing step i) with the entries of $\theta$
restricted to $\mathcal{A}$ is, from the computational viewpoint,
considerably complex.

4.  EXPERIMENTAL  RESULTS 

To evaluate the performance of the IML algorithm, we conducted some
computer simulations. We considered P = 2 binary users, and a channel
matrix $H : N \times LP$, where N = 10 and L = 3. The entries of H
were randomly generated (independent samples of a zero-mean Gaussian
random variable with unit variance). Since the sources are binary, no
symbol is assumed known a priori, and the IML algorithm is run with
the step i)a) (adaptation for BPSK sources) discussed in section 3.
The signal-to-noise ratio (SNR) is defined as
$\mathrm{SNR} = \|H\|^2 / (N\sigma^2)$. The SNR was varied between
$\mathrm{SNR}_{\min} = -5$ dB and $\mathrm{SNR}_{\max} = 5$ dB, in
steps of 2.5 dB. For each SNR, 100 statistically independent trials
were considered. For each trial, K = 100 samples were generated, the
IML algorithm was run, and the bit error rates (BER) were evaluated
by counting the errors; also, the square-error of the channel
estimate, $\|\hat{H} - H\|^2$, was computed; here,
$\hat{H} = P\,Q_{ML}$, recall (4). In figure 1, we plot the mean
square error of the channel estimate $\hat{H}$ thus obtained: solid
line with squares. For comparison, the dashed line with circles
denotes the Cramér-Rao Bound (CRB) for H, assuming that all the
transmitted symbols are known. The curves are quite close. Figures 2
and 3 (solid lines with squares) display the mean BERs obtained for
user 1 and user 2, respectively. For comparison, the dashed lines
with circles refer to the BERs obtained by linear equalizers based on
the true channel matrix H (rows of the pseudo-inverse of H), followed
by simple recombination (mean value) of the echoes prior to the
slicer.

Figure 1: Mean-Square Error: Proposed (solid, square) and Cramér-Rao
Bound with $\theta$ known (dashed, circle)

Figure 2: BER 1: Proposed (solid, square), linear equalizer with H
known (dashed, circle)


5.  CONCLUSIONS 

We proposed a semi-blind method for separating linear convolutive
mixtures of sources when their statistical description is known only
up to the 2nd order. We developed an iterative technique (IML
algorithm) to compute the joint ML estimator for the orthogonal
mixing matrix $Q$ and user signals $\theta$, in the whitened data
space. The IML algorithm respects both the algebraic and temporal
constraints on the pair of unknowns $(Q, \theta)$. Future work
includes a detailed study of the convergence properties of the IML
algorithm (and its optimization subproblems), and evaluation of the
computational complexity of both steps (per iteration).

Figure 3: BER 2: Subspace (solid, square), linear equalizer with H
known (dashed, circle)


REFERENCES 

[1] S. Talwar, M. Viberg, and A. Paulraj, “Blind Estimation of
Multiple Co-Channel Digital Signals Using an Antenna Array,” IEEE
Signal Processing Letters, vol. 1, no. 2, pp. 29-31, February 1994.

[2] V. Barroso, J. M. F. Moura, and J. Xavier, “Blind Array Channel
Division Multiple Access (AChDMA) for Mobile Communications,” IEEE
Transactions on Signal Processing, vol. 46, pp. 737-752, March 1998.

[3] B. Halder, B. Ng, A. Paulraj, and T. Kailath, “Unconditional
Maximum Likelihood Approach for Blind Estimation of Digital Signals,”
in International Conference on Acoustics, Speech and Signal
Processing (ICASSP'96), vol. 2, pp. 1081-1084, 1996.

[4] A. Van der Veen, S. Talwar, and A. Paulraj, “A Subspace Approach
to Blind Space-Time Signal Processing for Wireless Communication
Systems,” IEEE Transactions on Signal Processing, vol. 45, no. 1,
pp. 173-190, January 1997.

[5] J. Xavier and V. Barroso, “Blind source separation, ISI
cancellation and carrier phase recovery in SDMA systems for mobile
communications,” Wireless Personal Communications, vol. 10,
pp. 35-76, Kluwer Academic Publishers, June 1999.

[6] A. Gorokhov and P. Loubaton, “Subspace based techniques for blind
separation of convolutive mixtures with temporally correlated
sources,” IEEE Transactions on Circuits and Systems - I: Fundamental
Theory and Applications, vol. 44, no. 9, pp. 813-820, September 1997.

[7] R. Bhatia, Matrix Analysis, Springer-Verlag New York, Inc.




TECHNIQUES  FOR  BLIND  SOURCE  SEPARATION  USING  HIGHER-ORDER 

STATISTICS 


Ziauddin  M.  Kamran,  A.  Rahim  Leyman 

School  of  Electrical  and  Electronic  Engineering 
Nanyang  Technological  University 
Nanyang  Avenue,  Singapore  639798 
{pm0538251, earleyman}@ntu.edu.sg

ABSTRACT 

The blind source separation (BSS) problem consists of the recovery of
a set of statistically independent source signals from a set of
measurements that are mixtures of the sources when nothing is known
about the sources and the mixture structure. This paper considers the
separation and estimation of independent sources from their
instantaneous linear mixed observed data. The concepts of
higher-order moment and higher-order time-frequency distribution
matrices are also introduced. In practice, separation can be achieved
by using suitable second-order statistics (SOS) and/or higher-order
statistics (HOS). Computationally feasible implementations are
presented based on joint diagonalization of the moment matrices and
matrices of the principal slices of the time-multifrequency domain of
support of the moment-based Wigner trispectrums. The latter approach
allows separation of sources with nonstationarity properties.
Simulation results are given to demonstrate the effectiveness of the
proposed approaches.

1.  INTRODUCTION 

In BSS the problem is how to recover independent sources given the
sensor outputs in which the sources have been mixed in an unknown
channel. A number of applications require the extraction of a set of
signals which are not directly accessible. Instead, this extraction
must be carried out from another set of measurements which were
generated as mixtures of the initial set. Since usually neither the
original signals (called sources) nor the mixing transformation are
known, this is certainly a challenging problem of multichannel blind
estimation. This problem is encountered in a wide range of
application fields, such as array processing, communications,
biomedical signal processing, image processing, and speech
processing. While ill-defined in some situations, BSS becomes a
well-defined problem in the context of multiple-sensor signal
processing. Thus far, numerous approaches to this problem have been
proposed and implemented, using HOS based approaches [1], [2], and
SOS based approaches [3], [4]. Firstly, we introduce a BSS technique
based on a joint diagonalization of several fourth-order moment
matrices. In this case, we consider stationary sources. Although most
of the approaches are successful under certain assumed conditions,
one common limitation is that they are applicable only to stationary
sources. In practical applications, nonstationary processes are
frequently encountered in radar, sonar, and communication systems. In
contrast to BSS approaches using these techniques, we also propose
another approach that takes advantage explicitly of the nonstationary
property of the signals to be separated. This is accomplished by
resorting to the powerful tool of time-frequency ($t$-$f$) signal
representations. This approach for BSS exploits the fourth-order
moment spectra based $t$-$f$ distributions of the array output.
Recently, $t$-$f$ distributions have been applied to BSS problems
[4]. Simulation examples are presented to demonstrate the
effectiveness of the proposed approaches.

2.  PRELIMINARIES:  PROBLEM 
STATEMENT  AND  TERMINOLOGY 

2.1.  Problem  Description 

The goal of BSS can be briefly stated as recovering a set of n
zero-mean statistically independent source signals
$s(t) = [s_1(t), \ldots, s_n(t)]^T$ from a set of m instantaneous
linear mixtures $x(t) = [x_1(t), \ldots, x_m(t)]^T$, which are the
observed signals or sensor outputs. In matrix form, this problem
leads to the following data model

$$x(t) = A\,s(t) + n(t) \tag{1}$$

where $n(t) = [n_1(t), \ldots, n_m(t)]^T$ is an $m \times 1$ additive
noise vector whose elements are modeled as stationary, spatially and
temporally white, zero-mean complex random processes, and independent
of the source signals. That is,
$E[n(t+\tau)\,n^*(t)] = \sigma\,\delta(\tau)\,I$, where $\delta(\tau)$
is the Kronecker delta, $I$ denotes the identity matrix, $\sigma$ is
the noise power at each sensor, superscript $*$ denotes conjugate
transpose of a vector, and $E[\cdot]$ is the statistical expectation
operator. The unknown $m \times n$ complex matrix $A$ is the full
column rank mixing matrix or parameter matrix that characterizes the
medium or channel. The power of the sources is, in principle,
arbitrary since a scalar factor can be swapped between any source and
its associated column in the mixing matrix without altering the
measurements. These well-known facts constitute a basic indeterminacy
in BSS [2]. At best, the mixing matrix $A$ can be identified up to a
permutation and scaling of its columns. Therefore, nothing prevents
us from further assuming that the sources are unit power signals,
$E[|s_i(t)|^2] = 1$ for $1 \leq i \leq n$, so that the dynamic range
of the sources is accounted for by the magnitude of the corresponding
column of $A$. It should be noted, however, that this is merely a
convention.
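The scaling and permutation indeterminacy described above can be checked numerically with a short sketch. The dimensions, the seed, the binary sources and the scale factors are assumptions for illustration; noise is omitted so that the measurements can be compared exactly.

```python
import numpy as np

rng = np.random.default_rng(5)                       # arbitrary seed
m, n, T = 4, 2, 100                                  # sensors, sources, samples
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
s = rng.choice([-1.0, 1.0], size=(n, T))             # unit-power sources

# Rescale each source by an arbitrary nonzero factor and compensate in A's columns
scale = np.array([2.0, -0.5j])
s_scaled = scale[:, None] * s
A_scaled = A / scale[None, :]

# Reorder the sources and the columns of A in the same way
perm = [1, 0]

x = A @ s                                            # noise-free measurements
x_scaled = A_scaled @ s_scaled
x_perm = A[:, perm] @ s[perm]
```

All three products give the same measurements, which is why $A$ can only be identified up to a permutation and scaling of its columns.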

2.2.  Higher-Order  Moments 

Let $x_1(t), \ldots, x_m(t)$ be m random processes. Their
fourth-order moment sequence can be defined as

$$\mathrm{Moment}[x_i(t), x_j^*(t+\tau_1), x_k(t+\tau_2), x_l^*(t+\tau_3)] = \mathrm{MOM}_{x_i x_j^* x_k x_l^*}(\tau_1, \tau_2, \tau_3) = E[x_i(t)\,x_j^*(t+\tau_1)\,x_k(t+\tau_2)\,x_l^*(t+\tau_3)] \tag{2}$$

where $1 \leq i, j, k, l \leq m$. For finite fourth-order moments, we
also define a moment set denoted by

$$Q_x(\tau_1, \tau_2, \tau_3) = \{ \mathrm{MOM}_{x_i x_j^* x_k x_l^*}(\tau_1, \tau_2, \tau_3) \} \tag{3}$$

where $1 \leq i, j, k, l \leq m$. We assume that there exists a
consistent estimate of $Q_x(\tau_1, \tau_2, \tau_3)$. We also assume
the source signal vector $s(t)$ is a stationary random multivariate
process with

$$E[s(t+\tau)\,s^*(t)] = \mathrm{diag}[\rho_{11}(\tau), \ldots, \rho_{nn}(\tau)] \tag{4}$$

where $\mathrm{diag}[\cdot]$ is the diagonal matrix formed with the
elements of its vector valued argument, and
$\rho_{ii}(\tau) = E[s_i(t+\tau)\,s_i^*(t)]$ denotes the
autocovariance of $s_i(t)$. Since the sources are uncorrelated, we
define the source auto-moments as

$$\nu_p(\tau_1, \tau_2, \tau_3) = \mathrm{MOM}_{s_p s_p^* s_p s_p^*}(\tau_1, \tau_2, \tau_3) \tag{5}$$

where $1 \leq p \leq n$. For notational convenience, we denote
$\nu_p(\tau_1, \tau_2, \tau_3)$ as $\nu_p$. We consider that the
auto-moments of the sources $\nu_p$ exist
$\forall\ \tau_1, \tau_2, \tau_3$.


2.3.  Higher-Order  Moment  and  Spectra  Based 
Time-Frequency  Distributions 

Time-frequency distributions have proven useful for analyzing a variety of signals and systems. In particular, if the frequency content is time varying, as in nonstationary signals, this approach is quite attractive. An infinite number of $t$-$f$ distributions of a signal $x(t)$ can be generated from a unified framework using Cohen's general class formulation [5],

$$P_x(t,f) = \int_{\tau}\int_{\Omega}\int_{u} \Phi(\Omega,\tau)\, x^*(u-\tau/2)\, x(u+\tau/2)\, \exp(j2\pi u\Omega)\, \exp(-j2\pi t\Omega)\, \exp(-j2\pi f\tau)\, d\tau\, d\Omega\, du \qquad (6)$$

where $t$ and $f$ represent the time index and the frequency index, respectively. The kernel $\Phi(\Omega,\tau)$ characterizes the distribution and is a function of both time and lag variables. For the usual case of a signal-independent kernel $\Phi(\Omega,\tau)$, all members of Cohen's general class are bilinear with respect to the signal. This bilinearity is necessary to obtain a second-order spectral analysis of the signal. However, a bilinear representation cannot give information about the temporal evolution of the higher- (than second) order spectrum of the signal. New representations that can illustrate the time-varying higher-order spectral information of the signal under analysis need to be defined, and they should incorporate a higher-order nonlinearity in their formulation. The unified approach given by Cohen to most $t$-$f$ representations can be generalized to the case of higher-order moment spectra. A general class of higher-order moment and spectra based $t$-$f$ distributions is proposed in [6] as an extension of Cohen's general class of distributions to the higher-order spectral domains. Let $x_1(t), \ldots, x_m(t)$ be $m$ random processes. Define the moment-based fourth-order Wigner distribution, or Wigner trispectrum (WT), by

$$W_{x_i x_j^* x_k x_l^*}(t,\mathbf{f}) = \int_{\boldsymbol{\tau}} \mathrm{Mom}_{x_i x_j^* x_k x_l^*}(t,\boldsymbol{\tau})\, \exp(-j2\pi \mathbf{f}^T\boldsymbol{\tau})\, d\boldsymbol{\tau} \qquad (7)$$

where $1 \le i,j,k,l \le m$; $t$ and $\mathbf{f} = [f_1, f_2, f_3]^T$ represent the time index and the multifrequency index, respectively; $\boldsymbol{\tau} = [\tau_1, \tau_2, \tau_3]^T$ and $d\boldsymbol{\tau} = d\tau_1\, d\tau_2\, d\tau_3$. Here, we define $\mathrm{Mom}_{x_i x_j^* x_k x_l^*}(t,\boldsymbol{\tau})$ as follows:

MomXiX*XkX*  (t,  t) 

1  T 

=  lim  —  Xi  ( t  +  a)x*:  ( t  +  b)xk  ( t  +  c)x*  ( t  +  d)  (8) 

T— >00  1  J 
where $a$, $b$, $c$, and $d$ are obtained from the following requirements:

(1) $b - a = \tau_1$, $c - a = \tau_2$, and $d - a = \tau_3$, and

(2) the "lag centering" condition, $a + b + c + d = 0$.
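The two requirements form a small linear system in $a$, $b$, $c$, $d$. The following minimal sketch (the function name `lag_offsets` is only illustrative) solves it with exact rational arithmetic, so that the resulting offsets can be checked against both conditions.

```python
from fractions import Fraction


def lag_offsets(tau1, tau2, tau3):
    """Return (a, b, c, d) with b-a=tau1, c-a=tau2, d-a=tau3 and a+b+c+d=0."""
    tau1, tau2, tau3 = Fraction(tau1), Fraction(tau2), Fraction(tau3)
    # Substituting b, c, d = a + tau_k into the centering condition gives
    # 4a + tau1 + tau2 + tau3 = 0.
    a = -(tau1 + tau2 + tau3) / 4
    return a, a + tau1, a + tau2, a + tau3


a, b, c, d = lag_offsets(1, 2, 5)
print(a, b, c, d)          # offsets for the lags (1, 2, 5)
print(a + b + c + d)       # lag-centering condition
```

For the lags (1, 2, 5) the offsets sum to zero and their pairwise differences with $a$ reproduce the three lags.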

These two requirements lead to $a = -(\tau_1 + \tau_2 + \tau_3)/4$, $b = (3\tau_1 - \tau_2 - \tau_3)/4$, $c = (3\tau_2 - \tau_1 - \tau_3)/4$, and $d = (3\tau_3 - \tau_1 - \tau_2)/4$. Then the moment-based fourth-order time-frequency distribution is formulated as

$$P_{x_i x_j^* x_k x_l^*}(t,\mathbf{f}) = \int_{\boldsymbol{\tau}}\int_{\Omega}\int_{u} \Phi(\Omega,\boldsymbol{\tau})\, \mathrm{Mom}_{x_i x_j^* x_k x_l^*}(u,\boldsymbol{\tau})\, \exp(j2\pi u\Omega)\, \exp(-j2\pi t\Omega)\, \exp(-j2\pi \mathbf{f}^T\boldsymbol{\tau})\, d\boldsymbol{\tau}\, d\Omega\, du \qquad (9)$$

where the three-dimensional kernel $\Phi(\Omega,\boldsymbol{\tau})$ characterizes the distribution. The numerical implementation of the WT requires computation of the three-dimensional FFT. Computation of the WT at least at the signal rate, for high temporal resolution and for a frequency resolution of $1/N$, demands $O(N^4 \log_2 N)$ operations, where $N$ is the length of the data window. As proposed by Fonollosa et al. [7], a computationally feasible implementation of the WT can be obtained by considering two-dimensional slices of the time-multifrequency domain of support of the WT. Each slice corresponds to the plane defined by the temporal axis and one frequency line that represents jointly all three frequency axes of the four-dimensional WT. Care must be taken to choose this frequency line appropriately, so that it contains the information of interest for every particular application. Among all possible slices, we choose the principal slice, corresponding to the temporal axis and the principal diagonal of the higher-order spectra. The principal slice is thus the plane corresponding to $f_1 = f_2 = -f_3$ in the time-multifrequency domain of the WT. Our approach to blind identification exploits the principal sliced versions of the WTs, defined as the sliced Wigner trispectrums (SWTs) of the array output. From (7) we define the SWT as

$$SW_{x_i x_j^* x_k x_l^*}(t,f) = W_{x_i x_j^* x_k x_l^*}(t,\mathbf{f})\Big|_{f_1 = f_2 = -f_3 = f} = \int_{\tau_1}\int_{\tau_2}\int_{\tau_3} \mathrm{Mom}_{x_i x_j^* x_k x_l^*}(t,\boldsymbol{\tau})\, \exp(-j2\pi f(\tau_1 + \tau_2 - \tau_3))\, d\tau_1\, d\tau_2\, d\tau_3. \qquad (10)$$

The bilinear dependence on the signal of Cohen's class is replaced by a multilinear form in the WT. Consequently, cross-terms are more numerous in the WT than in the bilinear representations; if allowed to pass into the WT, they can reduce auto-component resolution and obscure the true signal features. To reduce the cross-terms of the principal slice of the WT effectively, we apply the Choi-Williams (CW) exponential kernel to the SWT and hence define the sliced reduced interference trispectrum distribution (SRID) as

$$SRID_{x_i x_j^* x_k x_l^*}(t,f) = \int_{\Omega}\int_{\tau} \exp(-\tau^2\Omega^2/g) \left[\int_{t'}\int_{f'} SW_{x_i x_j^* x_k x_l^*}(t',f')\, \exp(-j2\pi t'\Omega)\, \exp(-j2\pi f'\tau)\, dt'\, df'\right] \exp(j2\pi t\Omega)\, \exp(j2\pi f\tau)\, d\Omega\, d\tau \qquad (11)$$

where $g$ is the kernel width. We define an SRID set denoted $B_x(t,f)$ as

$$B_x(t,f) = \{SRID_{x_i x_j^* x_k x_l^*}(t,f)\} \qquad (12)$$

where $1 \le i,j,k,l \le m$. We assume that there exists a consistent estimate of $B_x(t,f)$. In this case, we also assume that the source signal vector $\mathbf{s}(t)$ is a nonstationary random multivariate process with

$$\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} \mathbf{s}(t+\tau)\mathbf{s}^*(t) = \mathrm{diag}[\gamma_{11}(\tau), \ldots, \gamma_{nn}(\tau)] \qquad (13)$$

where the autocovariance of $s_i(t)$ is denoted by $\gamma_{ii}(\tau) = \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} s_i(t+\tau)s_i^*(t)$. Since the sources are independent, the source cross-cumulants, or cross-terms of the source SRID, vanish. Hence, the source auto-terms are denoted as

$$\kappa_p(t,f) = SRID_{s_p s_p^* s_p s_p^*}(t,f) \qquad (14)$$

where $1 \le p \le n$. For notational convenience, we denote $\kappa_p(t,f)$ as $\kappa_p$. We also assume that the SRIDs of the sources $\kappa_p$ exist $\forall\, t, f$.

3.  HIGHER-ORDER  STATISTICS  BASED 
BSS  APPROACHES 

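Let $\mathbf{W}$ denote an $n \times m$ whitening matrix such that $\mathbf{W}\mathbf{A} = \mathbf{U}$, where $\mathbf{U}$ is an $n \times n$ unitary matrix. The whitened process still obeys a linear model,

$$\mathbf{z}(t) = \mathbf{W}\mathbf{x}(t) = \mathbf{W}[\mathbf{A}\mathbf{s}(t) + \mathbf{n}(t)] = \mathbf{U}\mathbf{s}(t) + \mathbf{W}\mathbf{n}(t). \qquad (15)$$

We form moment matrices $\mathbf{M}_z^P(\tau_1,\tau_2,\tau_3)$ by the inner product of the moments of the whitened data with an arbitrary $n \times n$ matrix $\mathbf{P}$, i.e.,

$$[\mathbf{M}_z^P(\tau_1,\tau_2,\tau_3)]_{ij} = \sum_{k=1}^{n}\sum_{l=1}^{n} \mathrm{MOM}_{z_i z_j^* z_k z_l^*}(\tau_1,\tau_2,\tau_3)\, P_{lk} \qquad (16)$$

where $1 \le i,j \le n$, and the $(l,k)$th component of the matrix $\mathbf{P}$ is written as $P_{lk}$. The sources are uncorrelated, and the fourth-order moments of different elements of the independent and identically distributed (i.i.d.) noise vector vanish, $\mathrm{MOM}_{n_i n_j^* n_k n_l^*}(\tau_1,\tau_2,\tau_3) = 0$ [8], where $1 \le i,j,k,l \le m$ and $\tau_1 \ne \tau_2 \ne \tau_3$. The relation (15) therefore yields

$$\mathrm{MOM}_{z_i z_j^* z_k z_l^*}(\tau_1,\tau_2,\tau_3) = \sum_{p=1}^{n} \nu_p\, u_{ip}\, u_{jp}^*\, u_{kp}\, u_{lp}^* \qquad (17)$$

where $u_{ip}$ denotes the $(i,p)$th entry of the matrix $\mathbf{U}$. Therefore, the whitened moment matrix can be rewritten as

$$\mathbf{M}_z^P(\tau_1,\tau_2,\tau_3) = \sum_{p=1}^{n} \nu_p\, \mathbf{u}_p^*\mathbf{P}\mathbf{u}_p\; \mathbf{u}_p\mathbf{u}_p^* \quad \forall\, \mathbf{P} \;\Rightarrow\; \mathbf{M}_z^P(\tau_1,\tau_2,\tau_3) = \mathbf{U}\boldsymbol{\Lambda}'_P\mathbf{U}^H \qquad (18)$$

where $\boldsymbol{\Lambda}'_P = \mathrm{diag}[\nu_1\mathbf{u}_1^*\mathbf{P}\mathbf{u}_1, \ldots, \nu_n\mathbf{u}_n^*\mathbf{P}\mathbf{u}_n]$ is a diagonal matrix whose diagonal elements depend on the particular matrix $\mathbf{P}$ as well as on the whitened data vector $\mathbf{z}(t)$, superscript $H$ denotes the complex conjugate transpose of a matrix, and $\mathbf{u}_p$ denotes the $p$th column of the matrix $\mathbf{U}$.

Similarly, we can form the whitened SRID matrices as follows:

$$\mathbf{S}_z^P(t,f) = \sum_{p=1}^{n} \kappa_p\, \mathbf{u}_p^*\mathbf{P}\mathbf{u}_p\; \mathbf{u}_p\mathbf{u}_p^* \quad \forall\, \mathbf{P} \;\Rightarrow\; \mathbf{S}_z^P(t,f) = \mathbf{U}\boldsymbol{\Lambda}''_P\mathbf{U}^H \qquad (19)$$

where $\boldsymbol{\Lambda}''_P = \mathrm{diag}[\kappa_1\mathbf{u}_1^*\mathbf{P}\mathbf{u}_1, \ldots, \kappa_n\mathbf{u}_n^*\mathbf{P}\mathbf{u}_n]$ is a diagonal matrix whose diagonal elements depend on the particular matrix $\mathbf{P}$ as well as on the whitened data vector $\mathbf{z}(t)$. From (18) and (19) stems the basic idea for eigen-based blind identification using moment matrices and SRID matrices of the observed data, respectively. The matrix $\mathbf{U}$ diagonalizes $\mathbf{M}_z^P(\tau_1,\tau_2,\tau_3)$ or $\mathbf{S}_z^P(t,f)$, and the columns of the unknown $\mathbf{U}$ can be identified with the eigenvectors of $\mathbf{M}_z^P(\tau_1,\tau_2,\tau_3)$ or $\mathbf{S}_z^P(t,f)$ for any matrix $\mathbf{P}$. A similar diagonalization of fourth-order cumulant matrices is explored in [1].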
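The step from (16) and (17) to the eigen-structure (18) can be checked numerically on a toy case. The sketch below is only an illustration under simplifying assumptions: it uses a real $2 \times 2$ rotation for $\mathbf{U}$ (so conjugates drop out), and the values chosen for `nu` and `P`, as well as the helper names, are arbitrary and not taken from the paper.

```python
import math


def moment_matrix(U, nu, P):
    """[M]_ij = sum_{k,l} MOM_ijkl * P_lk, with MOM_ijkl taken from the
    noise-free model (17): sum_p nu_p u_ip u_jp u_kp u_lp (real case)."""
    n = len(U)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    mom = sum(nu[p] * U[i][p] * U[j][p] * U[k][p] * U[l][p]
                              for p in range(n))
                    M[i][j] += mom * P[l][k]
    return M


def eigen_form(U, nu, P):
    """U diag(nu_p * u_p^T P u_p) U^T, i.e. the right-hand side of (18)."""
    n = len(U)
    lam = []
    for p in range(n):
        col = [U[r][p] for r in range(n)]
        quad = sum(col[l] * P[l][k] * col[k] for l in range(n) for k in range(n))
        lam.append(nu[p] * quad)
    return [[sum(U[i][p] * lam[p] * U[j][p] for p in range(n))
             for j in range(n)] for i in range(n)]


theta = 0.7
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]      # real orthogonal stand-in for U
nu = [2.0, -1.5]                              # hypothetical source auto-moments
P = [[1.0, 0.3], [-0.4, 2.0]]                 # arbitrary matrix P

M = moment_matrix(U, nu, P)
E = eigen_form(U, nu, P)
max_gap = max(abs(M[i][j] - E[i][j]) for i in range(2) for j in range(2))
print(max_gap)  # expected to be at rounding-error level
```

The two constructions agree up to rounding for any choice of `P`, which is what allows the columns of $\mathbf{U}$ to be read off as eigenvectors.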

We form the sample fourth-order moments $Q_z(\mathcal{L}_q)$ of the whitened process $\mathbf{z}(t) = \mathbf{W}\mathbf{x}(t)$ for a fixed set of time lags $\mathcal{L}_q = \{\tau_q, \tau_{q+1}, \tau_{q+2} \mid 1 \le q \le K \text{ and } \tau_q \ne \tau_{q+1} \ne \tau_{q+2}\}$. To reduce the possibility of having degenerate eigenvalues, as well as to reduce the effect of the noise, we compute the $n$ most significant eigenpairs $\{\lambda'_r, \mathbf{P}'_r \mid 1 \le r \le n\}$ from the eigen-structure of $Q_z(\mathcal{L}_q)$ derived from (18), corresponding to each time lag combination $\{\tau_q, \tau_{q+1}, \tau_{q+2}\}$ in $\mathcal{L}_q$. Similarly, we form the sample SRIDs $B_z(t_q, f_q)$ of the whitened process for a fixed set of $(t_q, f_q)$ points, $1 \le q \le K$. The $n$ most significant eigenpairs $\{\lambda''_r, \mathbf{P}''_r \mid 1 \le r \le n\}$ are computed from the eigen-structure of $B_z(t_q, f_q)$ derived from (19), corresponding to each $(t,f)$ point of a fixed set of $(t_q, f_q)$ points which correspond to the signal auto-terms. The whitening matrix $\mathbf{W}$ can be obtained from SOS [1], [3], [4]. A unitary matrix $\hat{\mathbf{U}}$ is then obtained as the joint diagonalizer of the $K \times n$ matrices obtained from the set $\mathcal{M}' = \{\lambda'_r\mathbf{P}'_r \mid 1 \le r \le n\}$ corresponding to each $\{\tau_q, \tau_{q+1}, \tau_{q+2}\}$ in $\mathcal{L}_q$, or from the set $\mathcal{M}'' = \{\lambda''_r\mathbf{P}''_r \mid 1 \le r \le n\}$ corresponding to each $(t,f)$ point of a fixed set of $(t_q, f_q)$ points which correspond to the signal auto-terms. An efficient joint approximate diagonalization technique is given in [1], [3], [9]. The source signals are estimated as $\hat{\mathbf{s}}(t) = \hat{\mathbf{U}}^H\mathbf{W}\mathbf{x}(t)$, and/or the mixing matrix is estimated as $\hat{\mathbf{A}} = \mathbf{W}^{\#}\hat{\mathbf{U}}$, where the superscript $\#$ denotes the Moore-Penrose pseudoinverse.

4.  SIMULATION  RESULTS 

Computer simulations are conducted to illustrate the performance of the proposed approaches. In the simulation environment, a three-element uniform linear array with half-wavelength sensor spacing receives two signals in the presence of white Gaussian noise. The first signal is a chirp containing a quadratic phase coupling, whereas the second is a logarithmic chirp signal. The sources arrive from different directions, $\phi_1 = 10°$ and $\phi_2 = 50°$. When choosing the $(t_q, f_q)$ points of the sets of SRIDs $B_z(t_q, f_q)$, we take $K$ time-frequency auto-term points with high signal-to-noise ratio (SNR). The performance is evaluated using the mean rejection level $= \sum_{p \ne q} E|(\hat{\mathbf{A}}^{\#}\mathbf{A})_{pq}|^2$ [4], where $E|(\hat{\mathbf{A}}^{\#}\mathbf{A})_{pq}|^2$ measures the ratio of the power of the interference of the $q$th source to the power of the $p$th estimated source signal. Fig. 1 shows the mean rejection levels of the proposed approaches over the SNR range from -10 dB to 20 dB. In this case, 10 matrices are considered for joint diagonalization. In Fig. 2, the mean rejection levels are plotted as a function of the number of jointly diagonalized matrices for SNR = 0 dB. The mean rejection levels are evaluated over 100 Monte Carlo trials with 512 snapshots. The proposed approaches clearly succeed in estimating the unknown mixing matrix. The approach using SRID matrices shows better performance than the one using moment matrices, which again illustrates the benefit of $t$-$f$ representations for the BSS problem.

5.  CONCLUSION 

In this paper, we propose BSS approaches using fourth-order moments and their time-frequency representations. These are based on the joint diagonalization of the moment matrices and of the SRID matrices of the principal slices of the time-multifrequency domain of support of the WTs, which allows the whole moment and trispectrum set to be processed with a computational efficiency similar to that of eigen-based techniques. Moreover, computation of two-dimensional slices of trispectrums demands the same computational complexity and time-frequency resolution as bilinear time-frequency representations. The significant distortion introduced by cross-terms of the principal slice of the WT is reduced effectively by applying the exponential kernel of the CW distribution. Hence, the overwhelming complexity due to a product in the fourth-order ambiguity domain with the four-dimensional extension of the CW kernel can be avoided. Numerical experiments are presented to assess the effectiveness of the proposed approaches. The approach using SRID matrices shows better performance than the one using moment matrices. This is because spreading the noise power while localizing the source energy in the $t$-$f$ domain increases the robustness of the chosen SRID matrices with respect to noise.

Figure 1. Mean rejection level versus SNR (sample size = 512).
ABSTRACT

In this paper, we extend the results leading to the popular JADE and STOTD algorithms to cumulants of any order greater than or equal to three. We first exhibit a new contrast function which constitutes a unified framework for the underlying contrasts of JADE and STOTD, which thus appear as particular cases. Then we generalize the link between these new contrasts and a joint-diagonalization criterion. Moreover, for the generalized JADE contrast, the analytical optimal solution in the case of two sources is derived and shown to keep the same simple expression whatever the cumulant order. Finally, some computer simulations illustrate the potential advantage of considering statistics of different orders for the joint-diagonalization of cumulant matrices.

1.  INTRODUCTION 

The problem of source separation has found numerous solutions in the past decade. Beginning with the original work of Herault and Jutten, see [1] and references therein, who proposed an adaptive (on-line) algorithm, three of the most important contributions are provided by Comon [2], Cardoso and Souloumiac [3] and De Lathauwer [5]. These latter solutions are block (off-line) algorithms which are all closely related to contrast functions (also simply called contrasts). Such contrasts were introduced and defined in [2] and have recently found a generalization in [4]. The algorithm presented in [2] is called ICA for "Independent Component Analysis". The algorithm presented in [3] is called JADE for "Joint Approximate Diagonalization of Eigen-matrices" and the one presented in [5] is called STOTD for "Simultaneous Third Order Tensor Diagonalization".

The contrasts underlying JADE and STOTD only take fourth-order cumulants into consideration, contrary to the ICA contrast, which remains valid whatever the order of the cumulants, provided it is greater than or equal to three. On the other hand, the fourth-order JADE and STOTD contrasts have also found interesting interpretations in terms of joint-diagonalization criteria. For the JADE contrast (resp. the STOTD contrast) it is a joint-diagonalization criterion (maximized w.r.t. a unitary matrix) of some sets of cumulant matrices (resp. of some sets of cumulant third-order tensors). These links are the keys to the derivation of the practical JADE and STOTD algorithms.

In this paper, we are mainly interested in generalizing the underlying contrasts of JADE and STOTD, and their links with joint-diagonalization criteria, to cumulants of any order greater than or equal to three. The main interest is then to be able to choose the cumulant order or to combine statistical information of different orders.

2.  PROBLEM  FORMULATION 

The signals emitted by different sources are observed according to

$$\mathbf{x}(n) = \mathbf{G}\mathbf{a}(n) \qquad (1)$$

where $n \in \mathbb{Z}$ is the discrete time, $\mathbf{a}(n)$ the $(N,1)$ vector of $N \ge 2$ unobservable real input signals $a_i(n)$, $i \in \{1, \ldots, N\}$, called sources, $\mathbf{x}(n)$ the $(N,1)$ vector of observed signals $x_i(n)$, $i \in \{1, \ldots, N\}$, and $\mathbf{G}$ the $(N,N)$ square mixing matrix, assumed invertible. For clarity, in this article we restrict our attention to the case of real signals and mixtures, although the following derivations might easily be extended to complex ones.

Further, the following assumptions are considered:

A1. "Independence": The sources $a_i(n)$, $i \in \{1, \ldots, N\}$, are zero-mean, unit-power and statistically mutually independent;

A2. "Stationarity": $a_i(n)$, $i \in \{1, \ldots, N\}$, are random signals stationary up to the order under consideration, i.e. $\forall i \in \{1, \ldots, N\}$, the cumulant $\mathrm{Cum}[\underbrace{a_i(n), \ldots, a_i(n)}_{R\times}]$ is independent of $n$ and is denoted by $C_R[a_i]$; moreover, at most one of the cumulants $C_R[a_i]$, $i \in \{1, \ldots, N\}$, is null.

It is now important to introduce the notion of white vectors [2], [3] because of its use as a first transformation in the JADE and STOTD algorithms. A vector $\mathbf{z}(n)$ of random signals is said to be (spatially) white if its covariance matrix $\mathbf{R}_z = E[\mathbf{z}\mathbf{z}^T]$ equals the identity. The first (second-order) transformation is then defined as a whitening of the observation vector $\mathbf{x}(n)$. This is done by applying a whitening matrix $\mathbf{B}$ such that $\mathbf{B}\mathbf{G} = \mathbf{V}$, where $\mathbf{V}$ is a unitary matrix, i.e. $\mathbf{V}\mathbf{V}^T = \mathbf{I}$. Hence, after the whitening transformation, the new "observed" vector reads

$$\mathbf{x}_b(n) = \mathbf{B}\mathbf{x}(n) = \mathbf{V}\mathbf{a}(n). \qquad (2)$$


The blind source separation problem now consists in estimating a unitary matrix $\mathbf{H}$ in such a way that the vector

$$\mathbf{y}(n) = \mathbf{H}\mathbf{x}_b(n) \qquad (3)$$

restores one of the different (possibly noisy) sources on each of its components.

Because the sources are unobservable and the mixture is unknown, the exact power and order of each source cannot be recovered. This is why the separation is said to be achieved when the global unitary matrix $\mathbf{S}$ defined as

$$\mathbf{S} = \mathbf{H}\mathbf{V} \qquad (4)$$

reads

$$\mathbf{S} = \mathbf{D}\mathbf{P} \qquad (5)$$

where $\mathbf{D}$ is an invertible diagonal matrix (here with unit-modulus components) corresponding to arbitrary attenuations (here signs) of the restored sources and $\mathbf{P}$ is a permutation matrix corresponding to an arbitrary order of restitution. According to (3), (1) and (4), the output vector can be written as

$$\mathbf{y}(n) = \mathbf{S}\mathbf{a}(n). \qquad (6)$$

Because of the stationarity assumption, the explicit dependence of the source, observation and output vectors on the discrete time $n$ will now be omitted whenever no confusion is possible.

Let us define some notation which will be useful in the following. Let $\mathcal{A}$ be the set of random vectors satisfying assumptions A1 and A2. Let $\mathcal{U}$ be the set of unitary matrices. The subset of $\mathcal{U}$ of matrices $\mathbf{S}$ of the form (5) is denoted by $\mathcal{P}$ and the subset of $\mathcal{P}$ of diagonal matrices is denoted by $\mathcal{D}$. Finally, the set of random vectors $\mathbf{y}(n)$ built from (6), where $\mathbf{a}(n) \in \mathcal{A}$ and $\mathbf{S} \in \mathcal{U}$, is denoted by $\mathcal{Y}_{\mathcal{U}}$.

3.  A  NEW  CONTRAST

A contrast is usually a function of the output of the separating system. As defined in [2], [4], its (global) maximization arguments yield a separating solution, i.e. a matrix $\mathbf{H}$ such that the global matrix $\mathbf{S}$ can be factored as in (5). The definition in [4] is now recalled for the reader's convenience.

Definition 1  A contrast on $\mathcal{Y}_{\mathcal{U}}$ is a multivariate mapping $\mathcal{I}(\cdot)$ from $\mathcal{Y}_{\mathcal{U}}$ to the real set which satisfies the following three requirements:

R1. $\forall \mathbf{y} \in \mathcal{Y}_{\mathcal{U}}$, $\forall \mathbf{D} \in \mathcal{D}$, $\mathcal{I}(\mathbf{D}\mathbf{y}) = \mathcal{I}(\mathbf{y})$;

R2. $\forall \mathbf{a} \in \mathcal{A}$, $\forall \mathbf{S} \in \mathcal{U}$, $\mathcal{I}(\mathbf{S}\mathbf{a}) \le \mathcal{I}(\mathbf{a})$;

R3. $\forall \mathbf{a} \in \mathcal{A}$, $\forall \mathbf{S} \in \mathcal{U}$, $\mathcal{I}(\mathbf{S}\mathbf{a}) = \mathcal{I}(\mathbf{a}) \Rightarrow \mathbf{S} \in \mathcal{P}$.

Historically, one of the first contrasts can be found in [2]. Other examples of contrasts can be found in [4]. Of primary importance for our purpose, two contrasts involving both cross-cumulants and auto-cumulants have been proposed in [3] and [5]. The JADE contrast [3] reads

$$\mathcal{J}(\mathbf{y}) = \sum_{i_1,i_2,i_3=1}^{N} \left(\mathrm{Cum}[y_{i_1}, y_{i_1}, y_{i_2}, y_{i_3}]\right)^2 \qquad (7)$$

and the STOTD contrast [5] reads

$$\mathcal{S}(\mathbf{y}) = \sum_{i_1,i_2=1}^{N} \left(\mathrm{Cum}[y_{i_1}, y_{i_1}, y_{i_1}, y_{i_2}]\right)^2. \qquad (8)$$

We now generalize these functions to cumulants of any order greater than or equal to three, including them in a common generalized family of contrasts. This is given by the following proposition.

Proposition 1  Let $R$, $R_1$ and $R_2$ be three integers such that $R = R_1 + R_2$, $R \ge 3$ and $2 \le R_1 \le R$. Using the notation

$$C_R^{R_1}[\mathbf{y}] = \mathrm{Cum}[\underbrace{y_{i_1}, \ldots, y_{i_1}}_{R_1\ \text{terms}}, \underbrace{y_{i_2}, \ldots, y_{i_{R_2+1}}}_{R_2\ \text{terms}}] \qquad (9)$$

the function

$$\mathcal{G}_R^{R_1}(\mathbf{y}) = \sum_{i_1,i_2,\ldots,i_{R_2+1}=1}^{N} \left|C_R^{R_1}[\mathbf{y}]\right|^2 \qquad (10)$$

is a contrast on $\mathcal{Y}_{\mathcal{U}}$, i.e. for white vectors $\mathbf{y}$.

By definition, if $R_2 = 0$, no corresponding additional terms are considered in the cumulant in (9), which then corresponds to an auto-cumulant. Hence in expression (10), if $R_2 = 0$, no sum over $i_2$ is considered. Let us remark that if $R_2 = 0$, we get the ICA contrast. On the other hand, considering $R_1 = R_2 = 2$ we have the JADE contrast, while considering $R_1 = 3$ and $R_2 = 1$ we have the STOTD contrast. All other values of $R_1$ and $R_2$ yield a new contrast.
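To make the index pattern of (9) and (10) concrete, the following sketch evaluates a sample version of the generalized contrast for orders $R = 3$ and $R = 4$. It is an illustration only: the sample cumulant estimators assume zero-mean real signals, the function names are hypothetical, and the short test signals are arbitrary.

```python
from itertools import product


def joint_moment(*signals):
    """Sample mean of the sample-wise product of the given signals."""
    total = 0.0
    for values in zip(*signals):
        term = 1.0
        for v in values:
            term *= v
        total += term
    return total / len(signals[0])


def cumulant(*s):
    """Sample cumulant of three or four zero-mean real signals."""
    if len(s) == 3:
        return joint_moment(*s)
    if len(s) == 4:
        w, x, y, z = s
        return (joint_moment(w, x, y, z)
                - joint_moment(w, x) * joint_moment(y, z)
                - joint_moment(w, y) * joint_moment(x, z)
                - joint_moment(w, z) * joint_moment(x, y))
    raise ValueError("only orders 3 and 4 are handled in this sketch")


def contrast(y, R, R1):
    """Sample version of (10); y is a list of N zero-mean signals."""
    N, R2 = len(y), R - R1
    total = 0.0
    # idx = (i1, i2, ..., i_{R2+1}); i1 is repeated R1 times as in (9).
    for idx in product(range(N), repeat=R2 + 1):
        args = [y[idx[0]]] * R1 + [y[i] for i in idx[1:]]
        total += cumulant(*args) ** 2
    return total


y = [[1.0, -1.0, 2.0, -2.0, 0.0, 0.0],
     [1.0, 1.0, -2.0, 0.0, 1.0, -1.0]]   # two arbitrary zero-mean signals

ica = contrast(y, 4, 4)     # R2 = 0: auto-cumulants only
stotd = contrast(y, 4, 3)   # R1 = 3, R2 = 1
jade = contrast(y, 4, 2)    # R1 = R2 = 2
print(ica, stotd, jade)
```

Since each sum contains the terms of the previous one plus additional non-negative terms, the three values are non-decreasing in this order.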

4.  LINKS  WITH 
JOINT-DIAGONALIZATION 

We now show a link between the above contrast $\mathcal{G}_R^{R_1}(\mathbf{y})$ and a joint-diagonalization criterion of $R_1$-order symmetric tensors by a unitary matrix. Such a joint-diagonalization criterion is defined as follows.

Definition 2  Consider a set $\mathcal{T}$ of $T$ square and symmetric $R_1$-order tensors $\mathbf{T}(m)$, $m = 1, \ldots, T$. A joint-diagonalizer of this set is a unitary matrix that maximizes the function

$$\mathcal{D}(\mathbf{H},\mathcal{T}) = \sum_{m=1}^{T}\left(\sum_{i} \left|T'_{i,\ldots,i}(m)\right|^2\right) \qquad (11)$$

where

$$T'_{i,\ldots,i}(m) = \sum_{n_1,\ldots,n_{R_1}} H_{i,n_1} \cdots H_{i,n_{R_1}}\, T_{n_1,\ldots,n_{R_1}}(m). \qquad (12)$$

Now the equivalence between the contrast $\mathcal{G}_R^{R_1}(\mathbf{y})$ and a joint-diagonalization criterion of $R_1$-order tensors can be stated according to the following proposition:

Proposition 2  With $R \ge 3$ and $2 \le R_1 \le R$, let $\mathcal{C}_R^{R_1}$ be the set of $T = N^{R-R_1}$ tensors of order $R_1$,

$$\mathbf{T}(i_{R_1+1}, \ldots, i_R) = \left(T_{i_1,\ldots,i_{R_1}}(i_{R_1+1}, \ldots, i_R)\right),$$

defined as

$$T_{i_1,\ldots,i_{R_1}}(i_{R_1+1}, \ldots, i_R) = \mathrm{Cum}[x_{i_1}, \ldots, x_{i_R}]. \qquad (13)$$

Then, if $\mathbf{H}$ is a unitary matrix, we have

$$\mathcal{D}(\mathbf{H},\mathcal{C}_R^{R_1}) = \mathcal{G}_R^{R_1}(\mathbf{H}\mathbf{x}). \qquad (14)$$

For $R_1 = 2$ (resp. $R_1 = 3$), Proposition 2 is a generalization of one result in [3] (resp. in [5]) to cumulants of any order greater than or equal to three. For all other values of $R_1$, it corresponds to new results.
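The identity (14) can be checked numerically in the simplest case $R = 3$, $R_1 = 2$, where the set $\mathcal{C}_3^2$ consists of $N$ matrices of third-order cumulants. The sketch below is a hedged illustration: it uses sample third-order cumulants of zero-mean real data (which are exactly multilinear), an arbitrary rotation angle, and hypothetical helper names.

```python
import math


def third_cumulant(x, i, j, k):
    """Sample third-order cumulant of the zero-mean signals x[i], x[j], x[k]."""
    n = len(x[0])
    return sum(x[i][t] * x[j][t] * x[k][t] for t in range(n)) / n


def joint_diag_criterion(H, x):
    """D(H, C_3^2): sum over slices m of sum_i (H T(m) H^T)_{ii}^2, as in (11)-(13)."""
    N = len(x)
    total = 0.0
    for m in range(N):
        T = [[third_cumulant(x, a, b, m) for b in range(N)] for a in range(N)]
        for i in range(N):
            t_ii = sum(H[i][a] * H[i][b] * T[a][b]
                       for a in range(N) for b in range(N))
            total += t_ii ** 2
    return total


def contrast_3_2(y):
    """G_3^2(y): sum over (i1, i2) of Cum[y_i1, y_i1, y_i2]^2, as in (10)."""
    N = len(y)
    return sum(third_cumulant(y, i, i, j) ** 2
               for i in range(N) for j in range(N))


def mix(H, x):
    """Apply the matrix H to the signal vector x, sample by sample."""
    N, n = len(x), len(x[0])
    return [[sum(H[i][a] * x[a][t] for a in range(N)) for t in range(n)]
            for i in range(N)]


x = [[1.0, -1.0, 2.0, -2.0, 0.5, -0.5],
     [2.0, 1.0, -1.0, -1.0, -2.0, 1.0]]   # arbitrary zero-mean test data
theta = 0.4
H = [[math.cos(theta), math.sin(theta)],
     [-math.sin(theta), math.cos(theta)]]

lhs = joint_diag_criterion(H, x)
rhs = contrast_3_2(mix(H, x))
print(abs(lhs - rhs))  # expected to be at rounding-error level
```

The equality relies only on the multilinearity of cumulants and on $\mathbf{H}$ being orthogonal, so it holds here even though the toy data are not whitened.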

According to the above proposition, we can now choose the order of the cumulants (greater than or equal to three) for the joint-diagonalization of matrices or tensors. In particular, third-order cumulants can be used, leading to the joint-diagonalization of $N$ matrices or to the diagonalization of only one tensor of order three.

However, even if it is sufficient to joint-diagonalize tensors of cumulants of a given order, it can be of interest to combine cumulants of different orders. For example, this can lead to algorithms that are more robust w.r.t. the statistics of the sources. One can, for instance, combine third- and fourth-order cumulants: if the third-order (resp. fourth-order) cumulants of the unknown sources vanish, then the fourth-order (resp. third-order) ones can be directly used. In the unfortunate case where both the third- and fourth-order cumulants of the sources vanish, one has to consider cumulants of higher order. Moreover, such a combination can be useful for independent component analysis when one is not sure that the available data conform to the initial model. Indeed, in such a case cross-cumulants of all orders have to be canceled.

Now, given the order of the tensors to be joint-diagonalized, we show that cumulants of different orders can be considered altogether. This is given by the following proposition:

Proposition 3  Let $\gamma_1, \ldots, \gamma_m$ be $m \in \mathbb{N}^*$ real non-negative constants with at least one non-zero. Let $S_1, \ldots, S_m$ be $m$ integers such that $3 \le S_1 < \cdots < S_m$. Finally, with $R_1 \le S_1$, let

$$\mathcal{C}_{S_i}^{R_1} = \{\mathbf{T}(i_{R_1+1}, \ldots, i_{S_i})\}$$

be $m$ sets of $R_1$-order tensors $\mathbf{T}(\cdot)$ of $S_i$-order cumulants as defined in Proposition 2. Then, if $\mathbf{H}$ is a unitary matrix, we have

$$\mathcal{D}\Big(\mathbf{H}, \bigcup_{i=1}^{m} \sqrt{\gamma_i}\, \mathcal{C}_{S_i}^{R_1}\Big) = \sum_{i=1}^{m} \gamma_i\, \mathcal{G}_{S_i}^{R_1}(\mathbf{H}\mathbf{x}). \qquad (15)$$

Since it is well known that a (non-zero) non-negative linear combination of contrasts is also a contrast, the joint-diagonalization of tensors of cumulants of mixed orders is again a sufficient condition for separation.

5.  GENERALIZATION  OF  THE  JADE 
ALGORITHM 

The JADE algorithm is based on Jacobi optimization. This means that the maximization of the criterion under consideration is realized through a sequence of plane (or Givens) rotations, as initiated in [2]. Each plane rotation works on a pair of components of the output vector $\mathbf{y}(n)$, and one "sweep" or iteration consists in processing the outputs through all the $N(N-1)/2$ possible pairs. Hence the $N$-dimensional problem is reduced to $N(N-1)/2$ problems of dimension 2. One of the main advantages is that the 2-dimensional problem is simpler and often admits an analytical solution. Thus let us now consider only the 2-dimensional problem, where a plane rotation has to be determined. In the following, we parameterize it as

$$\mathbf{H} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. \qquad (16)$$

We only consider the joint-diagonalization of matrices, and hence the maximization of $\mathcal{G}_R^2(\mathbf{y})$. For $N = 2$, it is easily seen that $\mathcal{G}_R^2(\mathbf{y})$ can be written as

$$\mathcal{G}_R^2(\mathbf{y}) = \mathbf{u}_\theta^T \mathbf{A}_R \mathbf{u}_\theta \qquad (17)$$

where $\mathbf{u}_\theta^T = (\cos 2\theta \;\; \sin 2\theta)$ and where $\mathbf{A}_R = (A_{R,i,j})$ is a $(2,2)$ real symmetric matrix defined according to

$$A_{R,1,1} = t_{R,1}\,; \qquad A_{R,1,2} = t_{R,4}\,; \qquad A_{R,2,2} = \tfrac{1}{2}t_{R,1} + 2t_{R,2} + t_{R,3} \qquad (18)$$

with, using (13) with $R_1 = 2$,

$$t_{R,1} = \sum_{\mathbf{i}} \left(T_{1,1}(\mathbf{i})^2 + T_{2,2}(\mathbf{i})^2\right);$$

$$t_{R,2} = \sum_{\mathbf{i}} \left(T_{1,2}(\mathbf{i})\right)^2;$$

$$t_{R,3} = \sum_{\mathbf{i}} T_{1,1}(\mathbf{i})\, T_{2,2}(\mathbf{i});$$

$$t_{R,4} = \sum_{\mathbf{i}} T_{1,2}(\mathbf{i})\left(T_{1,1}(\mathbf{i}) - T_{2,2}(\mathbf{i})\right),$$

where $\mathbf{i} = (i_3, \ldots, i_R)$. An analytical optimal value of $\theta$ can be derived from (17). Indeed, one finds after some simple algebra that, for a symmetric matrix $\mathbf{Z} = (Z_{i,j})$,

$$\mathbf{u}_\theta^T \mathbf{Z}\, \mathbf{u}_\theta = D + E\cos(4(\theta - \alpha)) \qquad (19)$$

where $D = (Z_{1,1} + Z_{2,2})/2$ is a constant term, since it does not depend on $\theta$, and $E$ is a non-negative constant term: $E = \sqrt{\tfrac{1}{4}(Z_{1,1} - Z_{2,2})^2 + Z_{1,2}^2}$. The angle $\alpha$ can be determined as

$$\alpha = \frac{1}{4}\arctan\left(Z_{1,2},\ \tfrac{1}{2}(Z_{1,1} - Z_{2,2})\right) \qquad (20)$$

where the value of $\arctan(y,x)$ is, by definition, the unique angle $\beta \in (-\pi, \pi]$ for which $\cos\beta = x/(x^2+y^2)^{1/2}$ and $\sin\beta = y/(x^2+y^2)^{1/2}$. Since $D$ and $E$ are constant and $E$ is non-negative, a maximum value (w.r.t. $\theta$), denoted $\theta_{opt}$, of the left term in (19) corresponds to the maximum value of the cosine, that is 1, yielding

$$\theta_{opt} = \alpha. \qquad (21)$$

Thus one has to directly consider $\mathbf{Z} = \mathbf{A}_R$ for the derivation of a solution. Let us remark that, following the idea of Proposition 3, matrices of cumulants of different orders can be considered very easily. Indeed, let us consider matrices of cumulants of orders $S_1, \ldots, S_m$, $m \in \mathbb{N}^*$, such that $3 \le S_1 < \cdots < S_m$. Then, according to (17),

$$\sum_{i=1}^{m} \gamma_i\, \mathcal{G}_{S_i}^2(\mathbf{y}) = \mathbf{u}_\theta^T \Big(\sum_{i=1}^{m} \gamma_i \mathbf{A}_{S_i}\Big) \mathbf{u}_\theta$$

where the $\gamma_i$'s are real non-negative constants with at least one non-zero. Hence it is sufficient to consider the matrix $\mathbf{Z} = \sum_{i=1}^{m} \gamma_i \mathbf{A}_{S_i}$ for the determination of the value of $\theta_{opt}$ in (21), (20).

Such a generalized algorithm for joint-diagonalization of cumulant matrices is called eJADE, for "extended JADE", when considering the original implementation of JADE and directly adding the new matrices to be joint-diagonalized.

6.  COMPUTER  SIMULATIONS 

In order to illustrate the potential usefulness of the
above results, some computer simulations are now presented.
For the fourth-order case, we use the original
JADE algorithm, in its version 1.5 of December 1997,
for real signals. For the other cases, we use eJADE
under exactly the same conditions. We consider
both third-order cumulant matrices alone, for which the
algorithm is denoted eJADE(3), and third-
plus fourth-order cumulant matrices, for which the
algorithm is denoted eJADE(3,4). We first use a
signal with parameterized third- and fourth-order cumulants.
It is a discrete i.i.d. signal called MS($\alpha$) which
takes its values in the set $\{-1, 0, \alpha\}$ with the respective
probabilities $\{\frac{1}{1+\alpha}, \frac{\alpha-1}{\alpha}, \frac{1}{\alpha(1+\alpha)}\}$. The real
parameter $\alpha$, called the "cumulant parameter", is such that
$\alpha > 1$. Hence for an MS($\alpha$) signal $a(n)$, one easily has
$E[a] = 0$, $E[a^2] = 1$, $C_3[a] = \alpha - 1$ and $C_4[a] = \alpha^2 - \alpha - 2$.
The performance of the algorithms is measured by a
non-negative index of performance [6] which
is zero when the separation holds. We have plotted
both the mean and the standard deviation (STD) of
the estimated index over 500 Monte Carlo runs.
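The stated moments of the MS($\alpha$) signal can be checked exactly. The sketch below assumes the probabilities $1/(1+\alpha)$, $(\alpha-1)/\alpha$ and $1/(\alpha(1+\alpha))$ for the values $-1$, $0$ and $\alpha$; these are the only probabilities compatible with a zero mean and unit power on that three-point set. The function names are illustrative and not taken from the paper.

```python
from fractions import Fraction


def ms_distribution(alpha):
    """Values and probabilities of the three-point MS(alpha) signal."""
    a = Fraction(alpha)
    values = [Fraction(-1), Fraction(0), a]
    probs = [1 / (1 + a), (a - 1) / a, 1 / (a * (1 + a))]
    return values, probs


def moment(values, probs, order):
    """Exact moment E[x^order] of a discrete distribution."""
    return sum(p * v ** order for v, p in zip(values, probs))


def ms_statistics(alpha):
    """Return (mean, power, third cumulant, fourth cumulant) of MS(alpha)."""
    values, probs = ms_distribution(alpha)
    mean = moment(values, probs, 1)
    power = moment(values, probs, 2)
    # For a zero-mean variable: C3 = E[x^3] and C4 = E[x^4] - 3 E[x^2]^2.
    c3 = moment(values, probs, 3)
    c4 = moment(values, probs, 4) - 3 * power ** 2
    return mean, power, c3, c4


if __name__ == "__main__":
    for alpha in (1, 2, Fraction(5, 2)):
        print(alpha, ms_statistics(alpha))
```

Exact rational arithmetic is used so that the identities hold without rounding. Note that $C_4$ vanishes at $\alpha = 2$, where only the third-order cumulant carries information.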

Experiment 1: With $N = 3$, the first two sources
considered are MS($\alpha$) signals while the third one is a
Gaussian i.i.d. signal. The following mixing matrix is
used

$$G = \begin{pmatrix} 1 & 0.9 & 0.9 \\ 0.8 & 1 & 0.9 \\ 0.8 & 0.8 & 1 \end{pmatrix}$$


9opt  —  o  . 


(21) 
(22)


We plot the mean and STD of the estimated index as
a function of the cumulant parameter $\alpha$. The number
of data samples is held constant at $N_d = 400$. Fig. 1
shows that the performance of eJADE(3,4), in comparison
to JADE and eJADE(3), is less subject to variation
w.r.t. the statistics of the sources.

Experiment 2: In this case, we consider two speech
signals, which are plotted in Fig. 2. The components
of the (2,2) mixing matrix are chosen randomly with a
uniform law in the interval $[-1, 1]$. We plot the mean
and STD of the estimated index as a function of the
number of data samples, taken as the first $N_d$ samples of each
speech signal. Fig. 3 shows that the joint-diagonalization
of only third-order cumulant matrices
with eJADE(3) can be sufficient for the separation of
speech signals, with, however, lower performance. This
is not surprising because the third- and fourth-order
cumulants (estimated over the whole signals) of the
speech signals we use are respectively around 0.5 and
2.7. On the other hand, the performance of the algorithm
eJADE(3,4), which uses both third- and fourth-order
cumulant matrices, is slightly better.
Figure 1: Mean and STD of the estimated index w.r.t.
the cumulant parameter $\alpha$.


Figure 2: The two real speech signals used.


Figure 3: Mean and STD of the estimated index w.r.t.
the first $N_d$ samples of the speech signals.
ABSTRACT

The problem of multichannel blind signal deconvolution
is considered. The mixing system is supposed to be
stable and invertible, and the input signals, also called
sources, are assumed to be zero-mean, independent and
identically distributed (i.i.d.) random signals. Using the
hypothesis that the sources are statistically independent, we
propose a generalization to the convolutive case of some
separation criteria available in the instantaneous one.
Hence, we obtain a new generalized class of criteria for
signal deconvolution.

1.  INTRODUCTION 

The problem of multichannel blind signal deconvolution
(or blind equalization) of Linear Time Invariant
(LTI) systems arises in various fields of engineering
and applied sciences, among which are radio telescopy, data
communication, passive radar/sonar processing and
seismic exploration.

In the past ten years, most of the "blindly" operating
approaches have dealt with a "restrictive" model
commonly known as source separation. In
such a problem, the coupling channels are assumed to be
unknown yet constant gains. The goal is then to
recover the inputs from the outputs alone, without a priori
knowledge about either the mixing system or
the inputs, except for their independence. In this
communication, we consider the more general model in
which the coupling systems are unknown LTI systems.
Some solutions based on high-order statistics
have been proposed recently in the literature [8]-[14].
Here, we focus on those which are based on the
optimization of contrast functions, or contrasts (previously
developed in the context of source separation), involving
high-order statistics.

After recalling some facts about contrasts, we demonstrate that
it is possible to consider contrasts involving cross-cumulants
(of order strictly higher than two) in the convolutive
case. New contrasts of this type are then presented.


2.  PROBLEM  FORMULATION 

We consider the multichannel LTI and generally non-causal
system described by the following equation:

$$x(n) = \sum_k G(k)\, a(n - k) \qquad (1)$$

where $n \in \mathbb{Z}$ is the time index, $a(n)$ is the $(N, 1)$ vector
of statistically independent sources, $x(n)$ is the $(N, 1)$
vector of observations and

$$\{G\} \triangleq \{G(n),\ n \in \mathbb{Z}\}$$

is a sequence of $(N, N)$ matrices which describes the
impulse response of the LTI mixing filter. The $(N, N)$
transfer matrix of system $\{G\}$ is also introduced:

$$[G] \stackrel{\text{def}}{=} G(z) \stackrel{\text{def}}{=} \sum_k G(k)\, z^{-k} \qquad (2)$$

where $z^{-1}$ stands for the time-delay operator.

The multichannel blind deconvolution problem consists
in estimating an LTI filter (equalizer) $\{H(\cdot)\}$ from
the outputs (observations) $x(n)$ alone of an unknown
LTI system $\{G\}$, such that the vector

$$y(n) = \sum_k H(k)\, x(n - k) \qquad (3)$$

restores the $N$ input signals $a_i(n)$, $i = 1, \ldots, N$.

In this context, it is useful to define the global LTI filter
$\{S(\cdot)\}$ according to

$$y(n) = \sum_k S(k)\, a(n - k) \qquad (4)$$

whose transfer function is

$$S(z) = \sum_k S(k)\, z^{-k} .$$

The global system is illustrated in Fig. 1.
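Since $y(n)$ is obtained by filtering $x(n)$ with $\{H\}$, and $x(n)$ by filtering $a(n)$ with $\{G\}$, the global impulse response is the matrix convolution $S(k) = \sum_l H(l)\, G(k - l)$. The following minimal sketch computes it for finite impulse responses stored as dictionaries from lag to matrix; this storage layout and the small 2 x 2 example are assumptions made for illustration only.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def mat_add(a, b):
    """Entry-wise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]


def convolve_filters(h, g):
    """Global filter S(k) = sum_l H(l) G(k - l).

    h and g map an integer lag to an N x N matrix (list of rows).
    """
    s = {}
    for lag_h, h_mat in h.items():
        for lag_g, g_mat in g.items():
            term = mat_mul(h_mat, g_mat)
            lag = lag_h + lag_g
            s[lag] = mat_add(s[lag], term) if lag in s else term
    return s


# Hypothetical mixing filter G(z) = [[1, 0.5 z^-1], [0, 1]] and its exact inverse.
g = {0: [[1.0, 0.0], [0.0, 1.0]], 1: [[0.0, 0.5], [0.0, 0.0]]}
h = {0: [[1.0, 0.0], [0.0, 1.0]], 1: [[0.0, -0.5], [0.0, 0.0]]}
s = convolve_filters(h, g)

if __name__ == "__main__":
    for lag in sorted(s):
        print(lag, s[lag])
```

With this particular equalizer the global filter reduces to the identity at lag 0 and to zero matrices at the other lags, which is a special case of the separating form discussed below.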

To  solve  this  problem,  we  make  the  following  assump¬ 
tions: 


0-7803-5988-7/00/$  10.00  ©  2000  IEEE 
[Figure: block diagram with "Mixing" and "Deconvolution" blocks; the global system is $\{S\} = \{H * G\}$.]

Figure 1: The global system.

A1. Each input $a_i(n)$, $i = 1, \ldots, N$, is a zero-mean
independent and identically distributed (i.i.d.) discrete
random signal. Without any loss of generality, $a_i(n)$,
$i = 1, \ldots, N$, can be assumed to have unit power. Moreover, we
will assume that the cumulants of the random sources
exist and that, for a given order $R$, at most one of them
has a null cumulant. We recall that the $R$-th order
cumulant ($R \in \mathbb{N}^*$) is defined as

$$\mathrm{Cum}[\underbrace{a_i(n), \ldots, a_i(n)}_{R \times}] .$$

For an i.i.d. signal, it no longer depends on time,
so it is simply denoted $C_R[a_i]$.

A2. The unknown LTI system $\{G(\cdot)\}$ is assumed stable
and invertible.

As the sources are assumed unobservable, some inherent
indeterminacies remain when they are restored. In
general, the order, the power and the time origin of
each component of the source vector $a(n)$ cannot be
recovered. Indeed, the multichannel blind deconvolution
problem combines the inherent indeterminacies
of the source separation problem with those
of the classical blind scalar deconvolution
problem. That is why we consider that the
multichannel blind deconvolution problem is solved if
and only if (iff) the global LTI system $\{S(\cdot)\}$ reads:

$$S(z) \triangleq \sum_k S(k)\, z^{-k} = D(z)\, D_1 P \qquad (5)$$

where $D(z)$ is a diagonal matrix whose entries
are

$$d_{ii}(z) = z^{-n_i}, \quad i = 1, \ldots, N,$$

the $n_i$ are integers, $D_1$ is an invertible constant diagonal
matrix and $P$ is a permutation matrix.

Finally the global system is assumed to be

A3. "paraunitary", i.e. $S(z)$ satisfies
$$S(z)\, S^T(z^{-1}) = I$$


This last assumption can be made
without loss of generality since it is equivalent to guaranteeing
that the signal $y(n)$ is white, i.e.

$$E[y(n)\, y^T(n - k)] = I\, \delta[k]$$

where $E[\cdot]$ denotes the mathematical expectation and
$\delta[k] = 1$ if $k = 0$ and 0 otherwise. This constraint can
always be satisfied provided that a classical prewhitening
of the observations is performed.
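In the time domain, the paraunitary condition of A3 states that the coefficient of $z^{-\tau}$ in $S(z) S^T(z^{-1})$, namely $\sum_k S(k) S^T(k - \tau)$, equals $I$ for $\tau = 0$ and vanishes otherwise. The sketch below tests this numerically for a finite impulse response stored as a dictionary from lag to matrix. The storage layout, the tolerance and the two example filters are illustrative assumptions.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def mat_mul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lag_product(s, tau, n):
    """Coefficient of z^(-tau) in S(z) S^T(1/z): sum_k S(k) S^T(k - tau)."""
    total = [[0.0] * n for _ in range(n)]
    for k, s_k in s.items():
        if k - tau in s:
            term = mat_mul(s_k, transpose(s[k - tau]))
            total = [[t + u for t, u in zip(r1, r2)]
                     for r1, r2 in zip(total, term)]
    return total


def is_paraunitary(s, n, tol=1e-12):
    """Check sum_k S(k) S^T(k - tau) = I delta[tau] for every relevant lag."""
    span = max(s) - min(s)
    for tau in range(-span, span + 1):
        c = lag_product(s, tau, n)
        for i in range(n):
            for j in range(n):
                target = 1.0 if (tau == 0 and i == j) else 0.0
                if abs(c[i][j] - target) > tol:
                    return False
    return True


# A rotation followed by a one-sample delay on the second output: paraunitary.
theta = 0.3
rot = [[math.cos(theta), -math.sin(theta)],
       [math.sin(theta), math.cos(theta)]]
s_good = {0: [rot[0], [0.0, 0.0]], 1: [[0.0, 0.0], rot[1]]}

# A filter with residual cross-talk: not paraunitary.
s_bad = {0: [[1.0, 0.0], [0.0, 1.0]], 1: [[0.0, 0.5], [0.0, 0.0]]}

if __name__ == "__main__":
    print(is_paraunitary(s_good, 2), is_paraunitary(s_bad, 2))
```

The first example mixes the two sources through a rotation, so it is paraunitary without being of the separating form (5); whiteness of $y(n)$ alone therefore does not guarantee separation, which is why higher-order contrasts are needed.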

3.  CONTRAST  FUNCTIONS 

First introduced for instantaneous mixing [2], contrast
functions have recently been generalized to the convolutive
case [3, 8, 10]. Before recalling the definition of
contrasts, let us first introduce some useful notation:
$\mathcal{A}$ stands for the set of random vectors that satisfy
hypothesis A1. $\mathcal{S}$ stands for the set of systems $S(z)$
satisfying hypotheses A2 and A3. The subset of $\mathcal{S}$ of
systems conforming to (5) is denoted by $\mathcal{D}$. The set of
random vectors $y(n)$ built from (4) with $a(n) \in \mathcal{A}$ and
$S(z) \in \mathcal{S}$ is denoted by $\mathcal{Y}_{\mathcal{A}}$.

Let us now recall the more general definition of contrasts
one has to consider in the convolutive case [10]:

Definition 1 A contrast on $\mathcal{Y}_{\mathcal{A}}$ is a multivariable function
$\mathcal{I}(\cdot)$ mapping $\mathcal{Y}_{\mathcal{A}}$ to $\mathbb{R}$ and satisfying the three
following requirements:

P1. $\forall y \in \mathcal{Y}_{\mathcal{A}}$, $\forall D(z) \in \mathcal{D}$, $\mathcal{I}([D]y) = \mathcal{I}(y)$;

P2. $\forall a \in \mathcal{A}$, $\forall S(z) \in \mathcal{S}$, $\mathcal{I}([S]a) \le \mathcal{I}(a)$;

P3. $\forall a \in \mathcal{A}$, $\forall S(z) \in \mathcal{S}$, $\mathcal{I}([S]a) = \mathcal{I}(a) \Rightarrow S(z) \in \mathcal{D}$.

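One of the first contrasts available in the convolutive
case was exhibited by P. Comon [3]. It reads:

$$\mathcal{I}_R(y) = \sum_{i=1}^{N} \left(C_R[y_i]\right)^2 \qquad (6)$$

where $R$ is greater than or equal to three. It is a direct
extension of a contrast previously proposed in the
instantaneous case [2].

It is also interesting to keep in mind the two following
properties of contrasts [10]:

Property 1 Let $\mathcal{I}_1(\cdot)$ be a function from $\mathcal{Y}_{\mathcal{A}}$ to $\mathbb{R}$ and
$\mathcal{I}_2(\cdot)$ a contrast on $\mathcal{Y}_{\mathcal{A}}$. If

$$\forall y \in \mathcal{Y}_{\mathcal{A}},\ \mathcal{I}_1(y) \le \mathcal{I}_2(y); \qquad \forall a \in \mathcal{A},\ \mathcal{I}_1(a) = \mathcal{I}_2(a); \qquad (7)$$

then $\mathcal{I}_1(\cdot)$ is a contrast on $\mathcal{Y}_{\mathcal{A}}$.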


where  I  denotes  the  ( N ,  N )  identity  matrix.  The  second  one  is: 
Property 2 If $\mathcal{I}_1(\cdot)$ is a contrast on $\mathcal{Y}_{\mathcal{A}}$ then $\forall \alpha \in \mathbb{R}_+^*$
and $\forall \beta \in \mathbb{R}$, $\mathcal{I}_2(\cdot) = \alpha\, \mathcal{I}_1(\cdot) + \beta$ is also a contrast on
$\mathcal{Y}_{\mathcal{A}}$.

This last property allows us to define the notion of
equivalent contrasts:

Definition 2 If $\mathcal{I}_1(\cdot)$ and $\mathcal{I}_2(\cdot)$ are two contrasts related as in
Property 2, then they are said to be equivalent.

4.  NEW  AND  GENERALIZED  RESULTS 
4.1.  Contrasts  with  cross-cumulants 

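Our main goal is to generalize some contrasts available
in the instantaneous case to the convolutive one. To
this end, let us introduce the following notation

$$C_R^y[\mathbf{i}, \boldsymbol{\ell}] = \mathrm{Cum}[y_{i_1}(n), y_{i_1}(n), y_{i_2}(n - \ell_2), \ldots, y_{i_{R_1}}(n - \ell_{R_1})] \qquad (8)$$

where $R$ stands for the cumulant order, $R_1 = R - 1$
and

$$\mathbf{i} = (i_1, \ldots, i_{R_1}), \qquad \boldsymbol{\ell} = (\ell_2, \ldots, \ell_{R_1}) .$$

We have the following first result:

Proposition 1 Let $R$ be an integer such that $R \ge 3$.
The function

$$\mathcal{J}_{1,R}(y) = \sum_{\mathbf{i}, \boldsymbol{\ell}} \left(C_R^y[\mathbf{i}, \boldsymbol{\ell}]\right)^2 \qquad (9)$$

is a contrast for white vectors $y(n)$.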

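Proof. To demonstrate this result, let us begin with
some important second-order results. Because of assumption
A3, the inverse of the global system is $S^T(1/z)$
and then

$$a(n) = \sum_k S^T(-k)\, y(n - k),$$

that is, component-wise,

$$a_j(n) = \sum_{\ell, i} S_{ij}(-\ell)\, y_i(n - \ell) .$$

Consequently, denoting

$$R_{a_{j_1}, a_{j_2}}(k_1, k_2) = E[a_{j_1}(n + k_1)\, a_{j_2}(n + k_2)]$$

and using the fact that $y(n)$ is a white vector, we have

$$R_{a_{j_1}, a_{j_2}}(k_1, k_2) = \sum_{i_1, i_2, \ell_1, \ell_2} S_{i_1 j_1}(-\ell_1)\, S_{i_2 j_2}(-\ell_2)\, E[y_{i_1}(n + k_1 - \ell_1)\, y_{i_2}(n + k_2 - \ell_2)] = \sum_{i_1, \ell_1} S_{i_1 j_1}(k_2 - \ell_1)\, S_{i_1 j_2}(k_1 - \ell_1) .$$

On the other hand, we also have

$$R_{a_{j_1}, a_{j_2}}(k_1, k_2) = \delta[j_1 - j_2]\, \delta[k_1 - k_2] .$$

Then

$$\sum_{i_1, \ell_1} S_{i_1 j_1}(k_2 - \ell_1)\, S_{i_1 j_2}(k_1 - \ell_1) = \delta[j_1 - j_2]\, \delta[k_1 - k_2] \qquad (10)$$

and thus

$$\sum_{k_1, i_1} \left(S_{i_1 j_1}(k_1)\right)^2 = 1 , \qquad (11)$$

implying that

$$\sum_{k_1, i_1} \left(S_{i_1 j_1}(k_1)\right)^4 \le \sum_{k_1, i_1} \left(S_{i_1 j_1}(k_1)\right)^2 = 1 .$$

Using the multilinearity property of cumulants and using
(10), we have

$$\mathcal{J}_{1,R}(y) = \sum_{j_1} \Big( \sum_{i_1, k_1} \left(S_{i_1 j_1}(k_1)\right)^4 \Big) \left(C_R[a_{j_1}]\right)^2$$

and then

$$\mathcal{J}_{1,R}(y) \le \sum_{j_1} \left(C_R[a_{j_1}]\right)^2 = \mathcal{I}_R(a) = \mathcal{J}_{1,R}(a) ,$$

where the last equality holds because the cross-cumulants of the
independent i.i.d. sources vanish.

Moreover it is easy to see that we have equality if
and only if $S(z)$ satisfies (5), yielding that $\mathcal{J}_{1,R}(y)$ is a
contrast. $\square$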

This result can be seen as a first generalization to the
convolutive case of the underlying contrast of the JADE
algorithm given in [1]. But $R - 2$ delays have to be
taken into account in the contrast, which can make its
optimization rather cumbersome. Let us thus simplify
$\mathcal{J}_{1,R}(y)$. To this end, we now consider the following
parameterized vector of delays

$$\boldsymbol{\ell}_{\alpha} = (\ell_{\alpha(2)}, \ldots, \ell_{\alpha(R_1)})$$

where $\alpha(\cdot)$ is any application from the set $\{2, \ldots, R_1\}$
to the set $\{0, 2, \ldots, R_1\}$. Moreover, by definition, if
$\alpha(i) = 0$ for a given $i \in \{2, \ldots, R_1\}$ then no corresponding
delay is considered. Using the notation (8)
with $\boldsymbol{\ell}_{\alpha}$, we have the following result:
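Proposition 2 Let $R$ be an integer such that $R \ge 3$.
The function

$$\mathcal{J}_R^{\alpha}(y) = \sum_{\mathbf{i}, \boldsymbol{\ell}_{\alpha}} \left(C_R^y[\mathbf{i}, \boldsymbol{\ell}_{\alpha}]\right)^2 \qquad (12)$$

is a contrast for white vectors $y(n)$.


Proof. We have the two following relations

$$\mathcal{J}_R^{\alpha}(y) \le \mathcal{J}_{1,R}(y), \qquad \mathcal{J}_R^{\alpha}(a) = \mathcal{J}_{1,R}(a) . \qquad (13)$$

Using the result in Proposition 1, Property 1 allows
us to conclude that $\mathcal{J}_R^{\alpha}(y)$ is a contrast. $\square$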


In order to illustrate this proposition, we now present
some examples. First, if $\alpha(\cdot)$ is the identity application,
i.e. $\alpha(\cdot) = \mathrm{Id}(\cdot)$, then

$$\mathcal{J}_R^{\mathrm{Id}}(y) = \mathcal{J}_{1,R}(y) .$$

Hence Proposition 2 is a generalization of Proposition 1.
Now if $\alpha(\cdot)$ is the null application, i.e. $\alpha(\cdot) = 0(\cdot)$
where $\forall k$, $0(k) = 0$, then no delay is taken into consideration
and the contrast simply reads

$$\mathcal{J}_R^{0}(y) = \sum_{\mathbf{i}} \left(\mathrm{Cum}[y_{i_1}, y_{i_1}, y_{i_2}, \ldots, y_{i_{R_1}}]\right)^2 .$$

This result is now the direct generalization of the underlying
contrast of the JADE algorithm to the convolutive
case. This is the only case where no delays have
to be considered. Thus this latter contrast contains the
minimum number of cumulants. For fourth-order cumulants,
if $\alpha(1) = 0$ and $\alpha(2) = 2$ and denoting $\alpha_1(\cdot)$
this application, we have the following contrast

$$\mathcal{J}_R^{\alpha_1}(y) = \sum_{\mathbf{i}, \ell_2} \left(\mathrm{Cum}[y_{i_1}(n), y_{i_1}(n), y_{i_2}(n), y_{i_3}(n - \ell_2)]\right)^2 .$$

On the other hand, if $\alpha(1) = 1$ and $\alpha(2) = 1$ and
denoting $\alpha_2(\cdot)$ this application, we have the following
contrast

$$\mathcal{J}_R^{\alpha_2}(y) = \sum_{\mathbf{i}, \ell_1} \left(\mathrm{Cum}[y_{i_1}(n), y_{i_1}(n), y_{i_2}(n - \ell_1), y_{i_3}(n - \ell_1)]\right)^2 .$$

These are just a few examples of the family of contrasts one
can find using Proposition 2.

4.2. Contrasts with parameterized cross-cumulants

Now, to simplify, let us focus on the case of fourth-order
cumulants with no delays in the contrasts. We
then define the following three functions

$$C_1(y) = \sum_{\substack{i_1, i_2 = 1 \\ i_2 \ne i_1}}^{N} \left(\mathrm{Cum}[y_{i_1}, y_{i_1}, y_{i_1}, y_{i_2}]\right)^2 ;$$

$$C_2(y) = \sum_{\substack{i_1, i_2 = 1 \\ i_2 > i_1}}^{N} \left(\mathrm{Cum}[y_{i_1}, y_{i_1}, y_{i_2}, y_{i_2}]\right)^2 ;$$

$$C_3(y) = \sum_{\substack{i_1, i_2, i_3 = 1 \\ i_3 > i_2,\ i_2 \ne i_1,\ i_3 \ne i_1}}^{N} \left(\mathrm{Cum}[y_{i_1}, y_{i_1}, y_{i_2}, y_{i_3}]\right)^2 .$$

These functions contain only cross-cumulants of different
types. Then we have the following result:

Proposition 3 Consider three real numbers $\alpha_i$, $i = 1, \ldots, 3$,
such that $\forall i$, $\alpha_i \le 1$. Then, denoting $\alpha = (\alpha_1, \alpha_2, \alpha_3)$,
the function

$$\mathcal{J}_{4,\alpha}(y) = \mathcal{I}_4(y) + 2\left(\alpha_1 C_1(y) + \alpha_2 C_2(y) + \alpha_3 C_3(y)\right) \qquad (14)$$

is a contrast for white vectors $y(n)$.

Proof. Because $\forall i$, $\alpha_i \le 1$, we have the two following
relations

$$\mathcal{J}_{4,\alpha}(y) \le \mathcal{J}_4^{0}(y), \qquad \mathcal{J}_{4,\alpha}(a) = \mathcal{J}_4^{0}(a) . \qquad (15)$$

Recalling that $\mathcal{J}_4^{0}(y)$ is a contrast, Property 1 allows
us to conclude that $\mathcal{J}_{4,\alpha}(y)$ is a contrast. $\square$

Let us first notice that if $\alpha = \alpha_0$ where $\alpha_0 = (0, 0, 0)$
then

$$\mathcal{J}_{4,\alpha_0}(y) = \mathcal{I}_4(y)$$

which is the contrast in (6). On the other hand, if
$\alpha = \alpha_1$ where $\alpha_1 = (1, 1, 1)$ then

$$\mathcal{J}_{4,\alpha_1}(y) = \mathcal{J}_4^{0}(y) .$$

All other values of $\alpha_i$, $i = 1, \ldots, 3$, yield a new contrast.

For simplicity, we have restricted our attention to
the case of fourth-order cumulants with no delays in
the cross-cumulants. But using the results in Proposition 2,
Proposition 3 can be easily generalized to any
order of cumulants, with or without delays.

More generally, if a given contrast involves cross-cumulants,
then following the same principle as the one
applied to find the above result, one can parameterize
its cross-cumulants in the same way.
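The delay-free fourth-order contrast of Section 4.1, $\sum_{i_1,i_2,i_3} (\mathrm{Cum}[y_{i_1}, y_{i_1}, y_{i_2}, y_{i_3}])^2$, can be estimated from samples with the usual moment formula for the fourth-order cumulant of zero-mean variables. The sketch below is an illustrative estimator, not the authors' implementation; the function names and the tiny two-source example (two binary $\pm 1$ sources enumerated over their four equiprobable states, then rotated by 45 degrees) are assumptions.

```python
import math
from itertools import product


def cum4(a, b, c, d):
    """Sample fourth-order cross-cumulant of four zero-mean sequences."""
    n = len(a)

    def mean(*seqs):
        return sum(math.prod(vals) for vals in zip(*seqs)) / n

    return (mean(a, b, c, d)
            - mean(a, b) * mean(c, d)
            - mean(a, c) * mean(b, d)
            - mean(a, d) * mean(b, c))


def contrast_no_delay(y):
    """Sum over (i1, i2, i3) of Cum[y_i1, y_i1, y_i2, y_i3]^2, without delays."""
    n_out = len(y)
    return sum(cum4(y[i1], y[i1], y[i2], y[i3]) ** 2
               for i1, i2, i3 in product(range(n_out), repeat=3))


def contrast_auto(y):
    """Contrast (6) for R = 4: sum of squared auto-cumulants."""
    return sum(cum4(yi, yi, yi, yi) ** 2 for yi in y)


# Two independent binary sources, enumerated over their four joint states.
sources = [[1.0, 1.0, -1.0, -1.0], [1.0, -1.0, 1.0, -1.0]]

# Outputs after an orthogonal (hence whiteness-preserving) 45-degree rotation.
c = math.sqrt(0.5)
mixed = [[c * a + c * b for a, b in zip(*sources)],
         [-c * a + c * b for a, b in zip(*sources)]]

if __name__ == "__main__":
    print(contrast_no_delay(sources), contrast_no_delay(mixed))
    print(contrast_auto(sources), contrast_auto(mixed))
```

In this example the contrast is largest for the unmixed sources and strictly smaller after the rotation, in line with requirement P2. The cross-cumulant terms retain part of what the auto-cumulant contrast loses under mixing, which is why the two functions differ on the rotated signals.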

5.  CONCLUSION 

In this paper, we have generalized some contrasts available
in the case of instantaneous mixtures to the convolutive
one. We have shown that cross-cumulants can
be used in this case and that they can be parameterized,
which leads to a more general family of contrasts.
In all cases, delays can be taken into consideration in
order to use temporal statistical information.
ABSTRACT

In this paper, we are interested in the separation of N
independent sources recorded simultaneously by N receivers.
The mixture is realized instantaneously through
an unknown constant matrix M.

When the spectral densities of the sources are different,
several methods using second order moments have
been proposed, with convincing results. Nevertheless,
these methods are no longer effective when the
spectral densities are the same. This paper addresses
this special case, where the sources may even be white.
The method we propose is based on the evaluation of
second order moments estimated from extracted sub-series
of the observations. We will speak of conditional second
order moments.

An iterative algorithm is proposed which calculates, at
each step, a matrix $K_i$ so that $K_n K_{n-1} \ldots K_1 M$ tends,
when $n$ increases, to $D\Pi$, the product of a diagonal matrix
and a permutation matrix.

We show that restrictive conditions on the probability
distributions of the sources must be satisfied to ensure
the separation.

In the two-dimensional case, we prove that the algorithm
separates uniformly distributed sources, and that
it does not separate gaussian sources.

The proposed algorithm is robust with respect to the number
of sources; simulations with more than 20 uniformly
distributed sources were successful.

1. INTRODUCTION

We present a new iterative method to separate N independent
sources instantaneously mixed through an
unknown constant matrix M. It is important to note
that the spectral densities of the sources may be the
same.

First we introduce the mixture model. Second we explain
the method used to retrieve the sources and the
restrictive conditions necessary to reach the separation.
Then we restrict our discussion to the two-dimensional mixing
case, where calculations are easier. In this particular
case, conditions of separation are given. The case
of uniformly distributed sources is treated in more
detail. The gaussian case is proved to be impossible
to separate by such a method.

Finally, results obtained on simulated data are shown.

2. BASIC ASSUMPTIONS AND MODEL

Let us consider N sources assumed to be centered, stationary
and statistically independent. These sources
can be represented by a matrix $X = [\mathbf{x}_1\ \mathbf{x}_2\ \ldots\ \mathbf{x}_N]^t$
of dimension $N \times L$. They are simultaneously recorded
by a set of N receivers: the available observations are
represented by a matrix $Y = [\mathbf{y}_1\ \mathbf{y}_2\ \ldots\ \mathbf{y}_N]^t$ of dimension
$N \times L$.

The observations $Y$ are linked to the sources $X$ by the
linear relationship:

$$Y = MX, \qquad (1)$$

where $M$ is an unknown constant matrix.

Moreover, for reasons explained later, we will restrict
our study to identically distributed sources whose
probability distributions are symmetrical.

Note that vectors are written in bold font.

3. THE IDEA OF THIS ALGORITHM

The now classical methods using second order statistics
(AMUSE [1] [2], SOBI [3] or IMISO [4]) to separate
instantaneous mixtures of independent sources have proved
efficient when the power spectral densities
(PSD) of the sources are different. For example, the
IMISO algorithm exploits the covariance matrix and
its second derivative: it has been proved that if the
PSD are not identical, the conditions for success are met.
However, if these conditions are not verified, these
methods fail.

In the case of sources with the same PSD, it is no longer
possible to calculate a matrix $C$ such that $CY = D\Pi X$
(we retrieve $X$ up to a diagonal matrix $D$ and a
permutation matrix $\Pi$), and an iterative formulation
seemed attractive.

The idea is still to find a linear transformation that
makes the observations independent, but using conditional
and recursive methods.

4.  PRESENTATION  OF  THE  RECURSIVE 
ALGORITHM 

Historically,  this  method  was  first  used  to  separate  two 
white  uniformly  distributed  sources.  But  the  algorithm 
had  to  be  modified  to  take  the  specific  aspects  of  larger 
problems  into  account.  So,  this  section  is  broken  down 
in  two  parts  to  reproduce  this  approach. 

4.1.  General  topic 

The results described in this subsection are general.
From the observations $Y$, let us extract a sub-sequence based on the
first observation $y_1(t)$: we select the indices $t = 1$ to
$L$ for which $y_1(t)$ is positive (i.e. we create a vector
$S$ whose elements are these indices; the length of $S$ is
$L_S$). Then we create the extracted observations $Y_S$ ($Y_S$ is
a matrix of dimensions $N \times L_S$). That means that for
$i = 1$ to $N$ and $t = 1$ to $L_S$, the $i$th component of $Y_S$ is
the vector $y_{S,i}(t) = y_i(S(t))$ of dimension $L_S$.

With the constraint chosen here, $E\{Y_S\} = \bar{Y}_S \ne 0$ and
we will note $\tilde{Y}_S = Y_S - \bar{Y}_S$.

From these extracted observations we calculate the covariance
matrices of $Y_S$ and $\tilde{Y}_S$, noted:

$$N_1 = E\{Y_S Y_S^t\}$$

and

$$N_2 = E\{\tilde{Y}_S \tilde{Y}_S^t\}$$

which are $N \times N$ matrices.

We will use the same notation for $X_S$, the extracted
sequence of $X$ corresponding to $S$: $E\{X_S\} = \bar{X}_S = \mathbf{m}$
and $\tilde{X}_S = X_S - \mathbf{m}$.


From equation (1) we can deduce:

$$N_1 = M\, E\{X_S X_S^t\}\, M^t,$$

$$N_2 = M\, E\{\tilde{X}_S \tilde{X}_S^t\}\, M^t = M \left[E\{X_S X_S^t\} - \mathbf{m}\mathbf{m}^t\right] M^t.$$

Let us note $E\{X_S X_S^t\} = D_S$. In the general case, $D_S$
is not diagonal; however, if the laws of the $X_i$ are symmetrical,
$D_S$ is diagonal. To simplify, we will restrict our study
to the case of sources $X_i$ with the same symmetrical
density distribution; this assumption ensures
that $D_S$ becomes a scalar matrix ($D_S = \sigma_S I$ where $I$ is
the identity matrix).

Now let us define:

$$\Gamma = N_1^{-1} N_2 = M^{-t} \left[I - D_S^{-1} \mathbf{m}\mathbf{m}^t\right] M^t,$$

and

$$G = I - D_S^{-1} \mathbf{m}\mathbf{m}^t.$$

Note that neither $X_S$ nor $\mathbf{m}$, $D_S$ and hence $G$ are
directly accessible.

Let us denote by $P$ and $Q$ the eigenvectors and eigenvalues
of $\Gamma = M^{-t} G M^t$, i.e. such that $\Gamma P = PQ$.

Then $G M^t P = M^t P Q$.

If we note $V = M^t P$, $V$ and $Q$ are the eigenvectors
and eigenvalues of $G$ (i.e. $GV = VQ$).

$P$ and $Q$ depend on $Y$ and can therefore be calculated, while
$V$ cannot because it depends on $M$.

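In the particular case of sources with the same symmetrical
density distribution,

$$\Gamma = M^{-t} \left[I - \sigma_S^{-1} \mathbf{m}\mathbf{m}^t\right] M^t. \qquad (2)$$

We know that the eigenvectors of $\mathbf{m}\mathbf{m}^t$ are $\mathbf{m}$ and the $\mathbf{m}_j^{\perp}$
for $j = 1$ to $N-1$, with respectively the eigenvalues $\mathbf{m}^t\mathbf{m}$
(multiplicity 1) and 0 (multiplicity $N-1$), where the $\mathbf{m}_j^{\perp}$
are the $N-1$ vectors orthogonal to $\mathbf{m}$.

Thus, the eigenvectors of $G$ are $\mathbf{m}$ (with the eigenvalue
$1 - \sigma_S^{-1}\mathbf{m}^t\mathbf{m}$) and the $\mathbf{m}_j^{\perp}$ (with the eigenvalue 1).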
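The eigenstructure of $G = I - \sigma_S^{-1}\mathbf{m}\mathbf{m}^t$ stated above is easy to verify numerically. The following sketch uses an arbitrary vector $\mathbf{m}$ and scalar $\sigma_S$ chosen only for illustration.

```python
def g_matrix(m, sigma_s):
    """Build G = I - (1 / sigma_s) * m m^t for a vector m given as a list."""
    n = len(m)
    return [[(1.0 if i == j else 0.0) - m[i] * m[j] / sigma_s
             for j in range(n)] for i in range(n)]


def mat_vec(a, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


# Hypothetical conditional mean vector and scalar second moment.
m = [0.6, 0.3, 0.2]
sigma_s = 1.0
g = g_matrix(m, sigma_s)

# Eigenvalue associated with the eigenvector m.
lam = 1.0 - sum(v * v for v in m) / sigma_s

# Two vectors orthogonal to m: they should be left unchanged by G.
orthogonal = [[1.0, -2.0, 0.0], [0.0, 2.0, -3.0]]

if __name__ == "__main__":
    print(mat_vec(g, m), [lam * v for v in m])
    for v in orthogonal:
        print(mat_vec(g, v), v)
```

Only the direction $\mathbf{m}$ has an eigenvalue different from 1, so a single step of the method identifies one privileged direction; this is the reason why the case $N > 2$ requires an adapted procedure.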
Let us write equation (1) as:

$$Y^{(1)} = M^{(1)} X, \qquad (3)$$

(with $Y^{(1)} = Y$ and $M^{(1)} = M$), which denotes the first iteration
of our algorithm.

Applying $P^t$ to the initial observations $Y^{(1)}$, we obtain
a combination of the $\mathbf{y}_i$, noted $Y^{(2)} = P^t Y^{(1)}$ and called
the "new observations". Then,

$$Y^{(2)} = P^t Y^{(1)} = P^t M^{(1)} X = M^{(2)} X$$

appears as a new mixture of the sources, $M^{(2)}$ being the
mixing matrix:

$$M^{(2)} = P^t M^{(1)} = V^t. \qquad (4)$$

The algorithm will be efficient if $M^{(2)}$ is closer than
$M^{(1)}$ to a diagonal matrix (up to a permutation
matrix). So the properties of $M^{(2)}$ must be studied to
prove the advantages of the method.

Here, the first step of our algorithm is finished. If the
properties of $M^{(2)}$ are satisfactory, this step can be iterated
on the new observations $Y^{(2)}$.

Summarized Basic Algorithm

1) Find the indices for which the first observation
is positive and put them in a vector $S$.

2) Create the extracted observations $Y_S$.

3) Compute the centered extracted observations $\tilde{Y}_S$.

4) Estimate the observation covariance matrices
$N_0 = E\{\tilde{Y}_S \tilde{Y}_S^t\}$ and $N_1 = E\{Y_S Y_S^t\}$.

5) Compute $\Gamma = N_1^{-1} N_0$ and its eigenvectors $P$.

6) Compute the transformed observations $P^t Y$.

4.2.  Estimation  of  the  performance  of  the  algo¬ 
rithm 

To quantify the performance reached, we use the now
well-accepted performance criterion, a positive real value
which indicates how far a matrix $M$ is from the
product $D\Pi$ of a diagonal matrix and a permutation matrix.
This value, noted ind($M$), is zero if $M = D\Pi$ and is at
worst equal to $N$, the dimension of $M$.

The iterations are stopped when ind($P$) is
smaller than a value defined in advance.
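The text does not give the formula of ind($M$). As a stand-in, the sketch below implements one common index of this family, built from squared entries normalized by the largest entry of each row and column and scaled so that its worst value is $N$. This particular definition is an assumption; it is only meant to show how such a criterion behaves.

```python
def separation_index(m):
    """Assumed index: 0 for a scaled permutation matrix, at most N otherwise.

    Each row and each column contributes
    (sum of squared entries) / (largest squared entry) - 1.
    The matrix is assumed to have no all-zero row or column.
    """
    n = len(m)
    if n < 2:
        raise ValueError("the index is defined here for N >= 2")
    squares = [[v * v for v in row] for row in m]
    total = 0.0
    for row in squares:
        total += sum(row) / max(row) - 1.0
    for col in zip(*squares):
        total += sum(col) / max(col) - 1.0
    return total / (2.0 * (n - 1))


if __name__ == "__main__":
    print(separation_index([[0.0, 2.0], [-3.0, 0.0]]))
    print(separation_index([[2.0, 1.0], [-1.0, 2.0]]))
```

For a 2 x 2 matrix of the form $[[\alpha, \beta], [-\beta, \alpha]]$, this assumed index only depends on the ratio $\min(|\alpha|, |\beta|) / \max(|\alpha|, |\beta|)$, which is the quantity compared in the two-dimensional analysis.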


4.3. Limitation of the method

The proof of the efficiency of the algorithm must be
given for each particular kind of probability distribution.
The theoretical calculation of $V$ (and therefore of
$m_1$ and $m_2$) depends on the probability distribution of
the sources, and can hardly be done in a general way.
For all these reasons we will focus our attention on two
cases of particular interest: the gaussian
and uniform ones.

4.4. 2D separation problem: N = 2

In this particular case, we note $\mathbf{m} = [m_1\ m_2]^t$ and
$\mathbf{m}^{\perp} = [-m_2\ m_1]^t$.

We will note $M$ as follows:


and without loss of generality we will suppose that $a \ne b$,
and even $|a| > |b|$ (of course, $M$ is invertible).

The normalized matrix $V = M^t P$ verifies

$$V = \frac{1}{\sqrt{m_1^2 + m_2^2}} \begin{bmatrix} m_1 & -m_2 \\ m_2 & m_1 \end{bmatrix} \qquad (5)$$

except for a permutation matrix; the eigenvalue matrix
is given by

$$Q = \begin{bmatrix} 1 - \dfrac{m_1^2 + m_2^2}{\sigma_S} & 0 \\ 0 & 1 \end{bmatrix} \qquad (6)$$

4.4.1. Whitened observations

If the observations are whitened, $Y$ becomes $Y_w$, which verifies
$Y_w = M_w X$ where $M_w$ can be written

$$M_w = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}$$

This case is interesting because the performance index
of $M_w$ is easily calculated: ind($M_w$) =

We can deduce $P = k M^{-t} V$ from (5), where $k$ is a
constant used to normalize $P$:

$$P = \frac{k}{(\alpha^2 + \beta^2)\sqrt{m_1^2 + m_2^2}} \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} \begin{bmatrix} m_1 & -m_2 \\ m_2 & m_1 \end{bmatrix}$$

The normalization of $P$ imposes the value of $k$ and
implies $P = \sqrt{\alpha^2 + \beta^2}\, M^{-t} V$.

Then, $M^{(2)}$ becomes

$$M^{(2)} = \sqrt{\frac{\alpha^2 + \beta^2}{m_1^2 + m_2^2}} \begin{bmatrix} m_1 & m_2 \\ -m_2 & m_1 \end{bmatrix} \qquad (7)$$

The algorithm is efficient if ind($M^{(2)}$) is smaller than
ind($M^{(1)}$), that means

$$\frac{\min(|m_1|, |m_2|)}{\max(|m_1|, |m_2|)} < \frac{\min(|\alpha|, |\beta|)}{\max(|\alpha|, |\beta|)} . \qquad (8)$$

There still remains the problem of calculating $m_1$
and $m_2$. As mentioned previously, the result depends
only on:

→ the probability distributions of the sources,

→ the parameters $\alpha$ and $\beta$.

4. 4-2.  Not  whitened  observations 

In  that  case,  calculation  are  almost  the  same.  The 
difference  lies  in  the  fact  that  V  is  found  except  for  the 
power;  that  implies  that  V  is  multiplied  to  the  right 
by  a  diagonal  matrix  A  =  diag{Ai}\i!j-i,...N.  Then, 
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Mi2>  =  V1  is  in  fact  equal  to  hfV1',  each  line  of  A'P2'1  of  P  and  application  of  P*  to  Y.  This  methods  exper- 

is  multiplied  by  the  constant  A,:  imentally  converges  to  two  different  sources  out  of  A. 

A  circular  permutation  on  the  indices  i  and  j  allows 
to  converge  to  other  sources;  using  the  whole  combina¬ 
tions  of  couples  i,j,  we  retrieve  all  the  sources. 


M(2)  = 


Aimi 

A2m2 


-Ax  m2 
A2mi 


Nevertheless,  as  sources  can  be  retrieved  only  except 
for  their  power,  the  performance  index  of  M is  the 
same  than  in  the  previous  subsection. 


5.  CALCULATION  OF  M  IN  DIFFERENT 
CASES 


4.5.  A  >  2  problem 

If  A  >  2,  this  algorithm  needs  to  be  adapted.  As 
seen  before,  there  is  indeed  only  one  eigenvector  whose 
eigenvalue  of  multiplicity  one.  It  appears  experimen¬ 
tally  that  the  algorithm  applied  strictly  as  described 
above  leads  to  one  of  the  sources.  In  the  particular 
case  A  —  2  however,  when  one  source  is  found,  the 
second  one  is  automatically  found  too. 

But,  using  another  observation  to  find  S  will  lead  to 
another  source,  and  so  on  ...  Practically,  the  main  dif¬ 
ficulty  lies  in  the  fact  of  knowing  what  observation  has 
converged  to  a  source. 

Several  ideas  were  exploited  : 

•  Whitening  the  observations  Y,  we  know  that  at 
each  step,  the  transformation  matrix  P  must  be  orthog¬ 
onal.  The  algorithm  is  modified  as  follows  :  the  steps  of 
the  previous  algorithm  are  performed  successively  for 
each  of  the  A- 1  first  observations;  each  time  we  retain 
the  eigenvector  p,  of  P  which  corresponds  to  the  non 
multiple  eigenvalue  of  T.  We  create  a  matrix  Pt  from 
these  A- 1  orthogonal  vectors,  the  Nth  is  created  from 
the  A  —  1  first  ones.  is  orthogonal.  Experimentally, 
this  modified  algorithm  converges  to  the  A  sources. 

As  mentioned  above,  there  remains  the  difficulty  of  the 
recognition  of  the  sources  among  the  signals  obtained. 

•  Creating  Si  and  Sj  corresponding  to  the  observa¬ 
tions  yi  and  yj,  we  calculate  the  following  matrices 

Ax  =  E  {YSiYl} 

and 

n2  =  E{r5i%}. 

Then,  as  above, 


5.1.  Two  dimensional  case 

As  shown  by  equation  (8),  the  values  of  mi  and  m2 
must  be  calculated  to  assure  the  improvement  com¬ 
pleted  by  our  algorithm.  Unfortunately,  this  calcula¬ 
tion  have  to  be  done  for  each  density  distribution. 

5.1.1.  Uniformly  distributed  sources 

If $X_1$ and $X_2$ are standardized and uniformly distributed
($\|x_i\| = 1$), i.e. $p_1(x_1) = p_2(x_2) \sim U[-w; +w]$, the
calculations can be carried out completely. Supposing that the
observations have been whitened and that $|\alpha| > |\beta|$,
the expressions of $m_1$ and $m_2$ are the following:

“■  =  id  -  5<?>2) 


It  follows  from  these  expressions  that  |^|  >  |^|  and 
therefore,  ind(M^)  <  ind(MU)). 

If $|\alpha| < |\beta|$, the expressions of $m_1$ and $m_2$ are interchanged.

At the first step, separation is rarely reached (except
for 2-state signals), but convergence towards the correct
solution has begun.

It  can  be  proved  that  the  iterations  lead  to  a  ratio 
\^\  which  is  zero  or  infinite.  Then  the  separation  is 
reached.  This  result  can  be  extended  without  any  prob¬ 
lem  to  sources  taking  a  limited  number  of  states  (with 
symmetrical  probability  distributions),  if  all  these  states 
have  the  same  probability  to  occur. 

Practically,  during  tests  done  with  simulated  data,  the 
convergence  was  always  obtained. 

5.1.2.  Gaussian  Distributed  Sources 
The results (2) to (8) of the previous section remain valid.


Ax  =  M[E{Xs<As,.}  -  mjmj4]  Mf 

and 

A2  =  M[E{As.X^.}-mjmjt]M4 

where  mk  =  E{.Xsfc }  for  k  =  i,j. 

T  is  calculated  as  previously  by  T  =  Af1A2,  and  the 
continuation  of  the  procedure  is  the  same:  calculation 


As $X_1$ and $X_2$ are Gaussian, i.e. $p_1(x_1) = p_2(x_2) \sim
N(0; \sigma^2)$, in the case of whitened observations, the
expressions of $m_1$ and $m_2$ are the following:


^  _  1  f  Q:2  ~ 

7721  -  v'2-  V 

m2  =  ^fe\/ 


(10) 
Obviously  \r(ff\  =  |||.  No improvement occurs during
the first step of the algorithm. $M^{(2)}$ is not more
discriminating than $M^{(1)}$ towards the sources.

It  is  not  possible  to  separate  the  sources. 

6.  SIMULATION  RESULTS 
6.1.  Two  white  sources 

A loop of 500 tests is performed, where M is drawn as a new
random matrix each time. To evaluate the performance of
the algorithm, we calculate, once convergence is reached
(detected from the value of $|ind(P)|$), the performance
criterion of M. Simulations are run on the following
kinds of sources:

•  2  states  uniformly  distributed  sources  (2S), 

•  4  states  uniformly  distributed  sources  (4S), 

•  8  states  uniformly  distributed  sources  (8S), 

•  16  states  uniformly  distributed  sources  (16S), 

•  uniformly  distributed  sources  (unif). 

The results are shown in the table below; the columns
give respectively:

→ the kind of probability distribution of the sources,

→ the mean and the deviation of the final performance
criterion of M, denoted $i(M) = ind(M^{(\infty)})$,

→ the mean of the 2 mean square errors between the
estimated sources and the true sources,

→ the mean of the number of iterations to converge.


type   mean(i(M))      σ(i(M))         mean mse        Ite
2S     5.13 × 10^-6    7.53 × 10^-?    3.92 × 10^-?    1.0
4S     6.49 × 10^-?    7.59 × 10^-5    4.66 × 10^-?    1.8
8S     9.25 × 10^-?    1.09 × 10^-4    4.84 × 10^-5    6.3
16S    3.56 × 10^-4    4.50 × 10^-4    8.71 × 10^-?    6.7
unif   1.59 × 10^?     4.24 × 10^-4    2.06 × 10^-4    9.0

6.2.  Robustness  of  the  algorithm 

To evaluate how robust the algorithm is with respect to the
number of sources, we use it to separate a growing number
of uniformly distributed signals. The iterations are
stopped when the mean of the mean square errors between
the estimated sources and the true sources is less
than $10^{-4}$ or when the performance index of P is less
than $10^{-10}$ (which means that an iteration no longer
modifies the mixing matrix significantly). The table below
shows the number of iterations needed to converge
(100 tests were run and the mse obtained are all less than
$10^{-4}$):


number of sources    I
2                    4.8
3                    5.4
4                    12.8
5                    14.4
6                    15.7
8                    17.9
10                   20.6
15                   30.1
20                   42.3

7.  CONCLUSION 

To  our  knowledge,  the  approach  presented  in  this  work 
is  innovative.  The  results  obtained  on  simulated  data 
show  that  this  algorithm  is  robust  towards  the  number 
of  sources  mixed:  simulations  with  more  than  20  uni¬ 
formly  distributed  sources  were  successful. 

The main advantage of this approach is that it works
with sources having the same spectral densities, a case
in which classical methods become ineffective.

Modified versions of the algorithm presented in this paper
have been tested successfully but still need to be fully
justified.
ABSTRACT

Tugnait,  and  Chi  and  Chen  proposed  multi-input  multi¬ 
output  inverse  filter  criteria  (MIMO-IFC)  using  higher-order 
statistics  for  blind  deconvolution  of  multi-input  multi-output 
(MIMO)  linear  time-invariant  (LTI)  systems.  This  paper 
proposes a performance analysis for the MIMO linear equalizer
associated with MIMO-IFC for finite SNR, including
(P1) the perfect phase equalization property, (P2) a relation to
the MIMO minimum mean square error (MIMO-MMSE) equalizer,
and (P3) a connection with the equalizer obtained by Yeung
and Yau's MIMO super-exponential algorithm (MIMO-SEA),
which usually converges fast but has no guarantee of
convergence for finite data. Furthermore, based on (P3), a
MIMO-IFC based algorithm with performance similar to
that of the MIMO-SEA and with guaranteed convergence
is proposed. Finally, some simulation results are presented
to  support  the  analytic  results  and  the  proposed  algorithm. 

1.  INTRODUCTION 

Blind deconvolution of a multi-input multi-output (MIMO)
linear time-invariant system, denoted $\mathbf{H}[n]$ ($P \times K$ matrix),
is the problem of estimating the vector input $\mathbf{u}[n] = (u_1[n],
..., u_K[n])^T$ ($K$ inputs) with only a set of non-Gaussian vector
output measurements $\mathbf{x}[n] = (x_1[n], ..., x_P[n])^T$ ($P$ outputs)
as follows [1-3]

$$\mathbf{x}[n] = \sum_{k=-\infty}^{\infty} \mathbf{H}[k]\,\mathbf{u}[n-k] + \mathbf{w}[n] \qquad (1)$$

where $\mathbf{w}[n]$ ($P \times 1$ vector) is additive noise. Blind
deconvolution of MIMO systems in multiuser detection of wireless
communications includes suppression of multiple access
interference (MAI) and removal of multiple transmission
paths, both of which are crucial to the receiver design of multiuser
communications systems.

Let $\mathbf{v}[n] = (v_1[n], ..., v_P[n])^T$ denote a linear FIR equalizer
of length $L = L_2 - L_1 + 1$ for which $\mathbf{v}[n] \neq 0$ for $n =
L_1, L_1 + 1, ..., L_2$. Let $\mathrm{cum}\{y_1, y_2, ..., y_p\}$ denote the $p$th-order
cumulant of random variables $y_1, y_2, ..., y_p$ and $\mathcal{F}\{\cdot\}$
denote the discrete-time Fourier transform operator. For ease
of later use, let us define the following notations

$\mathrm{cum}\{y : p, ...\} = \mathrm{cum}\{y_1 = y, ..., y_p = y, ...\}$
$C_{p,q}\{y\} = \mathrm{cum}\{y : p, y^* : q\}$
$\mathbf{v}_j = (v_j[L_1], ..., v_j[L_2])^T$
$\boldsymbol{\nu} = (\mathbf{v}_1^T, \mathbf{v}_2^T, ..., \mathbf{v}_P^T)^T$
$\mathbf{x}_j[n] = (x_j[n - L_1], ..., x_j[n - L_2])^T$
$\mathbf{R}_{i,j} = E[\mathbf{x}_i^*[n]\,\mathbf{x}_j^T[n]]$ ($L \times L$ matrix)
$\mathbf{R} = \{\mathbf{R}_{i,j}\}$ ($P \times P$ block matrix)


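where $y^*$ denotes the complex conjugate of $y$. Then the
output $e[n]$ of the FIR equalizer $\mathbf{v}[n]$ can be expressed as

$$e[n] = \sum_{j=1}^{P} v_j[n] * x_j[n] = \sum_{j=1}^{P} \mathbf{v}_j^T \mathbf{x}_j[n] \qquad (2)$$

$$= \sum_{j=1}^{K} s_j[n] * u_j[n] + w[n] \quad \text{by (1)} \qquad (3)$$

where the scalar $w[n]$ is the noise term due to the vector noise $\mathbf{w}[n]$ and

$$s_j[n] = \sum_{m=1}^{P} \sum_{l=L_1}^{L_2} v_m[l]\, h_{m,j}[n-l] \qquad (4)$$

where $h_{m,j}[n]$ is the $(m,j)$th component of $\mathbf{H}[n]$. The designed
linear equalizer is usually evaluated by the amount
of intersymbol interference (ISI) defined as [3, 4]

$$\mathrm{ISI}(e[n]) = \frac{\sum_{j,n} |s_j[n]|^2 - \max_{j,n}\{|s_j[n]|^2\}}{\max_{j,n}\{|s_j[n]|^2\}} \qquad (5)$$

Note that $\mathrm{ISI}(e[n]) = 0$ when $s_\ell[n] = \alpha\,\delta[n - \tau]$ and $s_j[n] = 0$
for $j \neq \ell$.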
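
To make the ISI measure concrete, the following sketch computes the overall responses of eq. (4) and an ISI value for short FIR sequences. It assumes ISI is the total energy of all $s_j[n]$ minus the peak energy, divided by the peak energy, and it ignores the time origin $L_1$ (a pure delay does not change the ISI). The function names and the nested-list layout are illustrative choices.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            out[i + k] += ai * bk
    return out

def overall_responses(v, h):
    """s_j[n] = sum over m of (v_m * h_{m,j})[n].

    v: list of P equalizer impulse responses (each a list of taps).
    h: h[m][j] is the impulse response from input j to output m.
    """
    P, K = len(h), len(h[0])
    s = []
    for j in range(K):
        length = max(len(v[m]) + len(h[m][j]) - 1 for m in range(P))
        acc = [0j] * length
        for m in range(P):
            for n, c in enumerate(convolve(v[m], h[m][j])):
                acc[n] += c
        s.append(acc)
    return s

def isi(s):
    """(total energy - peak energy) / peak energy over all s_j[n]."""
    energies = [abs(c) ** 2 for sj in s for c in sj]
    peak = max(energies)
    return (sum(energies) - peak) / peak
```

A single nonzero tap in a single $s_j[n]$ gives an ISI of exactly zero, whatever its delay or scale.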


Single-input  single-output  inverse  filter  criteria  (SISO-IFC) 
[4-6]  using  higher-order  cumulants  have  been  widely  used 
for  blind  deconvolution  and  their  performance  analyses  for 
finite  SNR  have  been  reported  by  Feng  and  Chi  [5,6].  In  this 
paper,  we  propose  performance  analyses  for  cumulant  based 
multi-input  multi-output  inverse  filter  criteria  (MIMO-IFC) 
[1,2].  Furthermore,  based  on  the  analytic  results,  a  MIMO- 
IFC  based  algorithm  with  performance  similar  to  that  of  Ye¬ 
ung  and  Yau’s  MIMO  super-exponential  algorithm  (MIMO- 
SEA)  [3]  and  with  guaranteed  convergence  is  proposed. 
2. REVIEW OF MIMO-IFC AND MIMO-SEA


Assume that we are given a set of measurements $\mathbf{x}[n]$, $n = 0,
1, ..., N - 1$, modeled by (1) with the following assumptions:

(A1) $u_j[n]$ is zero-mean, independent identically distributed
(i.i.d.) non-Gaussian with variance $\sigma_{u_j}^2$ and $(p+q)$th-order
cumulant $C_{p,q}\{u_j[n]\}$, and statistically
independent of $u_k[n]$ for all $k \neq j$.

(A2)  The  MIMO  system  H[n]  is  exponentially  stable. 

(A3)  The  noise  w[n]  is  zero-mean  Gaussian  and  statisti¬ 
cally  independent  of  u[n]. 

Chi and Chen [2] find the optimum $\boldsymbol{\nu}$ by maximizing the
following MIMO-IFC

$$J_{p,q}(\boldsymbol{\nu}) = \frac{|\mathrm{cum}\{e[n] : p,\; e^*[n] : q\}|}{[\mathrm{cum}\{e[n],\, e^*[n]\}]^{(p+q)/2}} \qquad (6)$$

where $p$ and $q$ are nonnegative integers and $p + q \geq 3$,
using iterative optimization algorithms because all
MIMO-IFC $J_{p,q}$ are highly nonlinear functions of $\boldsymbol{\nu}$. Note
that the MIMO-IFC given by (6) include Tugnait's MIMO-IFC
[1] for $(p, q) = (2, 1)$ and $(p, q) = (2, 2)$ as special cases.
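
As a small illustration of the criterion for $(p, q) = (2, 2)$, the sketch below estimates $J_{2,2}$ from a finite record of equalizer output samples. It assumes a zero-mean sequence and uses the standard sample form of the fourth-order cumulant $\mathrm{cum}\{e, e, e^*, e^*\} = E|e|^4 - 2(E|e|^2)^2 - |E e^2|^2$; the function names are illustrative.

```python
def cumulant_22(e):
    """Sample estimate of cum{e, e, e*, e*} for a zero-mean sequence."""
    n = len(e)
    m2 = sum(abs(x) ** 2 for x in e) / n   # E|e|^2
    m4 = sum(abs(x) ** 4 for x in e) / n   # E|e|^4
    p2 = sum(x * x for x in e) / n         # E e^2
    return m4 - 2.0 * m2 ** 2 - abs(p2) ** 2

def ifc_22(e):
    """J_{2,2}: |C_{2,2}{e}| divided by (C_{1,1}{e})^2."""
    n = len(e)
    m2 = sum(abs(x) ** 2 for x in e) / n
    return abs(cumulant_22(e)) / m2 ** 2
```

The criterion is invariant to the scale of $e[n]$, so only the shape of the output distribution matters.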


The MIMO-SEA proposed by Yeung and Yau [3] iteratively
updates $\boldsymbol{\nu}$ at the $I$th iteration by solving the following linear
equations


‘  I  ir'd"-" 


.a"-' 


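(7)

where $\mathbf{d}^{[I-1]} = (\mathbf{d}_1^T, \mathbf{d}_2^T, ..., \mathbf{d}_P^T)^T$ in which

$$\mathbf{d}_i = \mathrm{cum}\{e^{[I-1]}[n] : r,\; (e^{[I-1]}[n])^* : s - 1,\; \mathbf{x}_i^*[n]\} \qquad (8)$$

in which $r + s \geq 3$ and $e^{[I-1]}[n]$ is the equalizer output
obtained at the $(I-1)$th iteration.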


A known fact and two observations regarding MIMO-IFC
and MIMO-SEA are as follows:

(F1) In the absence of noise (i.e., SNR = $\infty$), the optimum
$e[n] = \alpha_\ell u_\ell[n - \tau_\ell]$ (perfect equalization) (i.e.,
$\mathrm{ISI}(e[n]) = 0$) for both MIMO-IFC and MIMO-SEA
as $L_1 \to -\infty$ and $L_2 \to \infty$, where $\ell \in \{1, 2, ..., K\}$
is unknown. For finite SNR and $L$, $\hat{u}_\ell[n] = e[n]$ is
an estimate of $u_\ell[n]$ up to a scale factor and a time
delay, and the associated channel can also be estimated as


$$\hat{h}_{i,\ell}[k] = \frac{E\{x_i[n+k]\,\hat{u}_\ell^*[n]\}}{E[|\hat{u}_\ell[n]|^2]} \qquad (9)$$


Estimates $\hat{u}_1[n], \hat{u}_2[n], ..., \hat{u}_K[n]$ can be obtained by the
MIMO-IFC or MIMO-SEA in a non-sequential order through
a multistage successive cancellation (MSC) procedure [1]
that includes the following two steps at each stage:

(S1) Find an input estimate, say $\hat{u}_\ell[n]$ (where $\ell$ is unknown),
and the associated channel estimates $\hat{h}_{i,\ell}[n]$,
$i = 1, 2, ..., P$, using MIMO-IFC or MIMO-SEA.

(S2) Update $x_i[n]$ by $x_i[n] - \hat{u}_\ell[n] * \hat{h}_{i,\ell}[n]$, $i = 1, 2, ..., P$.
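
One stage of the cancellation step can be sketched as follows. The sketch assumes that each channel is estimated by a normalized cross-correlation between the output and the input estimate, restricted to non-negative lags, and that expectations are replaced by sample averages. The function names `estimate_channel` and `cancel` are illustrative and do not come from [1].

```python
import numpy as np

def estimate_channel(x_i, u_hat, lags):
    """Cross-correlation channel estimate for one output, per lag k:
    h_hat[k] = mean(x_i[n + k] * conj(u_hat[n])) / mean(|u_hat[n]|^2)."""
    x_i = np.asarray(x_i, dtype=complex)
    u_hat = np.asarray(u_hat, dtype=complex)
    n = len(u_hat)
    power = np.mean(np.abs(u_hat) ** 2)
    h_hat = []
    for k in lags:  # non-negative lags only in this sketch
        prod = x_i[k:n] * np.conj(u_hat[: n - k])
        h_hat.append(np.mean(prod) / power)
    return np.array(h_hat)

def cancel(x, u_hat, lags):
    """Step (S2): subtract the contribution of u_hat from every output."""
    x = np.asarray(x, dtype=complex)
    u_hat = np.asarray(u_hat, dtype=complex)
    n = x.shape[1]
    residual = np.empty_like(x)
    for i in range(x.shape[0]):
        h_hat = estimate_channel(x[i], u_hat, lags)
        contribution = np.zeros(n, dtype=complex)
        for k, hk in zip(lags, h_hat):
            contribution[k:] += hk * u_hat[: n - k]
        residual[i] = x[i] - contribution
    return residual
```

After cancellation, the residual outputs are passed to the next stage, where another input is extracted.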

3.  PERFORMANCE  ANALYSIS  FOR 
MIMO-IFC 

Prior to presenting analytical results for the performance
of the FIR equalizer $\mathbf{v}[n]$ associated with MIMO-IFC, let
us present the nonblind MIMO minimum mean square error
(MIMO-MMSE) equalizer, denoted $V_{MMSE}(\omega)$ ($K \times P$
matrix), that has some relation to $\mathbf{v}[n]$. It can be shown by
the orthogonality principle [8] that

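$$V_{MMSE}^T(\omega) = [\mathcal{R}^T(\omega)]^{-1} \cdot \mathcal{H}^*(\omega) \cdot \Sigma \qquad (10)$$

where $\mathcal{R}(\omega) = \mathcal{F}\{\mathbf{R}[k]\} = \mathcal{F}\{E[\mathbf{x}[n]\,\mathbf{x}^H[n-k]]\}$, $\mathcal{H}(\omega) =
\mathcal{F}\{\mathbf{H}[n]\}$ and

$$\Sigma = \mathrm{diag}\{\sigma_{u_1}^2, ..., \sigma_{u_K}^2\}. \qquad (11)$$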

Some  analytical  results  regarding  the  optimum  v[n]  for  fi¬ 
nite  SNR  are  summarized  as  follows: 

Property  1.  The  optimum  overall  impulse  response  sj  [n] 
given  by  (4),  j  =  1,  ...,  K,  are  linear  phase  for  finite  L, 
i.e.,  their  phase  responses  are  given  by 

argtSj  M]  =  wt,'  +  ,  Vw  €  [-7T,  ir)  (12) 

where  Sj  (u>)  =  J^{sj  [n]},  r,  and  are  real  constants.  □ 

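Property 2. The optimum $V(\omega) = \mathcal{F}\{\mathbf{v}[n]\}$ for $L_1 \to -\infty$
and $L_2 \to \infty$ is related to $V_{MMSE}(\omega)$ by

$$V(\omega) = V_{MMSE}^T(\omega) \cdot \left(\alpha_{p,q} S_{p,q} D_{p,q}(\omega) + \alpha_{q,p} S_{q,p} D_{q,p}(\omega)\right) \qquad (13)$$

where

$$\alpha_{p,q} = \frac{p}{p+q} \cdot \frac{C_{1,1}\{e[n]\}}{C_{q,p}\{e[n]\}}, \qquad (14)$$

$$S_{p,q} = \mathrm{diag}\{C_{q,p}\{u_1[n]\}/\sigma_{u_1}^2, ..., C_{q,p}\{u_K[n]\}/\sigma_{u_K}^2\} \qquad (15)$$

and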


(O1) The computationally efficient MIMO-SEA converges
at a super-exponential rate for SNR = $\infty$ and
sufficiently large $N$, but it may diverge for finite SNR
and $N$.

(O2) With a larger computational load than solving the
linear equations given by (7), gradient-type iterative
MIMO-IFC algorithms (such as the Fletcher-Powell
algorithm [7]) always take more iterations (lower
convergence speed) than MIMO-SEA.


Dp,q(w)  =  [D1(u>),...,DK(w))T  (16) 

in  which 

Dj(W)  =  ^{sj[n](s*[n]r1}.  (17) 

□ 

Property 3. The optimum $\mathbf{v}[n]$ and the one obtained by
the MIMO-SEA are the same for $p = q = r = s \geq 2$ and
finite $L$. □
Furthermore, based on Property 3 and the observations
(O1) and (O2), a fast iterative algorithm is proposed for
finding the optimum $\mathbf{v}[n]$ associated with MIMO-IFC for
$p = q$ as follows:

Algorithm 1. Given $\boldsymbol{\nu}_{I-1}$ and $e^{[I-1]}[n]$ obtained at the
$(I-1)$th iteration, $\boldsymbol{\nu}_I$ at the $I$th iteration is obtained by
the following two steps.

(T1) As in the MIMO-SEA, obtain $\boldsymbol{\nu}_I$ by solving (7) with
$r = s = p = q$ and obtain the associated $e^{[I]}[n]$.

(T2) If $J_{p,p}(\boldsymbol{\nu}_I) > J_{p,p}(\boldsymbol{\nu}_{I-1})$, go to the next iteration;
otherwise update $\boldsymbol{\nu}_I$ through a gradient-type
optimization algorithm and obtain the associated $e^{[I]}[n]$.
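
The two steps can be sketched for the real-valued case with $p = q = 2$. This is a minimal illustration under stated assumptions, not the authors' implementation: the SEA update is taken as the solution of $\mathbf{R}\boldsymbol{\nu} = \mathbf{d}$ followed by normalization, with $\mathbf{d} = \mathrm{cum}\{e, e, e, \mathbf{x}\}$ estimated from samples, and the fallback in (T2) is a plain gradient ascent with backtracking instead of the Fletcher-Powell algorithm. The rows of `X` are assumed to hold the stacked regressor $\mathbf{x}[n]$; all function names are illustrative.

```python
import numpy as np

def j22(X, nu):
    """Real-valued J_{2,2}: |fourth-order cumulant| / variance^2 of e = X nu."""
    e = X @ nu
    m2 = np.mean(e ** 2)
    c4 = np.mean(e ** 4) - 3.0 * m2 ** 2
    return abs(c4) / m2 ** 2

def cross_cumulant(X, e):
    """d = cum{e, e, e, x} for zero-mean real data."""
    return (X * (e ** 3)[:, None]).mean(axis=0) \
        - 3.0 * np.mean(e ** 2) * (X * e[:, None]).mean(axis=0)

def sea_step(X, R, nu):
    """(T1): solve R nu_new = d, then normalize."""
    nu_new = np.linalg.solve(R, cross_cumulant(X, X @ nu))
    return nu_new / np.linalg.norm(nu_new)

def gradient_step(X, R, nu, shrink=0.5, max_tries=30):
    """(T2) fallback: gradient ascent on J with backtracking."""
    e = X @ nu
    m2 = np.mean(e ** 2)
    c4 = np.mean(e ** 4) - 3.0 * m2 ** 2
    base = j22(X, nu)
    grad = 4.0 * base * (cross_cumulant(X, e) / c4 - R @ nu / m2)
    step = 1.0
    for _ in range(max_tries):
        cand = nu + step * grad
        if j22(X, cand) > base:
            return cand / np.linalg.norm(cand)
        step *= shrink
    return nu  # no improving step found: keep the previous iterate

def algorithm1(X, nu0, iterations=10):
    R = X.T @ X / X.shape[0]          # the same at every iteration
    nu = np.asarray(nu0, dtype=float)
    history = [j22(X, nu)]
    for _ in range(iterations):
        cand = sea_step(X, R, nu)
        if j22(X, cand) <= history[-1]:
            cand = gradient_step(X, R, nu)
        nu = cand
        history.append(j22(X, nu))
    return nu, history
```

Because a candidate is accepted only when it does not lower $J$, the recorded values of the criterion never decrease, which is the property that step (T2) is meant to provide.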

It  can  be  easily  shown  that 

1 

“  Cp,p{eV~V[n]}  '  \  ) 

C,i.i{e[/-1)[n]}  '  (18) 

where $\mathbf{d}^{[I-1]}$ has been obtained in (T1) (see (7)) and $\mathbf{R}$ is
the same at each iteration, indicating simple and
straightforward computation for obtaining $\partial J_{p,p}(\boldsymbol{\nu})/\partial \boldsymbol{\nu}$ in (T2).
Let us conclude this section with the following remark:

(R1) Algorithm 1 performs as a fast gradient-type MIMO-IFC
algorithm with convergence speed, computational
load, and ISI similar to those of MIMO-SEA (due to
the step (T1)) and with guaranteed convergence (due
to the step (T2)).

4.  SIMULATION  RESULTS 

A two-input two-output system taken from [1] was considered
with the two inputs $u_1[n]$ and $u_2[n]$ assumed to
be equally probable binary random sequences of $\{+1, -1\}$.
The synthetic data $\mathbf{x}[n]$ for $N = 900$ and SNR = 15
dB (spatially independent and temporally white Gaussian
noise) were processed by the inverse filter $\mathbf{v}[n]$ of length
$L = 30$ ($L_1 = 0$ and $L_2 = 29$) associated with MIMO-IFC
using the iterative Fletcher-Powell algorithm [7], MIMO-SEA
and Algorithm 1, respectively, with $p = q = r = s = 2$.
The initial condition associated with $\boldsymbol{\nu}_0$ was $v_1[n] = v_2[n] =
\delta[n - 14]$ for the first stage and $v_1[n] = \delta[n - 14]$ and
$v_2[n] = 0$ for the second stage of the MSC procedure.

Thirty independent realizations of the optimum $s_1[n]$
(associated with $u_1[n]$) and the associated thirty ISI curves versus
iteration number obtained at the first stage of the MSC
procedure are shown in Figures 1(a) through 1(f) for the
three algorithms, respectively. One can see from Figure 1
that the resultant $s_1[n]$'s are linear phase and are similar
for the three algorithms, thus verifying Properties 1 and
3, while the convergence speed of the proposed Algorithm
1 is basically the same as that of MIMO-SEA and faster
than that of MIMO-IFC using the Fletcher-Powell algorithm, thus
verifying (O2). The corresponding results for $s_2[n]$ and ISI
obtained at the second stage of the MSC procedure are
shown in Figures 2(a) through 2(f). These results also
support Properties 1 and 3, and (O2), but the MIMO-SEA
failed to converge in one realization (see Figure 2(d)) and
the associated $s_2[n]$ failed to approximate a delta function
(see Figure 2(c)), thus verifying (O1). Algorithm 1
outperforms the other two algorithms in the second stage of the
MSC procedure because it converges as fast as the
MIMO-SEA in all thirty realizations (without any
divergence) and converges faster than MIMO-IFC using the
Fletcher-Powell algorithm.

5.  CONCLUSIONS 

We  have  presented  a  performance  analysis  for  the  MIMO 
linear  equalizer  v[n]  associated  with  Chi  and  Chen’s  MIMO- 
IFC  for  finite  SNR,  including  perfect  phase  equalization, 
a  relation  to  the  nonblind  MIMO-MMSE  equalizer,  and 
equivalence  to  the  one  associated  with  MIMO-SEA  for  p  = 
$q = r = s$, as presented in Properties 1, 2 and 3,
respectively. Based on Property 3, a MIMO-IFC based algorithm,
Algorithm 1, was presented that performs like the MIMO-SEA
(in terms of ISI, computational load and convergence
speed) with guaranteed convergence (see (R1)), while the
latter may not converge for finite SNR and data (see (O1)).
Some  simulation  results  were  also  presented  that  support 
the  proposed  analytical  results  and  Algorithm  1.  The  ap¬ 
plication  of  MIMO-IFC  to  multiuser  detection  of  CDMA 
systems  using  Algorithm  1  is  under  study. 
Fig. 1. Thirty simulation results of $s_1[n]$ and ISI versus iteration number $I$ at the first stage of the MSC procedure. (a)
$s_1[n]$ and (b) ISI associated with MIMO-IFC for $p = q = 2$ using the Fletcher-Powell algorithm, (c) $s_1[n]$ and (d) ISI associated
with MIMO-SEA for $r = s = 2$, and (e) $s_1[n]$ and (f) ISI associated with Algorithm 1 for $p = q = r = s = 2$.
Fig. 2. Thirty simulation results of $s_2[n]$ and ISI versus iteration number $I$ at the second stage of the MSC procedure. (a)
$s_2[n]$ and (b) ISI associated with MIMO-IFC for $p = q = 2$ using the Fletcher-Powell algorithm, (c) $s_2[n]$ and (d) ISI associated
with MIMO-SEA for $r = s = 2$, and (e) $s_2[n]$ and (f) ISI associated with Algorithm 1 for $p = q = r = s = 2$.
ABSTRACT

We  consider  the  blind  separation  of  an  instantaneous  mixture  of 
non  stationary  source  signals,  possibly  normally  distributed.  The 
asymptotic Cramér-Rao bound is exhibited in the case of known
source distributions: it reveals how non stationarity and non
Gaussianity jointly govern the achievable performance via an index of
non stationarity and an index of non Gaussianity.

1.  INTRODUCTION 

The  problem  of  blind  separation  of  instantaneous  mixtures  is  most 
often  addressed  by  exploiting  the  possible  non  Gaussianity  of  the 
sources.  Actually,  this  is  the  only  possible  route  when  the  source 
signals  are  independently  and  identically  distributed  (i.i.d)  [3]. 
As  a  corollary,  i.i.d.  sources  can  be  separated  only  when  they  are 
not  normally  distributed.  When  the  first  ‘i’  of  ‘i.i.d.’  is  not  valid, 
i.e.  when  the  source  signals  are  correlated  in  time,  another  route 
is  to  exploit  these  correlations;  identifiability  is  granted  provided 
the  source  signals  have  different  spectra  (see  e.g.  [6,  1]  for  more 
elaborate  statements  and  some  algorithms)  even  when  the  signals 
are  normally  distributed. 

In  this  paper,  we  consider  the  case  when  the  second  ‘i’  of  ‘i.i.d’ 
is  invalid,  that  is,  we  set  out  to  achieve  signal  separation  by  exploit¬ 
ing  the  variation  in  distribution  of  the  source  signals.  Essentially, 
we  consider  the  problem  of  separating  non  stationary  signals. 

The  first  section  describes  the  model  of  interest  in  its  sim¬ 
plest  possible  form  and  gives  the  estimating  equations  of  the  max¬ 
imum  likelihood  estimator.  The  second  section  gives  the  results 
of  an  asymptotic  analysis  for  the  asymptotically  achievable  accu¬ 
racy  of  separation  in  the  simple  case  when  the  source  distributions 
are  known  in  advance.  This  case  never  occurs  in  practice  but  the 
results  of  the  analysis  provide  us  with  an  upper  bound  to  the  per¬ 
formance;  it  also  shows  what  is  the  measure  of  non  stationarity 
which  governs  blind  separability.  A  final  section  of  the  manuscript 
outlines  the  implementation  of  novel  source  separation  algorithms 
based  on  the  Gaussian  non  stationary  model. 

2.  LIKELIHOOD 

We consider a source separation model which is as simple as
possible but still lets the non stationary features appear as clearly as
possible: an instantaneous mixture where $T$ samples of a random
$n$-vector $x(t)$ are represented as the mixture by an invertible $n \times n$
matrix $A$ of $T$ samples of a ‘source vector’ $s(t)$:

$$x(t) = A s(t), \quad 1 \leq t \leq T. \qquad (1)$$

The model for the joint distribution of $x(1), ..., x(T)$ is specified
as soon as we specify the joint distribution of $s(1), ..., s(T)$.
Before describing our approach, we first show how blind
identifiability can stem from non stationarity in a simple case.

A simple case. We first give a very simple example to show that
blind separation is possible even for Gaussian sources thanks to
non stationarity. Let us assume for instance that the data points
are observed during two different regimes: during a first period,
the covariance matrix of the source vector is a diagonal matrix $\Lambda_1$;
then, during a second period, it is a different diagonal covariance
matrix $\Lambda_2$. If these two periods are known, one can compute the
sample covariance matrix of the observations over each of them,
yielding estimates of $R_1 = A\Lambda_1 A^\dagger$ and of $R_2 = A\Lambda_2 A^\dagger$. This
particular structure determines matrix $A$ almost completely, since
$R_1 R_2^{-1} = A\Lambda_1\Lambda_2^{-1}A^{-1}$, so that the columns of $A$ are the
eigenvectors of $R_1 R_2^{-1}$. These are generally unique (up to the
usual indeterminations of permutation and scale). Denoting
$B = A^{-1}$, we also remark that $B R_i B^\dagger = \Lambda_i$
for $i = 1, 2$: matrix $B$ jointly diagonalizes the two covariance
matrices. This line of reasoning was followed in [9] and [8]. See
section 4 to see how, under more general assumptions, the Gaussian
log-likelihood turns out to be a joint diagonalization criterion
of covariance matrices.
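
The two-regime argument can be checked numerically with a short sketch. It assumes real-valued, zero-mean data, known regime boundaries, and distinct eigenvalues of $R_1 R_2^{-1}$; the function name `unmix_two_regimes` is illustrative.

```python
import numpy as np

def unmix_two_regimes(x1, x2):
    """Estimate B ~ A^{-1}, up to scale and permutation of its rows.

    x1, x2: (n, T1) and (n, T2) arrays of observations from each regime.
    """
    R1 = x1 @ x1.T / x1.shape[1]   # sample covariance, regime 1
    R2 = x2 @ x2.T / x2.shape[1]   # sample covariance, regime 2
    # The columns of A are eigenvectors of R1 R2^{-1}.
    _, vecs = np.linalg.eig(R1 @ np.linalg.inv(R2))
    A_hat = np.real(vecs)
    return np.linalg.inv(A_hat)
```

Applying the returned matrix to the observations yields signals that are each dominated by a single Gaussian source, even though no higher-order statistics are used.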

Maximum likelihood. We shall now consider the maximum
likelihood solution for model (1) when the source distributions are
known. The model for the sequence of source signals is as
follows. The sequence $\{s(t)\}$ is not modeled as i.i.d. (which
implies stationarity) but as ‘Temporally Independently Distributed’,
abbreviated as ‘t.i.d.’ in the following, meaning that $s(t)$ is
distributed independently from $s(t')$ for $t \neq t'$. In addition, we
maintain the usual assumption that the components of the source
vector $s(t)$ are mutually independent for each $t$. Therefore,
denoting by $p_{ti}$ the density of the $i$-th component of $s(t)$, the density of
a sample $s(1), ..., s(T)$ is the product:

$$p(s(1), ..., s(T)) = \prod_{t=1}^{T} \prod_{i=1}^{n} p_{ti}(s_i(t)) \qquad (2)$$

In this ‘t.i.d. model’, the relative gradient [2] of the log-likelihood
of $T$ data points is easily found to be

$$-\nabla \log p(x(1), ..., x(T) \mid A) = \sum_{t=1}^{T} \left(\phi_t(y(t))\, y(t)^\dagger - I\right) \qquad (3)$$

where $I$ denotes the identity matrix, where

$$y(t) = A^{-1} x(t) \qquad (4)$$
and where $\phi_t(\cdot)$ is the vector-to-vector mapping:

$$[\phi_t(y)]_i = \phi_{ti}(y_i) = -\frac{p'_{ti}(y_i)}{p_{ti}(y_i)}. \qquad (5)$$

This is obtained by a straightforward generalization of the i.i.d.
case.


assuming that such a limit actually exists (if it does not, it
becomes difficult to carry out any asymptotic analysis). In particular,
the average power of the $i$-th signal is:

$$\bar{E}\{s_i^2\} = \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} E\, s_i(t)^2. \qquad (11)$$


Specializing to a Gaussian non stationary model. In the
following, special attention is paid to the Gaussian case: at time $t$,
the $i$-th source is drawn according to a zero-mean Gaussian
distribution with variance $\sigma_{ti}^2$. That is, $s(t) \sim \mathcal{N}(0, \Lambda_t)$ where the
diagonal covariance matrix at time $t$ is

$$\Lambda_t = \mathrm{diag}(\sigma_{t1}^2, ..., \sigma_{tn}^2). \qquad (6)$$

In this case, the score functions $\phi_{ti}$ are the linear functions:

$$\phi_{ti}(y) = \frac{y}{\sigma_{ti}^2} \qquad (7)$$

or, matrix-wise:

$$\phi_t(y) = \Lambda_t^{-1} y. \qquad (8)$$

Combining eq. (3) and eq. (7), the stationary points of the
likelihood are, in the Gaussian case, the solutions of the estimating
equations:


3.2.  Rejection  rates. 

The accuracy of an estimate $\hat{A}$ of $A$ in terms of source separation
can be measured by the associated ‘rejection rates’. If the source
vector is estimated as $\hat{s} = \hat{A}^{-1}x$, then $\hat{s}_i = \sum_j [\hat{A}^{-1}A]_{ij}\, s_j$ so
that the average power of the $j$-th source in the estimate of the $i$-th
source is

$$[\hat{A}^{-1}A]_{ij}^2\, \bar{E}\{s_j^2\} \qquad (12)$$

while the average power of the $i$-th source itself in the same
estimate is

$$[\hat{A}^{-1}A]_{ii}^2\, \bar{E}\{s_i^2\}. \qquad (13)$$

For a regular estimator, the asymptotic variance of the estimate $\hat{A}$
is expected to decrease as $T^{-1}$ so that $[\hat{A}^{-1}A]_{ij}^2$ is of order $1/T$
for $i \neq j$ while $[\hat{A}^{-1}A]_{ii}$ converges to the constant 1. Therefore,
a significant characterization of the accuracy of a given estimator
is obtained by evaluating the asymptotic rejection rates:

$$\rho_{ij} = \lim_{T\to\infty} T\, E\{[\hat{A}^{-1}A]_{ij}^2\}\, \frac{\bar{E}\{s_j^2\}}{\bar{E}\{s_i^2\}} \qquad (14)$$

which are nothing but properly scaled interference-to-signal ratios.


Of course, if the variances are assumed to be constant ($\sigma_{ti} = \sigma_i$),
this set of equations becomes redundant: the $(i, j)$-th term
provides us with the same condition as the $(j, i)$-th term and the model
is not identifiable. In contrast, in the non stationary case, these two
conditions are a priori distinct and the set (9) of equations yields a
number of constraints equal to the number of unknown parameters
in $A$. A similar set of equations has been derived in [4] without
reference to the maximum likelihood principle.
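
As an illustration, the left-hand side of these estimating equations can be evaluated numerically. The sketch below assumes that they take the form obtained by substituting the linear scores (7) into (3), that is, the time average of $\Lambda_t^{-1} y(t) y(t)^\dagger - I$ set to zero. The function name and the array layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_relative_gradient(y, var):
    """(1/T) * sum over t of (Lambda_t^{-1} y(t) y(t)^T - I).

    y:   (n, T) array of candidate source estimates y(t) = A^{-1} x(t).
    var: (n, T) array of assumed variances sigma_{ti}^2.
    """
    n, T = y.shape
    # Entry (i, j) is the average of y_i(t) y_j(t) / sigma_{ti}^2, minus delta_ij.
    return (y / var) @ y.T / T - np.eye(n)
```

With time-varying variances, the $(i, j)$ and $(j, i)$ entries weight the same products $y_i(t)\,y_j(t)$ differently, which is what makes the two conditions distinct.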

3.  ACHIEVABLE  PERFORMANCE  IN  SEPARATING 
NON  GAUSSIAN  NON  STATIONARY  SOURCES 

In this section, we compute the Fisher information matrix in the
t.i.d. case when the source distributions are known at each time
instant. By this device, we obtain an expression for the
asymptotic Cramér-Rao bound which can be simply and directly related
to the achievable separation and to the non stationarity and non
Gaussianity of the sources.

3.1.  Non  stationary  averages. 

The asymptotic derivations developed by Pham [7] for the
stationary case can be generalized to the non stationary case. However,
some statistical moments must receive a more general definition,
adapted to the non stationary case: the mathematical expectation
operator $E$ must be replaced by a limit of expected values. This
will be denoted by the operator $\bar{E}$ defined as follows. If $\{X_t\}$ is a
sequence of random variables with the distribution of $X_t$
depending on $t$, we write

$$\bar{E}\{X\} \stackrel{\mathrm{def}}{=} \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} E X_t, \qquad (10)$$


3.3.  The  Fisher  information  matrix. 

The computation of Fisher information matrices (FIMs) in source
separation is facilitated by resorting to the relative gradient. It is
equivalent to a local re-parameterization in terms of a relative (or
multiplicative) variation: in order to compute the FIM at point $A$,
any matrix in the neighborhood of $A$ is expressed as $A(I + \mathcal{E})$
where matrix $\mathcal{E}$ is the ‘local parameter’ or ‘relative parameter’.
The ‘relative score’ is the derivative of the log-likelihood with
respect to $\mathcal{E}$ evaluated at $\mathcal{E} = 0$, that is the matrix with entries:

$$g_{ij} = \left.\frac{\partial}{\partial \mathcal{E}_{ij}} \log p(x(1), ..., x(T) \mid A(I + \mathcal{E}))\right|_{\mathcal{E}=0} \qquad (15)$$

Similar to eq. (3), the relative score is found to be

$$g_{ij} = \sum_{t=1}^{T} \{\phi_{ti}(y_i(t))\, y_j(t) - \delta_{ij}\} \qquad (16)$$

In the t.i.d. model, thanks to the independence assumptions, it
is not difficult to evaluate the ‘relative FIM’, i.e. the covariance
matrix of the relative scores. One finds that for $i \neq j$, $g_{ij}$ is only
correlated to $g_{ji}$ (and to itself!) and that $g_{ii}$ is uncorrelated to
$g_{kl}$ unless $i = k = l$. Therefore the relative FIM is an $n^2 \times n^2$
matrix which is block diagonal: there are $n(n-1)/2$ blocks of
size $2 \times 2$ for each pair $1 \leq i < j \leq n$ of sources which are
the covariance matrices of $[g_{ij}, g_{ji}]^\dagger$ and $n$ blocks of size $1 \times 1$
for each source, equal to $E g_{ii}^2$, $1 \leq i \leq n$. The $2 \times 2$ blocks
are the most interesting. Using independence and the zero-mean
assumption, one readily finds that for $i \neq j$,

$$E g_{ij}^2 = \sum_{t=1}^{T} E\{\phi_{ti}^2(s_i(t))\}\, E\{s_j^2(t)\} \qquad (17)$$

$$E g_{ij} g_{ji} = T \qquad (18)$$
where we have used that $E\phi_{ti}(s_i(t))\,s_i(t) = 1$. This completes
the computation of the Fisher information matrix but does not
provide many insights. More interesting expressions are obtained by
relating the FIM to the rejection rates and introducing the
additional assumption of ‘independent non-stationarities’ described in
the next section.


3.4.  Independent  non  stationarities. 

In our model, the distributions of the sources at each time instant
are fixed. We cannot expect meaningful results without some kind
of assumption expressing that the distributions of the sources are
‘independent’ themselves. We will consider the case where

$$\bar{E}\{\phi_i^2(s_i)\, s_j^2\} = \bar{E}\{\phi_i^2(s_i)\}\; \bar{E}\{s_j^2\}. \qquad (19)$$

This condition must be understood as some ‘independence of the
non stationarities’ since it expresses that the sequence $\{E\phi_{ti}^2(s_i(t))\}$
is ‘uncorrelated’ with the sequence $\{E s_j^2(t)\}$ of the variances of
the $j$-th source. The word ‘uncorrelated’ is quoted in the
previous sentence because, in the model under consideration, these
sequences are not random sequences: one should rather talk of a
limiting empirical decorrelation between them. Of course, since
these two sequences refer to two different sources, this ‘empirical
decorrelation’ condition (19) is expected to hold in many practical
situations as soon as two source signals originate from physically
independent processes.

Under this assumption (19), the expression (17) for E g_ij^2 produces the limiting form

\lim_{T \to \infty} T^{-1} E g_{ij}^2 = (R_i + 1) \frac{E\{s_j^2\}}{E\{s_i^2\}}    (20)

where we have defined the scalars R_1, ..., R_n as

R_i \stackrel{\text{def}}{=} E\{\phi_i^2(s_i)\} \, E\{s_i^2\} - 1.    (21)


Therefore, the limit for the 2 x 2 sub-block of the FIM corresponding to the (i, j)-th pair of sources is given by

\lim_{T \to \infty} T^{-1} \operatorname{Cov} \begin{bmatrix} g_{ij} \\ g_{ji} \end{bmatrix} = \begin{bmatrix} (R_i + 1) \frac{E\{s_j^2\}}{E\{s_i^2\}} & 1 \\ 1 & (R_j + 1) \frac{E\{s_i^2\}}{E\{s_j^2\}} \end{bmatrix}    (22)

where the diagonal entries follow from (20) and the off-diagonal entries from (18).


3.5. The Cramér-Rao bound

We have obtained in eq. (22) an asymptotic expression for a 2 x 2 sub-block of the FIM for the relative parameter ε. Because the FIM is block-diagonal, it suffices to invert these sub-blocks to obtain the large sample Cramér-Rao bound (CRB) for this parameter. In particular, the upper left entry of the inverse of the right hand side of (22) is

\frac{R_j + 1}{(R_i + 1)(R_j + 1) - 1} \, \frac{E\{s_i^2\}}{E\{s_j^2\}}    (23)

and equals (within a factor T) the lowest variance Eε_ij^2 asymptotically achievable by an unbiased estimate of the relative parameter ε.

Note that if an estimate Â is parameterized as Â = A(I + ε) then, at first order, Â^{-1}A = I − ε. Therefore the CRB on the relative parameter ε is directly related to the rejection rates defined at (14). In particular, using (23), we find that the best asymptotically achievable rejection rates are

\rho_{ij} = \frac{R_j + 1}{R_i R_j + R_i + R_j}.    (24)


3.6.  Non  stationarity  and  non  Gaussianity 

It is important to note that the bound (24) is obtained without assuming either that the source signals are Gaussian or that they are stationary. Therefore, expression (24) gives a unifying answer to the issue of finding how the non stationarity and the non Gaussianity jointly govern the achievable performance of source separation.

Since R_i ≥ 0 (see eq. (30) below), it is more instructive to rewrite (24) as a signal-to-interference ratio:

\frac{1}{\rho_{ij}} = R_i + \frac{R_j}{R_j + 1}    (25)

because this last expression makes it clear that good performance depends on having the highest possible values for R_i and R_j. Conversely, performance is at its worst (and blind separation in the i.i.d. case becomes impossible) for a given pair (i, j) of sources when R_i = R_j = 0. It is thus important to understand the meaning of the R_i moments.

Non Gaussianity index. Consider the following moment of s_i(t):

\gamma_{ti} = E\phi_{ti}^2(s_i(t)) \, E s_i^2(t) - 1.    (26)

The scalar γ_ti is non negative: γ_ti ≥ 0 with equality only when s_i(t) has a Gaussian distribution. This is easily seen using the Cauchy-Schwarz inequality and the fact that Eφ_ti(s_i(t)) s_i(t) = 1. Thus γ_ti is a measure of the non Gaussianity of the variable s_i(t). We define for the i-th source an ‘average’ non Gaussianity index:

\alpha_i = \frac{E\{\gamma_{ti} \, \sigma_{ti}^{-2}\}}{E\{\sigma_{ti}^{-2}\}}    (27)

as the average over time of the non Gaussianity γ_ti of s_i(t) weighted by the reciprocal variance σ_ti^{-2}.


Non stationarity index. We define a non stationarity index for the i-th source by

\beta_i = E\{\sigma_{ti}^2\} \, E\{\sigma_{ti}^{-2}\} - 1.    (28)


An  alternative  expression  for  this  index  is 


Pi  =  E 


(29) 


which shows that β_i ≥ 0, with equality when the variance σ_ti^2 is constant over time. Thus, β_i does work as a measure of non stationarity or, more accurately, as a measure of second-order non stationarity.

Non stationarity and non Gaussianity. The moment R_i which represents the combined effects of non stationarity and non Gaussianity can be rewritten, at the cost of a few manipulations, as a function of α_i and β_i. We find

R_i = \alpha_i + \beta_i + \alpha_i \beta_i    (30)


In the case of stationary non Gaussian sources, we have σ_ti^2 = σ_i^2 so that β_i = 0 and R_i reduces to R_i = α_i while α_i itself reduces to α_i = Eφ_i^2(s_i) Es_i^2 − 1; the results of Pham [7] for the stationary case are recovered with the usual measure of non Gaussianity.

In the case of non stationary Gaussian sources, the score function is given by eq. (7). Then γ_ti = 0 and thus α_i = 0 so that R_i reduces to R_i = β_i, showing that it is the particular way in which β_i measures the deviation of the variance sequence from being constant which quantifies the potential of non stationarity for blind separation.

Also note that when the sources are weakly non Gaussian (α_i ≪ 1) and weakly non stationary (β_i ≪ 1), then R_i ≈ α_i + β_i, i.e. the benefits of non Gaussianity and non stationarity just add up. On the opposite side, for sources which are strongly non stationary and non Gaussian, we have R_i ≈ α_i β_i, i.e. the benefits of non Gaussianity and non stationarity multiply each other to a large value of R_i.
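
The relation (30) can be checked numerically. The sketch below is only an illustration: it assumes the time-averaged indices are read as α_i = E{γ_ti σ_ti^{-2}} / E{σ_ti^{-2}} and β_i = E{σ_ti^2} E{σ_ti^{-2}} − 1, with E{·} a plain average over time, and it uses Eφ_ti^2 = (γ_ti + 1) / σ_ti^2 from (26) to evaluate R_i from its definition (21). The function name `separation_indices` and the variance and non Gaussianity profiles are hypothetical.

```python
import numpy as np

def separation_indices(sigma2, gamma):
    """Return (alpha, beta, R) for one source.

    sigma2[t]: variance of the source at time t (assumed known here).
    gamma[t]:  non Gaussianity gamma_t of eq. (26) at time t.
    """
    sigma2 = np.asarray(sigma2, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    inv = 1.0 / sigma2
    # (27): time average of gamma weighted by the reciprocal variance.
    alpha = np.mean(gamma * inv) / np.mean(inv)
    # (28): non stationarity index.
    beta = np.mean(sigma2) * np.mean(inv) - 1.0
    # (21), using E phi_t^2 = (gamma_t + 1) / sigma_t^2 from (26).
    R = np.mean((gamma + 1.0) * inv) * np.mean(sigma2) - 1.0
    return alpha, beta, R

# Hypothetical profiles: variance switching between two levels, constant gamma.
sigma2 = np.array([1.0, 4.0, 1.0, 4.0])
gamma = np.full(4, 0.5)
alpha, beta, R = separation_indices(sigma2, gamma)
print(alpha, beta, R, alpha + beta + alpha * beta)
```

The last two printed values agree, which is the content of (30); setting `gamma` to zero leaves R equal to β_i, and a constant `sigma2` leaves R equal to α_i.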

4.  ALGORITHMS 

We briefly outline algorithms for the separation of Gaussian non stationary sources derived from the maximum likelihood principle. More details about the algorithms can be found in [5].

4.1. Gradient and Newton-like algorithms

Relative gradient algorithm. The relative gradient algorithm for maximizing the likelihood of a separating matrix B = A^{-1} in the non stationary case is a direct generalization of the i.i.d. case:

1. Initialize: B = I (for instance).

2. Compute the relative gradient G of the log-likelihood:

G(B) = \frac{1}{T} \sum_{t=1}^{T} \phi_t(y(t)) \, y(t)^{\dagger} - I    (31)

3. If matrix G(B) is small enough, stop; otherwise, update the separating matrix B:

B \leftarrow (I - \mu G(B)) B    (32)

and go to step 2.

In practical cases, one cannot expect to know in advance the time-varying distributions of the sources: an ML algorithm cannot be directly implemented as summarized by eqs. (31-32). In the stationary case, an option is to use prior estimates for the non linear score functions φ_i or to estimate them from the data using Pham’s method [7]. In the non stationary case, there is another option because one does not need to exploit the non Gaussianity: it is sufficient to rely on the non stationarity. Therefore, one may use the Gaussian score functions of eq. (7). These score functions depend on a single parameter: the variance of the given source at the given time instant. Of course, this is still as many parameters as data points so there is no way the instantaneous variances can be estimated without additional assumptions. This suggests a class of algorithms where each iteration of the algorithm (31-32) is intertwined with an estimation of the source variances from the current estimates y(t) = Bx(t) of the sources.

There are as many algorithms as models and estimation techniques for the variances. The most natural and generic approach is to assume some smoothness in the temporal evolution of the variances of each source. The simplest idea then is to estimate the variances σ_ti^2 by low-pass filtering the squared outputs of the separating matrix.
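
The intertwined scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of [5]: the Gaussian score is taken as φ_ti(y) = y / σ_ti^2, the low-pass filter is a plain moving average, and the function names, the window `width`, the step size `mu` and the stopping tolerance are all hypothetical choices. The number of samples is assumed to be at least `width`.

```python
import numpy as np

def smoothed_variances(Y, width=51):
    """Estimate sigma_i^2(t) by a moving average of the squared outputs."""
    kernel = np.ones(width) / width
    est = np.array([np.convolve(row ** 2, kernel, mode="same") for row in Y])
    return est + 1e-12  # guard against division by zero

def relative_gradient(Y, sigma2):
    """Relative gradient (1/T) sum_t phi_t(y(t)) y(t)^T - I with the
    Gaussian score phi_ti(y) = y / sigma_ti^2 (assumed form)."""
    n, T = Y.shape
    return (Y / sigma2) @ Y.T / T - np.eye(n)

def separate(X, mu=0.1, n_iter=200, tol=1e-3, width=51):
    """Iterate gradient evaluation and relative update, re-estimating
    the variance profiles from the current outputs at each pass."""
    n = X.shape[0]
    B = np.eye(n)
    for _ in range(n_iter):
        Y = B @ X
        G = relative_gradient(Y, smoothed_variances(Y, width))
        if np.max(np.abs(G)) < tol:
            break
        B = (np.eye(n) - mu * G) @ B
    return B
```

With the variances re-estimated from the outputs themselves, the diagonal of the gradient stays close to zero, so the iteration is driven mainly by the off-diagonal terms; convergence speed depends on the hypothetical `mu` and `width` settings.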


A  Newton-like  algorithm.  Even  though  the  relative  gradient  al¬ 
gorithm  outlined  above  performs  reasonably  well,  there  is  room 
for  simple  improvements  by  developing  an  approximate  on  line 
Newton-like  technique.  The  starting  point  is  an  exponentially  weighted 
relative  gradient: 

GW  =  X^(l  ~  {'My(r))y(r)t  -  J  j  .  (33) 

T<£ 

Assume that G(t − 1) = 0 for some B(t − 1) and look for a relative update, that is, B(t) = (I − μH(t))B(t − 1). A first order (in μ) expansion of the equation G(t) = 0 and some simplifying assumptions allow one to derive a scale-invariant adaptive algorithm which converges much faster than a ‘regular’ relative gradient technique at very small additional cost.


4.2.  Joint  diagonalization  algorithms 

Here, we consider a different model of non stationarity: rather than assuming a smooth temporal variation of the variance profiles, we assume that the data set can be divided into L blocks with the variance of each source being constant over each block. It must be stressed that we do not really need this model to hold to get consistent estimates: it suffices that it captures enough of the non stationarities.

Under this ‘piecewise stationary model’, the normalized log likelihood takes the form

-\frac{1}{T} \log p(x(1), \ldots, x(T) | A) \doteq \sum_{l=1}^{L} w_l \, \mathrm{off}(B R_l B^{\dagger})    (34)

where \doteq means ‘equal up to a constant term’, where B = A^{-1}, where R_l denotes the sample covariance matrix of the observations estimated over the l-th block, where w_l is the proportion of data points belonging to the l-th block and where off(R) is a measure of diagonality of a symmetric positive matrix R defined as

\mathrm{off}(R) = \log\det \operatorname{diag} R - \log\det R    (35)

with diag R denoting the diagonal matrix with the same diagonal as R. By Hadamard's inequality, off(R) ≥ 0 with equality only when R is diagonal.

Thus,  in  the  piecewise  stationary  model,  the  ML  principle 
boils  down  to  estimating  A  as  the  matrix  whose  inverse  jointly 
diagonalizes  the  sample  covariances  over  each  block  (only  an  ap¬ 
proximate  joint  diagonalization  in  the  weighted  sense  of  eq.  (34) 
is  possible,  of  course). 
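
The joint diagonalization objective can be evaluated directly. The sketch below is illustrative only: it takes the diagonality measure with the sign that makes it non-negative (log det diag R − log det R, which is zero only for diagonal R by Hadamard's inequality), works with real matrices so that the transpose plays the role of †, and uses a hypothetical 2 x 2 mixing matrix and two hypothetical blocks whose source covariances are exactly diagonal.

```python
import numpy as np

def off(R):
    """log det diag(R) - log det(R): >= 0, and 0 iff R is diagonal."""
    _, logdet = np.linalg.slogdet(R)
    return float(np.sum(np.log(np.diag(R))) - logdet)

def joint_diag_cost(B, covs, weights):
    """Weighted sum over blocks of off(B R_l B^T)."""
    return sum(w * off(B @ R @ B.T) for w, R in zip(weights, covs))

# Hypothetical two-block example with exactly diagonal source covariances.
A = np.array([[1.0, 0.6], [0.4, 1.0]])
source_covs = [np.diag([1.0, 4.0]), np.diag([3.0, 0.5])]
covs = [A @ D @ A.T for D in source_covs]
weights = [0.5, 0.5]

print(joint_diag_cost(np.eye(2), covs, weights))         # strictly positive
print(joint_diag_cost(np.linalg.inv(A), covs, weights))  # numerically zero
```

The cost vanishes at B = A^{-1} because both block covariances are then diagonalized simultaneously; with sample covariances the minimum is only approximately zero.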

Based  on  this  principle,  one  could  implement  a  two-step  pro¬ 
cedure:  a  first  whitening  step  which  turns  the  mixing  matrix  A  into 
an  (approximately)  orthogonal  matrix  followed  by  a  second  step  of 
joint  approximate  diagonalization  of  the  covariance  matrices  of  the 
whitened  data  by  an  orthogonal  matrix.  This  technique  would  be 
the  non  stationary  counterpart  of  [1]  (which  uses  a  spectral  con¬ 
trast  as  opposed  to  the  current  non-stationary  contrast). 

However, it is possible to implement a better solution because there exists an efficient algorithm for the joint diagonalization of several symmetric positive matrices. This algorithm minimizes exactly the objective (34) and does so over all invertible matrices. Thus it does not require pre-whitening and computes exactly the ML estimate of the piecewise stationary model. More details are in [5].

5. CONCLUSION


We  have  investigated  the  achievable  blind  source  separation  per¬ 
formance  in  a  simple  model  of  non  stationary  sources.  In  this 
model,  blind  separation  is  made  possible  by  non  stationarity  and/or 
non  Gaussianity.  The  performance  is  summarized  by  the  rejection 
rates ρ_ij which under a natural simplifying assumption depend on the moments R_i. Better performance is obtained for larger R_i. These moments in turn are simple increasing functions of a non Gaussianity index α_i and a non stationarity index β_i. Several algorithms for the separation of Gaussian non stationary sources have been outlined; the analysis of their performance will be the subject of further research.

ABSTRACT

In Blind Source Separation (BSS) no prior knowledge is assumed. However, due to the inherent indeterminacies of the problem, some arbitrary normalizing conditions are imposed on the sources or on the recovering matrix. We present a related problem, in which some prior information about elements of the mixing matrix is available, and show how traditional solutions can be modified to incorporate this information and obtain new estimators.¹

1.  INTRODUCTION 

Blind Source Separation (BSS) for instantaneous and noiseless mixtures consists of recovering statistically independent signals, called source signals, from linear mixtures of them. Many papers in this area of signal processing have been published recently. This paper presents a modified problem related to BSS.

In BSS, nothing is normally assumed about the mixing matrix. If any prior knowledge is included in the statement of the problem, it refers to the sources. For example, in [1] the sources are temporally correlated, in [2] the probability density functions (pdf) of the sources are known, and in [3] the sources are discrete.

To remove the indeterminacy inherent in the BSS problem (the order of the signals and their amplitudes are unknown), some solutions impose conditions on the mixing matrix or on the recovering matrix. In [4], the diagonal elements are set to one, so only the off-diagonal elements have to be obtained. In [5], the columns of the recovering matrix have unit modulus. Normally, the indeterminacy is eliminated by imposing restrictions on the statistical properties of the sources, for example by assuming that the sources have unit variance.

In this paper the statement of the BSS problem is changed. We introduce new information into the problem: we suppose that prior information about some of the elements of the mixing matrix is given.

In Section 2 we define how this information about the elements of the mixing matrix can be formulated mathematically from a statistical point of view, and its consequences for the traditional hypotheses of BSS.

In Section 3 we obtain, starting from traditional BSS algorithms, a new modified class of algorithms and show the explicit form of some of them. Finally, some results of applying them are shown in Section 4.

For simplicity we consider only the real instantaneous case, although more examples are found in convolutive mixtures (knowledge of some of the filters or, at least, of the kind of filters and some of their coefficients, that relate the sources to the observed signals).

¹ This paper is supported by the Spanish Education Ministry under contract FEDER-CICYT 1230.


2.  PRIOR  INFORMATION  ABOUT 
MIXING  MATRIX 

There are two kinds of prior information about the elements of the mixing matrix.

•  Deterministic case: one or more elements of the mixing matrix are known. In this simple case, these elements are included directly in the solution (recovering matrix). A simple example is found in convolutive sound mixtures: the microphones that record the sound are normally close to the sources, so the diagonal elements are usually considered to be one.

•  Non-deterministic case: we have prior information about some elements but we do not know their values exactly. This degree of uncertainty can be modeled statistically by a pdf. If this pdf of the mixing matrix elements is correct and meaningful, it is clear that it could be included in the BSS formulation to achieve better results. It must be stressed that we are interested in maintaining the BSS statement of the problem, adopting a Bayesian perspective in which the pdf is a prior pdf of the mixing matrix elements, unlike traditional array signal processing where the matrix is more restricted.

We will call our problem AKICA (A priori Knowledge Independent Component Analysis) in order to clarify the notation and nomenclature, and to distinguish it from the BSS or ICA problem.

AKICA, considered as an extension of BSS, is mathematically formulated for the real, instantaneous, noiseless 2x2 case as:

y = As    (1)

where y is the 2x1 vector of observed signals, s is the 2x1 vector of source signals whose components are statistically independent, and A is the 2x2 mixing matrix with an associated matrix p(A) whose element (i,j) is p(a_ij), the pdf of the element a_ij of the mixing matrix that represents the prior information.

This definition includes the deterministic case: if any element of the mixing matrix is known, it can be considered as a random variable whose pdf is a delta function located at the correct value. On the other hand, if there is no information about a_ij, a uniform pdf is used.

Our  assumptions  about  the  sources  will  be  the  same  as  in  BSS; 
statistical  independence,  no  more  than  one  Gaussian  source,  and, 
in  order  to  eliminate  the  indetermination  about  the  amplitude, 
unit  variance. 


3.  MODIFIED  ALGORITHMS 
INCLUDING  PRIOR  INFORMATION 

Traditional solutions are based on the minimization or maximization of a function that measures the statistical independence of the recovered signals. As we do not know anything about the sources (their pdf is unknown), a purely statistical analysis of the problem is difficult. In order to approximate the statistical independence of the sources, many approaches have been developed for BSS: [5] maximizes contrast functions derived from mutual information, [6] presents an algebraic solution based on joint diagonalisation of fourth order cumulant matrices, [7] is based on information theory, and so on.

We include our prior information by modifying the contrast functions that measure the independence of the recovered signals. Most BSS solutions employ a two-step method: first, the observed signals are decorrelated and normalized,

u = L^{-1} y,    E\{u u^T\} = I    (2)

and second, the orthogonal matrix Q (a Givens rotation matrix, function of the rotation angle) is calculated:

s = Q^T u    (3)

The first step is well studied in the literature (PCA), so we focus on the estimation of the Givens matrix angle. In [8] we present a more general and theoretical approach including the influence of the whitening step on the prior information matrix. In this paper we focus on the results.
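
The two-step method can be sketched as follows. This is only an illustration under stated assumptions: the whitening factor L is obtained from an eigendecomposition of the sample covariance (one of several valid choices), the sign convention of the Givens matrix is assumed, and the function names `whiten` and `givens` are hypothetical.

```python
import numpy as np

def whiten(Y):
    """Return (U, L) with U = L^{-1} Y and sample covariance of U equal to I."""
    Yc = Y - Y.mean(axis=1, keepdims=True)
    C = Yc @ Yc.T / Yc.shape[1]
    eigval, eigvec = np.linalg.eigh(C)
    L = eigvec * np.sqrt(eigval)   # scales the columns, so that C = L L^T
    U = np.linalg.solve(L, Yc)
    return U, L

def givens(theta):
    """2 x 2 rotation matrix; the sign convention is an assumption."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Hypothetical usage: whiten a 2 x T mixture, then rotate by a candidate angle.
rng = np.random.default_rng(1)
S = rng.uniform(-1.0, 1.0, size=(2, 1000))
Y = np.array([[1.0, 0.5], [0.3, 1.0]]) @ S
U, L = whiten(Y)
s_hat = givens(0.2).T @ U   # second step for one candidate angle
```

After whitening, only the single angle of the Givens matrix remains to be estimated, which is where the prior information enters in the modified algorithms.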


We  will  show  two  examples  based  on  two  of  the  most  important 
algorithms  in  BSS:  EASI,  an  iterative  solution  [9],  and  Comon’s 
solution  as  a  batch-type  algorithm  [5]. 

Cardoso’s  algorithm  EASI  is  based  on  the  minimization  of  the 
contrast  function 

/(0)  =  £islS/l4  W 

for normalized, decorrelated sources with negative kurtosis, where θ is the rotation angle defining the Givens rotation matrix. The modified EASI minimizes (under the same hypotheses) the function


<t>(9)  =  (£lsi  I4  +£ls2  l4H-/>(9))  (5) 

Applying a gradient method we obtain the new adaptation rule. It consists of a term like the EASI solution weighted by the value of the pdf for the estimated angle and a new term that tries to minimize −p(θ) (or maximize p(θ)).

In  Comon’s  solution  the  simplified  contrast  function  maximizes 
the  sum  of  the  squared  marginal  cumulants  of  the  recovered 
signals 

T(Q)  =  Xr«.,  (6) 

/=! 


where Γ are the cumulants of s (normally cumulants of order 4 are considered). Our new estimator will maximize


*(Q)  =  irL  •*><&)  <7> 

i=l 

In all these new algorithms the most important role is played by the prior pdf. If it is not correct, our estimator will probably be biased. If the pdf is meaningful, the new estimator will have lower variance.
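
A batch estimator of this kind can be sketched with a simple grid search. The code below is a hypothetical illustration, not the authors' implementation: it assumes that the prior enters the contrast as a multiplicative weight p(θ) on the sum of squared fourth-order marginal cumulants, that the search is restricted to [−π/4, π/4), and that the data are already whitened. All function names and the grid size are assumptions.

```python
import numpy as np

def kurt4(x):
    """Fourth-order cumulant of a sample (centred first)."""
    x = x - x.mean()
    m2 = np.mean(x ** 2)
    return np.mean(x ** 4) - 3.0 * m2 ** 2

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def gaussian_prior(mean, var):
    """Gaussian pdf over the angle, used as prior weight."""
    return lambda th: np.exp(-(th - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def estimate_angle(U, prior=None, n_grid=721):
    """Grid search over [-pi/4, pi/4) for the angle maximising the contrast."""
    grid = np.linspace(-np.pi / 4, np.pi / 4, n_grid, endpoint=False)
    best, best_val = grid[0], -np.inf
    for theta in grid:
        S = rotation(theta).T @ U
        val = kurt4(S[0]) ** 2 + kurt4(S[1]) ** 2
        if prior is not None:
            val *= prior(theta)   # assumed multiplicative use of the prior
        if val > best_val:
            best, best_val = theta, val
    return best
```

Calling `estimate_angle(U)` gives the unweighted estimate, while `estimate_angle(U, prior=gaussian_prior(0.2, 1.0))` applies a prior centred on 0.2 with unit variance; a narrow prior centred on a wrong value would pull the estimate away from the true angle, in line with the bias remark above.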


However, there are BSS solutions that are not derived from an objective (contrast) function. These algorithms test the statistical independence of the recovered signals at convergence. As an example, we consider the Jutten-Herault learning rule [4]. In this case, our prior knowledge can be introduced directly by adding a term that adjusts the solution in order to maximize the prior probability. The modified Jutten-Herault algorithm is:

c_{ij}[n+1] = c_{ij}[n] + \alpha \, f(s_i) \, g(s_j) + \beta \, \frac{\partial p(c_{ij})}{\partial c_{ij}}    (8)

where c_ij are the elements of the recovering matrix, α and β are learning steps, and f and g are different odd functions. At convergence,


E^fJ+l  ■  s2km+l }+ 


9 p(Cij) 

dcU 


=  0 


(9) 


From (9) it is clear that an accurate mathematical model of the prior knowledge is necessary if we want to obtain independent recovered signals.
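
One step of the modified learning rule can be sketched as below, reading rule (8) as c_ij[n+1] = c_ij[n] + α f(s_i) g(s_j) + β ∂p(c_ij)/∂c_ij. The choices f(x) = x^3 and g(x) = arctan(x) are assumptions (the text only requires two different odd functions), the prior is taken to be Gaussian as in the examples of Section 4, and the function names are hypothetical.

```python
import numpy as np

def prior_gradient(c, mean, var):
    """d p(c) / d c for a Gaussian prior pdf p with the given mean and variance."""
    p = np.exp(-(c - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return -(c - mean) / var * p

def modified_jh_step(c, s_i, s_j, alpha, beta, mean, var,
                     f=lambda x: x ** 3, g=np.arctan):
    """One update of a single coefficient c_ij: standard term plus prior term."""
    return c + alpha * f(s_i) * g(s_j) + beta * prior_gradient(c, mean, var)
```

At the prior mean the extra term vanishes and the update reduces to the standard rule; away from the mean the prior term pushes the coefficient back towards it, with a strength set by β and by the prior variance.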
4. RESULTS


Example  1.  EASI  and  modified  EASI  algorithms. 

Source 1 is a sinusoidal function and source 2 a sawtooth function, both of them clearly subgaussian. They are mixed by a Givens matrix with θ = 0.2.

In  figure  1  we  show  the  estimated  angle  for  the  EASI  and 
modified  EASI  algorithms,  both  with  the  same  initial  condition 
and  the  same  adaptation  coefficient. 

The  prior  information  corresponds  to  a  gaussian  pdf,  with  mean 
0.2  and  unit  variance.  As  we  can  see  in  this  figure,  both  of  them 
converge  to  the  correct  solution  in  mean,  but  the  variance  of  the 
EASI  solution  is  greater  than  in  the  new  algorithm.  If  we  want  to 
reduce  the  variance  we  can  reduce  the  adaptation  step,  but  the 
convergence  slows,  so  we  need  more  samples  to  obtain  the 
correct  angle. 


[Figure 1. Estimated angle for the EASI and modified EASI algorithms.]


Example  2.  Comon  and  modified  Comon  algorithms. 

The observations are a mixture of two different sinusoidal signals. In this case θ = −0.5.

Figure 2 shows the estimated θ versus the number of observations; the dotted line is the new estimator and the solid line is Comon’s solution. The new algorithm converges faster than the classical one. The prior pdf is a Gaussian with mean −0.5 and unit variance.

Figure 3 compares the variance of both estimators. In this case, the modified estimator has lower variance than the other one, and for 60 or more observations both of them achieve the correct solution with a low variance. We can see that the modified algorithm needs only 40 samples to obtain an unbiased, low variance estimate for this example.


Figure  2.  Estimated  angle  vs  number  of  observations. 
In  solid  line  Comon’s  solution  and  in  dotted  line  the 
modified  Comon’s  solution. 


Figure 3. Variance of the Comon (solid line) and modified Comon (dotted line) estimators vs number of observations.

Example  3.  Influence  of  prior  knowledge  (variance). 

The correct angle is θ = 0.5 and p(θ) is the pdf of a Gaussian r.v. with mean 0.5 and variable variance. Only 30 samples are considered.

In figure 4 we observe that Comon’s algorithm needs more observations to obtain the correct solution and to decrease the variance, while the modified solution, for prior knowledge with low variance, estimates the angle appropriately. However, when our prior knowledge is not significant (high variance), the two algorithms coincide and the solutions are similar.


Figure  4.  Estimated  angle  (up)  and  variance  of  the 
estimator  (down)  of  Comon  (solid  line)  and  modified 
Comon  (dotted  line)  estimators  vs  variance  of  prior 
unbiased  pdf. 


Example  4.  Influence  of  prior  knowledge  (mean). 

The angle is 0.3 and the prior pdf is modeled by a Gaussian r.v. with variable mean and variance 0.6. The number of observations is 50. Figure 5 shows that if the prior knowledge is biased and its variance is low, the angle estimated by the modified algorithm is biased, whereas Comon’s solution is not. However, the variance of the modified algorithm is lower than that of Comon’s algorithm, so for incorrect prior information (mean far away from the correct mixing angle) with low variance, the mean squared error is similar to that of the traditional estimator.

Example 5. Jutten-Herault and modified algorithms.

Source signals are the same as in Example 1. The value of the coefficient c_12 is 0.3. A Gaussian r.v. models our prior information (mean 0.3 and variable variance). In figure 6 we show the recovered coefficient. In the unit-variance case only 50 samples are needed to estimate the mixing coefficient properly.


Figure  5.  Estimated  angle  (up)  and  variance  of  the 
estimator  (down)  of  Comon  (solid  line)  and  modified 
Comon  (dotted  line)  estimators  vs  mean  of  prior  0.6 
variance  pdf,  for  50  observations. 


Figure 6. Recovered c_12 coefficient vs number of observations for the Jutten-Herault algorithm (solid line) and with unbiased prior information of variable variance (dotted line, variance is one; dash-dotted line, variance is two).

ABSTRACT

We  present  a  quasi-maximum  likelihood  approach  to  blind 
source  separation  (BSS)  which  is  based  on  approximating 
the  source  distributions  by  their  truncated  Edgeworth  ex¬ 
pansions.  The  paper  focuses  on  the  2x2  case,  for  which  the 
problem  is  known  to  reduce  to  the  estimation  of  a  single  ro¬ 
tation  angle.  Unlike  existing  maximum  likelihood  BSS  tech¬ 
niques,  the  proposed  algorithm  is  consistent  for  any  source 
distribution,  provided  that  the  usual  identifiability  condi¬ 
tion  (at  most  one  Gaussian  source)  is  satisfied.  Closed-form 
expressions  are  derived  for  the  true  CRB,  for  the  CRB  cor¬ 
responding  to  the  Edgeworth  approximation,  and  for  the 
large-sample  variance  of  the  proposed  estimator.  The  pro¬ 
posed  algorithm  is  compared  with  existing  approaches  via 
extensive  simulations. 

1.  INTRODUCTION 

Blind  signal  estimation  aims  to  estimate  unknown  source 
signals  from  distorted  and  noise-contaminated  observations 
without  explicit  knowledge  of  the  transmission  channel.  In 
the  case  of  multiple  sources  transmitted  simultaneously, 
blind  signal  estimation  requires,  in  general,  multiple  sensors 
or  antennas  (or  other  forms  of  diversity,  such  as  time,  fre¬ 
quency,  code,  etc.).  Consider  the  scenario  where  L  source 
signals  are  impinging  on  a  set  of  M  sensors.  The  trans¬ 
fer  function  of  the  antenna  array  is  assumed  to  be  linear 
and  memoryless.  The  mixing  matrix  may  be  unknown 
due  to  several  reasons:  sensor  locations  are  unknown  (or 
not  exactly  known),  or  the  array  is  uncalibrated;  angular 
spread  (due  to  local  scattering,  source  signals  impinge  on 
the  array  at  different  angles  but  with  no  appreciable  delay) 
also  causes  the  array  matrix  to  lose  its  usual  Vandermonde 
structure.  Thus,  the  outputs  of  the  antenna  array  are  mod¬ 
elled  as 

x(t) = As(t) + v(t)    (1)

where A = [a_ij] is the (M x L) mixing matrix characterizing the antenna array, s(t) is the (L x 1) source signal vector, v(t) is the (M x 1) additive noise vector, and t is the time index. The objective is to blindly recover or estimate the source vector, s(t). A less demanding objective is to recover s(t) up to scale factors and permutation ambiguities. BSS consists of finding a separating matrix B such that BA is a non-mixing matrix (see [4]), i.e., BA = P where P is a generalized permutation matrix (generalized in the sense that the non-zero entries are not constrained to be unity). To ensure that the spatial signatures of the signals incident on the array are distinct, the mixing matrix must satisfy the following assumption.

(AS1) A has full column rank.

In general, BSS requires the following extra assumptions:

(AS2) s_i(t), i = 1, ..., L, are mutually independent.

(AS3) s_i(t) are zero-mean stationary and mixing.

(AS4) v(t) is an M-variate zero-mean stationary mixing process and is independent of the source signals.

(AS5) At most one of the sources is Gaussian. More precisely, we assume that |κ_{4,1}| + |κ_{4,2}| ≠ 0; see eq. (4).

BSS techniques can be categorized into two classes: i) statistical distribution-based techniques [2][4][7], which exploit a priori information about the statistical or deterministic distribution of the source vector, such as finite alphabet, non-Gaussianity, constant modulus, known skewness, kurtosis or other statistics; ii) temporal-correlation-based techniques [10][1], under the assumption that the source signals have distinct (not necessarily orthogonal) time-frequency signatures. The BSS problem has attracted increasing attention because of its wide range of applications, such as biomedical engineering, radar processing, acoustic tracking and mobile communications.

A standard approach to BSS is to first spatially pre-whiten the data using second-order statistics; this serves to normalize the signal powers as well, and reduces the mixing matrix to a unitary matrix. In the second step, this unitary matrix is estimated. Here, we focus on the noiseless (high SNR) case and assume that the pre-whitening step has been carried out perfectly. The discrete-time data vector is now given by

z(t)  =  Us(t),  t=  1,2,...  (2) 

where  U  is  the  unknown  unitary  or  rotation  matrix.  The 
covariance  matrix  of  z(t)  is  the  identity  matrix.  We  also 
focus  on  the  basic  scenario  of  two  sources-two  sensors.  As 
explained  in  [4],  this  scenario  can  be  used  to  solve  the  gen¬ 
eral  problem  by  operating  pairwise  over  several  sweeps  until 
convergence.  The  unknown  (2  X  2)  unitary  transformation 
matrix  is  reduced  to  a  Givens  rotation  matrix,  which  can 
be  expressed  in  the  case  of  real-valued  signals  as 

«('•>=($#  p» 

where $\theta_0$ is the rotation angle between the two source signals.
The source vector s(t) can be unambiguously estimated
if $\theta_0$ can be unambiguously identified. However, in
order for BSS to be achieved, only $[\theta_0]_{\pi/4}$ needs to be identified,
where $[\theta_0]_a$ denotes the contracted value of $\theta_0$ in the
interval $[-a, a)$. Indeed, $U([\theta_0]_{\pi/4})^T U(\theta_0)$ is a non-mixing
matrix. With $[\theta_0]_{\pi/4}$, the signal vector s(t) can be identified
only up to sign and permutation ambiguities. The permutation
ambiguity can be relieved if $[\theta_0]_{\pi/2}$ can be identified,
which requires the source signals to have different distributions.
The sign ambiguity can be relieved if the source
distributions are non-symmetric.
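
The role of the contracted angle can be checked numerically. The following is a minimal sketch, assuming the rotation convention of eq. (3) and a hypothetical helper `contract` for $[\theta]_a$; the names are illustrative and not taken from the paper.

```python
import numpy as np

def givens(theta):
    """Real 2x2 rotation matrix U(theta), convention of eq. (3)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def contract(theta, a):
    """Contracted value [theta]_a, i.e. theta mapped into [-a, a)."""
    return (theta + a) % (2 * a) - a

theta0 = np.deg2rad(100.0)                 # true angle (assumed for the demo)
theta_c = contract(theta0, np.pi / 4)      # only this value is needed for BSS
# De-rotating with the contracted angle leaves a signed permutation,
# i.e. a non-mixing matrix.
residual = givens(theta_c).T @ givens(theta0)
```

Here the residual matrix has exactly one non-zero entry of unit magnitude per row and column, which is the sign and permutation ambiguity described above.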

An approximate maximum likelihood BSS was proposed
in [6] where the Gram-Charlier expansion was used to approximate
the source distributions. The algorithm proposed
in [6] is limited to the case where the sum of the
source kurtoses is positive, i.e., $\kappa_{4,1} + \kappa_{4,2} > 0$, where

$$\kappa_{4,i} = E\{s_i^4(t)\} / [E\{s_i^2(t)\}]^2 - 3, \qquad (4)$$

is the kurtosis of the ith source. An extension of this algorithm
to include the $\kappa_{4,1} + \kappa_{4,2} < 0$ case was suggested
in [11] using a geometrical approach. However, both of
these methods fail when $\kappa_{4,1} + \kappa_{4,2}$ is close to zero. The
closed-form BSS solution proposed by Comon [3, 4], which
is based on the concept of independent component analysis
(ICA), also fails in this scenario. In this paper, we generalize
these methods in order to overcome the above problem.
The proposed algorithms are consistent for any source distributions
provided that the identifiability condition (which
is: at most one of the sources is Gaussian [10][4]; see (AS5))
is satisfied.

The scenario where $\kappa_{4,1} + \kappa_{4,2}$ is close to zero can be encountered
when an impulsive signal interferes with a communication
signal. Indeed, the kurtosis of the latter is
negative-valued whereas that of an impulsive interference
is positive-valued. The proposed algorithm will be shown
to be robust in this scenario, and yet its performance in the
absence of impulsive interference is comparable with that
of existing BSS techniques.
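
To make the sign argument concrete, here is a small sketch of the sample kurtosis in the sense of eq. (4). The two test signals (BPSK symbols and Laplace noise) are assumptions chosen for illustration; they are not the signals used in the paper.

```python
import numpy as np

def kurtosis(s):
    """Sample version of eq. (4): E{s^4} / (E{s^2})^2 - 3."""
    s = np.asarray(s, dtype=float)
    return np.mean(s**4) / np.mean(s**2) ** 2 - 3.0

rng = np.random.default_rng(0)
bpsk = rng.choice([-1.0, 1.0], size=100_000)   # communication-like signal
laplace = rng.laplace(size=100_000)            # heavy-tailed, impulsive-like
k_bpsk = kurtosis(bpsk)      # negative-valued
k_lap = kurtosis(laplace)    # positive-valued
```

A mixture of one source of each kind can therefore have a kurtosis sum of either sign, or close to zero, depending on how heavy-tailed the interference is.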

We derive closed-form expressions for: (1) the exact
CRB; (2) the CRB corresponding to the truncated Gram-Charlier
series; and (3) the large sample variance of our
estimator. These expressions are also applicable to the estimates
of the rotation angle of a complex symbol set with
independent real and imaginary parts.

2.  AN  APPROXIMATE  MAXIMUM 
LIKELIHOOD  (AML)  APPROACH 

Let the normalized (wrt the unknown scale) pdf of the ith
source be denoted by $p_{s_i}(\cdot)$. Since the sources are independent,
and ignoring the whitening imperfections, the ML
estimate of the rotation angle $\theta_0$ is obtained by maximizing

$$l(Z_T;\theta) = E_T\Big\{ \sum_{i=1}^{2} \ln p_{s_i}\big([U(\theta)^T z]_i\big) \Big\}$$

where $Z_T = [z(1), ..., z(T)]$, z(t) is given by (2) and (3),
$E_T\{y\} = \frac{1}{T}\sum_{t=1}^{T} y(t)$ and $[y]_i$ denotes the ith entry of
vector y. The ML scheme may be simplified by approximating
the pdf of the sources by their truncated Edgeworth
or Gram-Charlier expansions [8], which involve cumulants.
Both of these expansions are ways to express a pdf by an
expansion around the Gaussian kernel.

2.1.  Approximate  Likelihood  Function 

The Edgeworth expansion of the pdf of $s_i$ about its best
Gaussian approximation is [8]

$$p_{s_i}(s_i) = g(s_i)\,[1 + \delta\varphi_i(s_i)]$$

where g(.) is the normalized Gaussian distribution, and

$$\delta\varphi_i(s) = \frac{1}{3!}\kappa_{3,i} h_3(s) + \frac{1}{4!}\kappa_{4,i} h_4(s) + \cdots .$$

In the above expression, $\kappa_{n,i}$ is the nth-order cumulant of
$s_i$ and $h_n(\cdot)$ is the Hermite polynomial of degree n. An
approximation of the source pdf is obtained by truncating
the Edgeworth expansion as follows

$$p_{s_i}(s_i) \approx g(s_i)\Big[1 + \frac{\kappa_{3,i}}{6}(s_i^3 - 3s_i) + \frac{\kappa_{4,i}}{24}(s_i^4 - 6s_i^2 + 3)\Big] \qquad (5)$$

The  same  approximation  is  obtained  via  the  Gram-Charlier 
expansion. 
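
As a sanity check on the truncated expansion in (5), the sketch below evaluates it on a grid and verifies numerically that the correction terms change neither the total mass nor the unit variance. The function name and the parameter values are assumptions made for this illustration.

```python
import numpy as np

def gram_charlier_pdf(s, k3, k4):
    """Truncated expansion of eq. (5) around the standard Gaussian kernel."""
    s = np.asarray(s, dtype=float)
    g = np.exp(-0.5 * s**2) / np.sqrt(2.0 * np.pi)
    h3 = s**3 - 3.0 * s                 # Hermite polynomial of degree 3
    h4 = s**4 - 6.0 * s**2 + 3.0        # Hermite polynomial of degree 4
    return g * (1.0 + k3 / 6.0 * h3 + k4 / 24.0 * h4)

s = np.linspace(-10.0, 10.0, 20001)
ds = s[1] - s[0]
p = gram_charlier_pdf(s, k3=0.0, k4=-1.2)      # e.g. a uniform-like kurtosis
area = np.sum(p) * ds                          # stays at 1
second_moment = np.sum(s**2 * p) * ds          # stays at 1
```

Note that the truncated series is only an approximation: for large cumulants it can become negative over part of the axis, so it is not guaranteed to be a valid pdf.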

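Using the polar coordinates of z, i.e.,

$$\rho = (z_1^2 + z_2^2)^{1/2}, \qquad \phi = \angle\{z_1 + j z_2\},$$

we have

$$U(\theta)^T z = \rho \begin{pmatrix} \cos(\phi - \theta) \\ \sin(\phi - \theta) \end{pmatrix}. \qquad (6)$$

Using (5) and (6), the likelihood function can be approximated
as (using $\ln(1 + c) \approx c$, for $|c| < 1$, and dropping
non-relevant terms)

$$l(Z_T;\theta) = E_T\{l_3(z;\theta)\} + E_T\{l_4(z;\theta)\} \qquad (7)$$

where

$$l_3(z;\theta) = \frac{\rho^3}{8}\,[\kappa_{3,1}\cos(\phi - \theta) + \kappa_{3,2}\sin(\phi - \theta)]
 + \frac{\rho^3}{24}\,[\kappa_{3,1}\cos 3(\phi - \theta) - \kappa_{3,2}\sin 3(\phi - \theta)],$$

$$l_4(z;\theta) = \frac{\kappa_{4,1} - \kappa_{4,2}}{48}\,\rho^4 \cos[2(\phi - \theta)]
 + \frac{\kappa_{4,1} + \kappa_{4,2}}{192}\,\rho^4 \cos[4(\phi - \theta)].$$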

2.2.  Estimation 

The AML estimator of $\theta_0$ is obtained by maximizing
$E_T\{l(Z_T;\theta)\}$. We have that

$$\theta_0 = \arg\max_\theta E\{l_3(z;\theta)\},$$

$$\theta_0 = \arg\max_\theta E\{l_4(z;\theta)\}.$$

The estimation of $\theta_0$ using $l_3(z;\theta)$ is feasible only if at least
one of the two sources has non-zero skewness, which may
be restrictive in practice. But $\theta_0$ can be consistently estimated
by maximizing $E_T\{l_4(z;\theta)\}$ regardless of the pdf of
the source signals; we stress that this holds true without
invoking the assumption of symmetric pdfs, as in [6].

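Notice that $l_4(z;\theta)$ consists of two terms: one is a function
of the difference of the source kurtoses and the other is a
function of the sum of these kurtoses. Let

$$l_4^-(z;\theta) = \frac{\kappa_{4,1} - \kappa_{4,2}}{48}\,\rho^4 \cos[2(\phi - \theta)],$$

$$l_4^+(z;\theta) = \frac{\kappa_{4,1} + \kappa_{4,2}}{192}\,\rho^4 \cos[4(\phi - \theta)].$$

Interestingly, these functions are also 'likelihoods' (or contrasts
[4]) for the estimation of $\theta_0$, i.e.,

$$\theta_0 = \arg\max_\theta E\{l_4^-(z;\theta)\}, \qquad (8)$$

$$\theta_0 = \arg\max_\theta E\{l_4^+(z;\theta)\}. \qquad (9)$$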

Let $\hat\theta^-$ and $\hat\theta^+$ denote the corresponding estimators. First
notice that $\hat\theta^-$ is consistent if $\kappa_{4,1} - \kappa_{4,2} \neq 0$, and $\hat\theta^+$ is
consistent if $\kappa_{4,1} + \kappa_{4,2} \neq 0$. Since these conditions cannot
be violated simultaneously under assumption (AS5), at
least one of these estimators is consistent. To derive these
estimators, we first notice that

$$l_4^-(z;\theta + \pi) = l_4^-(z;\theta),$$

$$l_4^+(z;\theta + \pi/2) = l_4^+(z;\theta).$$

This implies that the identifiability range is $[-\pi/2, \pi/2]$ for
$\hat\theta^-$ and $[-\pi/4, \pi/4]$ for $\hat\theta^+$. Therefore, $\hat\theta^-$ is actually an estimate
of $[\theta_0]_{\pi/2}$, and $\hat\theta^+$ is an estimate of $[\theta_0]_{\pi/4}$. The identifiability
range can be further extended by using $l_3(z;\theta)$
if the sources are not both symmetrically distributed.

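After some algebra, we obtain the following closed-form
estimators

$$\hat\theta^- = \tfrac{1}{2}\arg\{\mathrm{sign}(\kappa_{4,1} - \kappa_{4,2})\,(\eta_1 + j\eta_2)\}, \qquad (10)$$

$$\hat\theta^+ = \tfrac{1}{4}\arg\{\mathrm{sign}(\kappa_{4,1} + \kappa_{4,2})\,(\eta_3 + j\eta_4)\}, \qquad (11)$$

where

$$\eta_1 = E_T\{\rho^4 \cos 2\phi\}; \quad \eta_2 = E_T\{\rho^4 \sin 2\phi\};$$

$$\eta_3 = E_T\{\rho^4 \cos 4\phi\}; \quad \eta_4 = E_T\{\rho^4 \sin 4\phi\}.$$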
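
The closed-form estimators (10) and (11) translate almost directly into code. The following is a minimal sketch, assuming perfectly whitened data and known kurtosis signs; the source types (BPSK and uniform), the sample size and the function names are illustrative assumptions.

```python
import numpy as np

def eta_statistics(z):
    """Sample eta_1..eta_4 from a 2xT array of whitened data z."""
    rho4 = (z[0]**2 + z[1]**2) ** 2
    phi = np.arctan2(z[1], z[0])
    return (np.mean(rho4 * np.cos(2 * phi)), np.mean(rho4 * np.sin(2 * phi)),
            np.mean(rho4 * np.cos(4 * phi)), np.mean(rho4 * np.sin(4 * phi)))

def theta_minus(eta, sign_diff):
    """Eq. (10); sign_diff is the sign of kappa_{4,1} - kappa_{4,2}."""
    return 0.5 * np.angle(sign_diff * (eta[0] + 1j * eta[1]))

def theta_plus(eta, sign_sum):
    """Eq. (11); sign_sum is the sign of kappa_{4,1} + kappa_{4,2}."""
    return 0.25 * np.angle(sign_sum * (eta[2] + 1j * eta[3]))

rng = np.random.default_rng(1)
T = 200_000
s = np.vstack([rng.choice([-1.0, 1.0], size=T),              # kurtosis -2
               rng.uniform(-np.sqrt(3), np.sqrt(3), size=T)])  # kurtosis -1.2
theta0 = np.deg2rad(15.0)
c, sn = np.cos(theta0), np.sin(theta0)
z = np.array([[c, -sn], [sn, c]]) @ s                        # z = U(theta0) s
eta = eta_statistics(z)
est_minus = np.rad2deg(theta_minus(eta, sign_diff=-1.0))
est_plus = np.rad2deg(theta_plus(eta, sign_sum=-1.0))
```

In this example both kurtosis combinations are non-zero, so both estimates should land near 15 degrees; the estimate based on the kurtosis difference is noisier here because the difference is small.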
Note that the ML estimator proposed in [6], i.e.,
$\arg\{\eta_3 + j\eta_4\}/4$, coincides with $\hat\theta^+$ when $\kappa_{4,1} + \kappa_{4,2} > 0$
(in [6], the kurtoses of the sources were assumed positive
and equal; we note that this assumption is not valid in the
communications context).

For the above estimators to be practical, the signs of
$(\kappa_{4,1} - \kappa_{4,2})$ and $(\kappa_{4,1} + \kappa_{4,2})$ must be known or estimated
from the data. After some calculations, we obtain

$$\kappa_{4,1} - \kappa_{4,2} = E\{\rho^4 \cos 2\phi\}\cos 2\theta_0 + E\{\rho^4 \sin 2\phi\}\sin 2\theta_0,$$

$$\kappa_{4,1} + \kappa_{4,2} = E\{\rho^4\} - 8.$$

Hence, the latter can be estimated directly from the data.
The former however depends on the unknown rotation angle
$\theta_0$. A completely blind version of the estimator $\hat\theta^+$ is then
obtained as

$$\hat\theta^+ = \tfrac{1}{4}\arg\{\mathrm{sign}(E_T\{\rho^4\} - 8)\,(\eta_3 + j\eta_4)\}. \qquad (12)$$

This estimator was also proposed in [11] using a geometrical
approach. Since $\hat\theta^+$ fails when $\kappa_{4,1} + \kappa_{4,2}$ is close to 0,
we need to develop a general estimator that would be consistent
regardless of the source distributions. Towards this
objective, we go back to the likelihood $E_T\{l_4(z;\theta)\}$ in eq.
(7) and replace $\kappa_{4,1} - \kappa_{4,2}$ and $\kappa_{4,1} + \kappa_{4,2}$ by their estimates

$$\hat\kappa_{4,1} - \hat\kappa_{4,2} = \eta_1 \cos 2\theta + \eta_2 \sin 2\theta,$$

$$\hat\kappa_{4,1} + \hat\kappa_{4,2} = \zeta := E_T\{\rho^4\} - 8.$$

After some tedious calculations, the LLF is found to be

$$E_T\{l_4(Z_T;\theta)\} = \frac{\eta_1^2 + \eta_2^2}{96}
 + \frac{1}{192}[2\eta_1^2 - 2\eta_2^2 + \zeta\eta_3]\cos 4\theta
 + \frac{1}{192}[4\eta_1\eta_2 + \zeta\eta_4]\sin 4\theta. \qquad (13)$$

Let $\hat\theta_a$ denote the maximizer of $E_T\{l_4(Z_T;\theta)\}$. Setting the
first derivative of the LLF in eq. (13) to 0, the AML estimate
of $\theta_0$ is then obtained as

$$\hat\theta_0 = \tfrac{1}{4}\arg\{e_1 + j e_2\} \qquad (14)$$

where

$$e_1 = 2\eta_1^2 - 2\eta_2^2 + \zeta\eta_3; \quad e_2 = 4\eta_1\eta_2 + \zeta\eta_4.$$

Notice that after replacing $\kappa_{4,1} - \kappa_{4,2}$ by its estimate,
the $2\theta$ terms in the LLF have disappeared, and the LLF
becomes a function of $4\theta$ only. This implies that the above
AML estimate $\hat\theta_0$ is an estimate of $[\theta_0]_{\pi/4}$. However, $[\theta_0]_{\pi/2}$
can be estimated if $\kappa_{4,1} - \kappa_{4,2} \neq 0$. Indeed, if the source
signals are ordered such that $\kappa_{4,1} > \kappa_{4,2}$, the AMLE of
$[\theta_0]_{\pi/2}$ is $\hat\theta_0$, if $\eta_1 \cos 2\hat\theta_0 > -\eta_2 \sin 2\hat\theta_0$, and is $[\hat\theta_0 + \pi/2]_{\pi/2}$
otherwise.

2.3. Summary of the AML Algorithm

(1) Compute the whitening matrix, W; whiten the array
outputs and normalize their powers:

$$z = W^{-1}x \; (= Us)$$

(2) Evaluate $\eta_i$, i = 1, ..., 4, via

$$\eta_1 = E_T\{z_1^4 - z_2^4\}; \quad \eta_2 = 2E_T\{z_1^3 z_2 + z_1 z_2^3\};$$

$$\eta_3 = E_T\{z_1^4 - 6z_1^2 z_2^2 + z_2^4\}; \quad \eta_4 = 4E_T\{z_1^3 z_2 - z_1 z_2^3\}.$$

(3) Estimate the sum of the source kurtoses via

$$\zeta = E_T\{(z_1^2 + z_2^2)^2\} - 8$$

(4) Estimate $\theta_0$ via eqn. (14),

$$\hat\theta_0 = \tfrac{1}{4}\,\mathrm{atan2}(4\eta_1\eta_2 + \zeta\eta_4,\; 2\eta_1^2 - 2\eta_2^2 + \zeta\eta_3)$$

3. PERFORMANCE ANALYSIS

For simplicity, we assume in the rest of the paper that the
source signals are temporally independent; such an assumption
is not required to ensure the consistency of the algorithms
developed in preceding sections (it suffices to assume
that the source signals are weakly mixing in the sense that
they have summable cumulants).

We have derived a closed-form expression for the large
sample variance of the completely blind estimator given by
(12), which assumes $\kappa_{4,1} + \kappa_{4,2} \neq 0$. The expression for the
case where $\kappa_{4,1} - \kappa_{4,2} \neq 0$ is considerably more complicated,
and will be presented elsewhere.

The large-sample variance expression is given by

$$\mathrm{var}(\hat\theta) = \frac{1}{T}\,\frac{E s_1^6 - 2\,E s_1^4\,E s_2^4 + E s_2^6}{(\kappa_{4,1} + \kappa_{4,2})^2} \qquad (15)$$

Details of the derivation are omitted due to lack of space
and may be found in [5].

Remarks:

1. Derivation assumes that the $s_i(t)$'s are zero-mean, unit variance
iid sequences (valid for BSS), with $\kappa_{4,1} + \kappa_{4,2} \neq 0$.

2. The estimator involves the fourth moments of z(.), but
the variance expression does not depend upon the eighth
order moments of the source signals.

3. The expression holds true even if neither source pdf is
symmetric.
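
Steps (2) to (4) of the AML summary in Section 2.3 can be sketched in a few lines. This is an illustrative sketch only: it assumes the data are already whitened (step (1) is skipped), and the uniform sources, seed and function name are assumptions rather than the paper's simulation setup.

```python
import numpy as np

def aml_angle(z):
    """AML estimate of the rotation angle from whitened 2xT data z (eq. (14))."""
    z1, z2 = z
    eta1 = np.mean(z1**4 - z2**4)
    eta2 = 2.0 * np.mean(z1**3 * z2 + z1 * z2**3)
    eta3 = np.mean(z1**4 - 6.0 * z1**2 * z2**2 + z2**4)
    eta4 = 4.0 * np.mean(z1**3 * z2 - z1 * z2**3)
    zeta = np.mean((z1**2 + z2**2) ** 2) - 8.0     # estimated kurtosis sum
    e1 = 2.0 * eta1**2 - 2.0 * eta2**2 + zeta * eta3
    e2 = 4.0 * eta1 * eta2 + zeta * eta4
    return 0.25 * np.arctan2(e2, e1)

rng = np.random.default_rng(0)
T = 5000
# Two unit-variance uniform sources (kurtosis -1.2 each), mixed by U(15 deg).
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))
theta0 = np.deg2rad(15.0)
c, sn = np.cos(theta0), np.sin(theta0)
z = np.array([[c, -sn], [sn, c]]) @ s
theta_hat_deg = np.rad2deg(aml_angle(z))
```

The returned angle lies in $(-\pi/4, \pi/4]$, consistent with the estimator identifying only $[\theta_0]_{\pi/4}$.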


4. CRAMER-RAO BOUNDS

We derive the CRB for $\theta_0$ when the estimator is based on
the LLF in eq. (7) corresponding to the Gram-Charlier expansion.
We also derive the true CRB (i.e., using the exact
pdf). Detailed derivations may be found in [5].

4.1.  True  or  Exact  CRBs 

We show that:

(i) The CRB for $\theta_0$ and the CRB for the source pdf parameters
are decoupled;

(ii) Under mild assumptions on the pdfs, the true or exact
CRB is given by

$$\mathrm{CRB}(\theta_0) = \frac{1}{T}\,[I_{p_1} + I_{p_2} - 2]^{-1}$$

where $I_p$ is the Fisher information for location (FIL) of the
(standardized) pdf p(.), and is defined as

$$I_p = \int_{-\infty}^{\infty} [p'(s)]^2 / p(s)\, ds.$$

We next consider a few special cases:

(a) For the Gaussian pdf, $I_p = 1$; hence, if both sources are
Gaussian, $\theta_0$ cannot be estimated, as is well known.

(b) Generalized Gaussian pdf with shape parameter $\alpha$:

$$I_p = \alpha^2\, \Gamma(2 - 1/\alpha)\, \Gamma(3/\alpha) / \Gamma^2(1/\alpha).$$

Note that $I_p(\alpha_1) + I_p(\alpha_2) = 2$ iff $\alpha_1 = \alpha_2 = 2$, i.e., the
Gaussian case. If $\alpha = 1$ (Laplace), we have $I_p = 2$.
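
The special cases above are easy to verify numerically. The sketch below evaluates the FIL of the generalized Gaussian pdf and the resulting exact bound; the function names and the small tolerance used to detect the Gaussian-Gaussian case are assumptions of this illustration.

```python
from math import gamma

def fisher_location_gg(alpha):
    """FIL of the unit-variance generalized Gaussian pdf (valid for alpha > 1/2)."""
    return alpha**2 * gamma(2.0 - 1.0 / alpha) * gamma(3.0 / alpha) / gamma(1.0 / alpha) ** 2

def exact_crb(alpha1, alpha2, T=1):
    """Exact CRB for the rotation angle: (1/T) [I_p1 + I_p2 - 2]^(-1)."""
    info = fisher_location_gg(alpha1) + fisher_location_gg(alpha2) - 2.0
    if info <= 1e-12:            # both sources Gaussian: angle not identifiable
        return float("inf")
    return 1.0 / (T * info)

i_gauss = fisher_location_gg(2.0)      # Gaussian case
i_laplace = fisher_location_gg(1.0)    # Laplace case
crb_laplace = exact_crb(1.0, 1.0)      # two Laplace sources, T = 1
crb_gauss = exact_crb(2.0, 2.0)        # two Gaussian sources
```

With two Laplace sources the information term equals 2, so the bound is 1/(2T); with two Gaussian sources the information vanishes and the bound is infinite.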
4.2. CRB for the Gram-Charlier Approximation

We show that

(i) Assuming that the skewness and kurtoses are known,

$$\mathrm{CRB}_a(\theta_0) = \frac{1}{N}\Big[\tfrac{1}{2}(\kappa_{3,1}^2 + \kappa_{3,2}^2) + \tfrac{1}{6}(\kappa_{4,1}^2 + \kappa_{4,2}^2)\Big]^{-1} \qquad (16)$$

(ii) If $\kappa_{3,i}$ or $\kappa_{4,i}$ are not known, the FIM for these parameters
is identically zero, and the CRB for $\theta_0$ (computed via
the pseudo-inverse) remains the same as in (i).

Remarks: 

(a) It is surprising that the FIM for the kurtosis is zero;
but, once we find $\theta$, we can derotate and estimate the two
source signals, from which the skew/kurtosis can be estimated;
however, since the two sources may be swapped, the
estimated kurtoses also may be swapped. Thus, FIM = 0
is due entirely to the permutation ambiguity. In the scalar
case, given x(t) = a s(t), where s(t) is unit variance, iid, one
can estimate both a and the kurtosis of s consistently.

(b) When the sources are symmetrically distributed, the
third-order cumulants vanish and the CRB reduces to that
given in [6]. The CRB is symmetric in the skewnesses and
kurtoses of the two sources. If both sources have zero skewness
and zero kurtoses, then the CRB is infinity, and $\theta$
cannot be estimated, since the truncated Gram-Charlier expansion
reduces to the Gaussian pdf. It is also interesting
that the CRB expression does not involve the sign of the
skewness or the kurtosis.

(c) Both CR bounds are independent of the true value of
$\theta$. In order to compare them, we consider the case where the
source signals are generalized-Gaussian, $p(s) \propto \exp(-|s|^\alpha)$.
Figure 1 shows the two CRBs for the case where both source
signals have the same shape parameter. Notice that in the
heavy-tailed case ($\alpha < 2$), the exact and truncated bounds
are close. In the lighter-tailed case ($\alpha > 2$), the CRB corresponding
to the truncated estimator is quite pessimistic.


Figure  1.  Exact  and  approximate  CRBs  and  large  sample 
variance  of  the  estimator  in  (12)  vs.  shape  parameter  a  of  the 
generalized  Gaussian  pdf;  both  sources  had  the  same  pdf.  T  =  1. 


5.  SIMULATION  RESULTS 

An extensive set of simulations was carried out to compare
the five methods: (1) AML - method proposed in Section 2
of this paper; (2) C - method in [4]; (3) HL - method in [6];
(4) SGS - based on a result by Swami et al. in [9]; and (5)
2N - method in [11].

A fixed rotation matrix corresponding to $\theta_0 = 15°$ was
chosen, and the number of samples was N = 5000. The
input source signals were iid sequences drawn from different
pdfs. Simulation results are summarized from K = 1000
runs. Table 1 shows the results of the simulation study
for 3 different examples (corresponding to different source
pdfs). For each example, the four rows show the mean,
standard deviation, minimum and maximum of the bias of
the estimate. The source signals were changed randomly
from realization to realization.

Another simulation example is reported in Figure 2.

Figure 2. Standard deviation of the rotation angle estimates
vs. $\kappa_{4,1} + \kappa_{4,2}$ with $\kappa_{4,1} = -0.8$; N = 5000; SNR = ∞. Both
sources had the generalized Gaussian pdf.

Table 1. Comparison of different estimators


6.  EXTENSIONS 

In this section, we generalize our AML estimator so that the
likelihoods $l_4^-(z;\theta)$ and $l_4^+(z;\theta)$ can be combined arbitrarily.
Furthermore, we extend Comon's estimator [4], which is
based on the concept of ICA, to the $\kappa_{4,1} + \kappa_{4,2} = 0$ case.

6.1. Weighted AML (WAML) estimator

Consider the more general estimator

$$\hat\theta_0(w) = \tfrac{1}{4}\,\mathrm{atan2}\big(w(4\eta_1\eta_2) + (1 - w)\zeta\eta_4,\; w(2\eta_1^2 - 2\eta_2^2) + (1 - w)\zeta\eta_3\big) \qquad (17)$$

where w ($0 \le w \le 1$) is a weight parameter. This estimator
reduces to that in eq. (12) when w = 0, and to that in eq.
(14) when w = 0.5. This latter case is obtained by equally
combining $l_4^-(z;\theta)$ and $l_4^+(z;\theta)$. The former case is obtained
by ignoring $l_4^-(z;\theta)$. The weight parameter, w, could be
adjusted using a priori information about the source pdfs.
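
The two limiting cases of the weighted estimator can be confirmed with a short numerical check. The sketch below implements eq. (17) for given statistics; the numeric values of the $\eta_i$ and $\zeta$ are arbitrary placeholders chosen for the demonstration, not data from the paper.

```python
import numpy as np

def waml_angle(eta, zeta, w):
    """Weighted AML estimate of eq. (17) for statistics eta_1..eta_4 and zeta."""
    eta1, eta2, eta3, eta4 = eta
    num = w * (4.0 * eta1 * eta2) + (1.0 - w) * zeta * eta4
    den = w * (2.0 * eta1**2 - 2.0 * eta2**2) + (1.0 - w) * zeta * eta3
    return 0.25 * np.arctan2(num, den)

eta = (0.3, -0.2, -1.5, -2.0)    # placeholder statistics
zeta = -2.4                      # placeholder kurtosis-sum estimate

# w = 0 reproduces the blind estimator of eq. (12).
blind = 0.25 * np.angle(np.sign(zeta) * (eta[2] + 1j * eta[3]))
# w = 0.5 reproduces the AML estimator of eq. (14); the common factor
# 0.5 does not change the angle returned by atan2.
e1 = 2.0 * eta[0]**2 - 2.0 * eta[1]**2 + zeta * eta[2]
e2 = 4.0 * eta[0] * eta[1] + zeta * eta[3]
aml = 0.25 * np.arctan2(e2, e1)
```

Because atan2 is invariant to a positive common scaling of its two arguments, only the ratio w : (1 - w) matters.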

6.2.  Generalized  ICA-based  BSS 

With z(t) and U as defined in (2) and (3), we can write the
cross-cumulants of the output in terms of input cumulants
defined in (4) and the unknown angle $\theta_0$. There are five
distinct fourth-order cumulants involving two sensors

$$c_{1111} = \kappa_{4,1}\cos^4(\theta_0) + \kappa_{4,2}\sin^4(\theta_0)$$

$$c_{2222} = \kappa_{4,1}\sin^4(\theta_0) + \kappa_{4,2}\cos^4(\theta_0)$$

$$c_{1112} = \kappa_{4,1}\cos^3(\theta_0)\sin(\theta_0) - \kappa_{4,2}\cos(\theta_0)\sin^3(\theta_0)$$

$$c_{1222} = \kappa_{4,1}\cos(\theta_0)\sin^3(\theta_0) - \kappa_{4,2}\cos^3(\theta_0)\sin(\theta_0)$$

$$c_{1122} = (\kappa_{4,1} + \kappa_{4,2})\cos^2(\theta_0)\sin^2(\theta_0)$$

where $c_{ijkl} = \mathrm{cum}(z_i, z_j, z_k, z_l)$. Using some of these equations,
we suggest the following estimator

$$\tan\hat\theta_0 = -\frac{\beta}{2} + \mathrm{sign}(\beta)\sqrt{\frac{\beta^2}{4} + 1}, \qquad (18)$$

where

$$\beta = \frac{w(c_{1111} - c_{2222}) + (1 - w)(c_{1112} - c_{1222})}{w(c_{1112} + c_{1222}) + (1 - w)\,c_{1122}}.$$

Note that $c_{1111} - c_{2222}$ and $c_{1112} + c_{1222}$ are proportional to
$\kappa_{4,1} - \kappa_{4,2}$, and $c_{1112} - c_{1222}$ and $c_{1122}$ are proportional to
$\kappa_{4,1} + \kappa_{4,2}$. When w = 0, the estimator reduces to Comon's
estimator [3], which is consistent only if $\kappa_{4,1} + \kappa_{4,2} \neq 0$.
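
The estimator in (18) can be checked against the cumulant expressions themselves. The following sketch uses exact (noise-free) cumulants computed from the formulas above, so it only demonstrates algebraic consistency, not finite-sample behaviour; the kurtosis values, test angles and function names are assumptions of this example.

```python
import numpy as np

def output_cumulants(k1, k2, theta):
    """Fourth-order output cumulants for source kurtoses k1, k2 and angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return {
        "c1111": k1 * c**4 + k2 * s**4,
        "c2222": k1 * s**4 + k2 * c**4,
        "c1112": k1 * c**3 * s - k2 * c * s**3,
        "c1222": k1 * c * s**3 - k2 * c**3 * s,
        "c1122": (k1 + k2) * c**2 * s**2,
    }

def ica_angle(cum, w):
    """Weighted ICA-based estimate of eq. (18); valid for |theta| < pi/4."""
    num = w * (cum["c1111"] - cum["c2222"]) + (1 - w) * (cum["c1112"] - cum["c1222"])
    den = w * (cum["c1112"] + cum["c1222"]) + (1 - w) * cum["c1122"]
    beta = num / den
    t = -beta / 2.0 + np.sign(beta) * np.sqrt(beta**2 / 4.0 + 1.0)
    return np.arctan(t)

# Noise-free check for two angles and three weights.
recovered = {
    (deg, w): np.rad2deg(ica_angle(output_cumulants(-2.0, -1.2, np.deg2rad(deg)), w))
    for deg in (15.0, -20.0) for w in (0.0, 0.5, 1.0)
}
```

With exact cumulants, every weight returns the true angle because both ratios entering $\beta$ equal $2\cot 2\theta_0$; the weight only matters once the cumulants are replaced by noisy sample estimates, or when one of the kurtosis combinations vanishes.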

6.3.  Simulation  results 

The source signals were generated using the generalized
Gaussian pdf, and N = 5000. Figure 3 displays the standard
deviations (STDs) of the estimates in eqs. (17) and
(18) vs. w when the source pdfs are identical (uniform pdfs).
It is seen that the proposed WAML estimator is less sensitive
to w than the proposed ICA-based estimator. The
WAML estimator with w = 0.5 seems to perform almost as
well as the optimal WAML estimator (which is obtained
for w = 0 in this case), especially when the source kurtoses
are negative valued.

Figure 4 displays the STDs of the angle estimates when
$\kappa_{4,1} + \kappa_{4,2} = -0.06$ (with $\kappa_{4,1} = 0.92$). As expected, both
estimates fail when w = 0, and the optimal estimates are
obtained when w = 1. It is also seen that the ICA-based estimator
with a good choice of w can outperform the WAML
estimator in some scenarios.

Figure 5 displays the STDs of the angle estimates when
$\kappa_{4,1} = -1$, $\kappa_{4,2} = 0$. In this case, the optimal value of w
lies between 0 and 1.

Note however that by choosing 0 < w < 1, only the
accuracy of the estimates is affected, in contrast with the
extreme cases, w = 0 and w = 1, where the estimates may
completely fail. This suggests that even if the source signals
are known to have the same distribution (e.g., communication
signals), it might be beneficial to choose w > 0 (e.g.,
w = 0.2) in order to make the BSS robust to impulsive
interference.


Figure 3. STD of the rotation angle estimates when $\kappa_{4,1} = \kappa_{4,2} = -1$.

6.4.  Extensions 

Future  work  includes  the  extension  of  the  proposed  BSS 
techniques  to  complex  valued  signals,  and  performance 
analysis.  The  latter  will  be  useful  to  estimate  the  opti¬ 
mal  value  of  the  weight  parameter  w  in  (17)  and  (18),  from 
the  array  outputs. 


Figure 4. STD of the rotation angle estimates when $\kappa_{4,1} + \kappa_{4,2} = -0.06$, $\kappa_{4,1} = 0.92$.

Figure 5. STD of the rotation angle estimates when $\kappa_{4,1} = -1$, $\kappa_{4,2} = 0$.
I. Introduction

One of the new and inexpensive [1-6] concepts in
power electronics is the principle of random pulse-width
modulation (RPWM) for control of hard-switched
power converters, accelerated by the steadily
increasing concern with or regulations regarding
emissions of acoustic noise, vibrations and electric
fields. The random switching frequency (RSF-PWM)
method has proven to be the most effective for
reduction of acoustic annoyance, but due to the
irregular sampling in time, the method has
not heretofore been accurately analyzed, and selection
of the random switching frequencies has been more or
less based on trial-and-error. This paper removes a
great deal of the guesswork by providing formulas for
not only the continuous spectrum (Watts/Hz), but also
the pure power (Watts) components (harmonics) in
both single-phase and three-phase voltage inverters,
and these are verified by laboratory measurements.

II. Principles of Random Switching
Frequency PWM; from DC/DC to DC/AC

The power circuit of a three-phase inverter consists of
three legs, which requires three independently
controlled bilevel, single phase switching functions
a(t), b(t), and c(t). The line-to-line voltage may be
found as the difference between the switching
functions, e.g. $U_{ab} = a(t) - b(t)$. The time varying
duty cycle of each pulse is $M(t) = \tfrac{1}{2}(1 + mF(t))$,
where m is the modulation index and for sinusoidal
modulation $F(t) = \sin(2\pi f_1 t)$, and for third
harmonic injection
$F(t) = (\cos(2\pi f_1 t) - \tfrac{1}{6}\cos(6\pi f_1 t))$. Each
new pulse width is determined by a sample of F(t)
taken at a time determined by a correspondingly new
selection of the random switching frequency, as shown
in figure 1. A summary of many randomization pulse
schemes is given in [10], and included is a simple
result of importance for DC/DC conversion which was
first analyzed in [8]. With considerable analytical effort
we herein extend DC/DC results to DC/AC.

The fundamental process for DC/DC conversion is a
random segment width (random switching frequency)
pulse train, similar to that in figure 1, having sequential
segments with random widths $\tau_r$ and constant duty
cycle 0 < M < 1 within the segments. When the
segments start at times $t_m$ instead of being centered as
in figure 1, the process can be written

$$a(t) = \sum_m u_m(t - t_m), \qquad
u_m(t) = \begin{cases} 1, & 0 < t < M\tau_m \\ 0, & \text{otherwise} \end{cases} \qquad (1.1)$$

where the time location of the m-th pulse $u_m(t - t_m)$ is

$$t_m = \sum_{r=0}^{m-1} \tau_r, \qquad \tau_0 = 0,$$

where segment widths $\tau_r$ are randomly selected from a
known distribution. When M is a deterministic periodic
function of time, the process can be used for DC/AC
conversion.
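
A sampled version of the DC/DC process in (1.1) can be generated as follows. This is a minimal sketch under stated assumptions: the switching frequencies are drawn with equal probability, the sampling rate `fs` is chosen so that every segment and pulse spans a whole number of samples, and the function name and numeric values are illustrative rather than taken from the paper.

```python
import numpy as np

def rsf_pwm(switch_freqs, duty, n_segments, fs, rng):
    """Sampled bilevel waveform a(t) with random segment widths.

    Each segment has width tau = 1/f, with f drawn uniformly from
    switch_freqs, and is 'on' for duty * tau from the segment start.
    """
    segments = []
    for f in rng.choice(switch_freqs, size=n_segments):
        n = int(round(fs / f))          # samples in this segment
        n_on = int(round(duty * n))     # samples during which the pulse is high
        seg = np.zeros(n)
        seg[:n_on] = 1.0
        segments.append(seg)
    return np.concatenate(segments)

rng = np.random.default_rng(0)
a = rsf_pwm([2000.0, 2500.0, 4000.0], duty=0.4, n_segments=2000, fs=1e6, rng=rng)
mean_level = a.mean()    # DC component of the waveform
```

Because every segment carries the same duty cycle, the DC level equals M regardless of which switching frequencies are drawn; randomization redistributes only the remaining spectral content.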


The power spectral density W(f) of a random
process is defined to be the time average, as the time
window duration T' approaches infinity, of the
ensemble average of the magnitude squared of the
Fourier transform of a time-windowed sample process
[9]:

$$W(f) = \lim_{T' \to \infty} \frac{1}{T'}\, E\{|F[a_{T'}(t)]|^2\}, \qquad (1.2)$$

where $F[a_{T'}(t)]$ is the Fourier transform of the T'-second
length of a sample process from the ensemble.
Assuming independence of pulse widths in the m-th and
(m+k)-th segment, Middleton [8] used this approach to
write the power spectral density of a(t) as the
sum of the three spectral portions determined by the
values of the correlation delay index k relative to the
sequence index m. With DC/AC conversion the duty
cycle of the pulses depends upon the sampling point
within the deterministic modulating waveform, and we
now have


[0,  otherwise 


(1.3) 


showing that widths of any distinct pulses in the
sequence are actually dependent, and Middleton's
formula does not apply directly. We now consider that
the duty cycles M(t) can be written as a periodic
function of the uniform random phase variable $\theta$. The
width of the m-th segment's pulse is now a
deterministic function. Then without loss of generality,
because we have stationary, ergodic processes, we may
as shown in figure 1 select $t_m = 0$ as the sampling
time $t_m$ of the m-th segment and let the pulses be
centered on, rather than start at, the sampling times.

Figure 1. Example sequence of segments, k > 1.
III. Approximation to the Exact formula
for the Power Density Spectrum


The approximation by Bech [7],
$\delta_k = \gamma_k + \frac{\tau_m + \tau_{m+k}}{2} \approx k\bar\tau$ in $M(\cdot)$ in the second
factor of each term in the second expectation in (1.5),
makes the corresponding factor independent from the
other two and calculation possible. We implement the
approximation and use the fact that
$M((k\bar\tau + LT), \theta) = M(k\bar\tau, \theta)$ and $\gamma_K = \gamma_{LN+k}$,
where $L = \mathrm{fix}[K/N]$ and $N = T/\bar\tau$, and we also
replace the resulting infinite geometric series sum over

l  with  Ele12*^ - - — j-.  Now 

completely  rewriting  (1.5),  we  have  the  Bech 
approximation: 


Referring to figure 1, we can see that the time
$\gamma_k = t_{m+k} - t_m$ between the m-th and (m+k)-th pulse, if k > 1,
is the sum of (k - 1) random segment widths. The
time $\delta_k$ between pulse centers adds half the width of
the m-th segment plus half the width of the (m+k)-th
segment. Thus

$$\gamma_k = \sum_{b=m+1}^{m+k-1} \tau_b,\; k \ge 2; \qquad \gamma_k = 0,\; k = 1; \qquad \gamma_k = -\tau_m,\; k = 0 \qquad (1.4)$$

Finding the cross spectrum $S_{m,m+k}$ between two pulses
at times m and m+k over all k and taking expectation
leads to the exact expression for the power density
spectrum:

Single  phase: 

w( 0  =_/,.2  Ee,m  {sin2  (n  frmM(Q,e))}+ 


Ze 


T  (k  f) 


T  (K  f) 


•  Re 


P 

n 


sin  (n  frmM( 0, 9) )  e ^ 


sin  (n  ft  w,M(lr  ,0))e 


hW)‘ 
where $\bar\tau$ is the expected segment width and
$N = T/\bar\tau$ (rounded).


IV.  Analysis  of  the  Discrete  Power 
Spectrum 


Starting  with  a  repeat  of  (1.5),  retaining  generality  by 
setting  sequence  index  m  =  0  and  letting  tm  =  0 , 

treating  M{5k,0)  =  m (tm  +  yk  +  Tm  +^m+k-,e) 

as  a  deterministic  function  in  the  variable 


§  =  Y  +  Tn»  +rm*k  having  period  T  =  1  /  fx, 
replacing  an(^/T„rfM5*,0))with  its  exponential  Fourier 

series  having  n'h  coefficient  Fn(fdCk)eic0  and 
collecting  the  exponential  delay  factors  gives 
J?{sin2  {n  /t0M(0,  9)) } 

e  [  fsin(7r/ToM(O,0))  1] 


m  -—3 

{nf) 


fel  ^  ^ 


■flnnffr  e-jrt 


where $\varepsilon = 1$, f = 0; $\varepsilon = 2$, f > 0. Now we can
see that the exponential factors $e^{j2\pi n f_1 \delta_k}$ in the Fourier
series and that of the delay factor $e^{-j2\pi f \delta_k}$ may
combine for some discrete frequencies $f = f_d$ to give
unity for all k, causing the sum over k to give infinite
power spectral density but finite power. Discrete
frequencies $f = f_d$ having infinite power density have
been shown to exist not only at 1) harmonics of $f_{LCM}$,
the LCM of the set of switching frequencies and the
factor 2, but also at 2) other discrete frequencies with
non-zero power at $f_1$-spaced sidebands around
harmonics of $f_{LCM}$. At these frequencies we use the
expression for the power rather than power density.
The resulting power expressions:
single  phase 

line-to-line 

P*  =7-^k  (f„(«  |  (i-coK&vt/S), 

\K  fjt) 

£  =  \,  f  =  0j£  =  2,  f>0r, 

fd=KfmA±nfi<  K, n =0,1,2, ~ 

_ _ (1.10) 

where $e^{jn\theta} F_n(f_d \tau)$ is the n-th Fourier coefficient in
the exponential series expansion of
$\sin(\pi f \tau_k M(\delta_k, \theta))$ considering $\delta_k$ the time
variable with period T. We have also derived an
analytic solution for the special case of modulation
with a single sinusoid, where the $F_n(f_d \tau)$ are easily
found in terms of Bessel functions.

V.  Examples  of  Mixed  Power  and  Power 
Density 

Now that we have the expressions for approximate
density, which is adequate at frequencies other than
$f_d = K f_{LCM} \pm n f_1$, and for power that is accurate at
discrete frequencies $f_d = K f_{LCM} \pm n f_1$, we may plot
the two separately or in conjunction, mixing power
density and power scales on the vertical axis. An
example single phase approximate density computation
of (1.6) is given in figure 2(a) where the fundamental
frequency $f_1$ = 40 Hz, m = 0.8, and the switching
frequencies are 2.0, 2.5, 3.0, 3.5, and 4.0 kHz. Results
are plotted with minimum frequency 500 Hz and
spacing at 250 Hz. Because the fundamental, low
harmonics and the $f_{LCM}$ frequencies are out of range of
the plot, there are no discrete harmonics to calculate
here. This approximation is validated by the laboratory
measurements shown in figure 2(b).
Figure 2. (a) Spectrum calculated from (1.6). Switching
frequencies [2.0, 2.5, 3.0, 3.5, 4.0] kHz, having equally
likely occurrence in time, and duty cycle modulation
$M(t) = \tfrac{1}{2}(1 + 0.8\cos(\omega_1 t))$. (b) Measurement of the
same spectrum.

Figure 3 compares computations to measurements for
line-to-line voltage near $f_{LCM}$ = 12 kHz for
modulation by first and third harmonics of 40 Hz,
switching frequencies at 2, 3, 4 kHz, m = 0.8. Figure
3(a) is the combination of 1) a dashed curve for the
approximate density formula, which wrongly calculates
the density at the discrete power frequencies, and 2) a
solid curve giving the power formula at the
discrete frequencies. The vertical scale is watts per
hertz for the density, but watts for the discrete power
frequencies. Figure 3(b) is generated from
measurements with a density setting; it matches the
approximate expression for the density except at those
frequencies that relate to discrete harmonics. Figure
3(c) is generated from measurements with a power
setting and matches the exact power formula very well at
the discrete frequencies. Both the density and power
measurements fully verify the analysis. It should be
emphasized that each part of these inherently mixed
spectra requires measurements with the proper setting,
PSD-scaled for the density and PWR-scaled for the power.
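
The locations of the discrete components quoted in the examples can be reproduced with a few lines. This sketch assumes integer switching frequencies in Hz and computes $f_{LCM}$ as the plain least common multiple of the switching frequencies; the role of "the factor 2" mentioned in Section IV is not modelled here, and the function names are hypothetical.

```python
from math import lcm

def f_lcm(switch_freqs_hz):
    """Least common multiple of the (integer, Hz) switching frequencies."""
    return lcm(*switch_freqs_hz)

def discrete_frequencies(switch_freqs_hz, f1_hz, K, n_max):
    """Candidate discrete frequencies f_d = K f_LCM +/- n f_1, n = 0..n_max."""
    centre = K * f_lcm(switch_freqs_hz)
    return sorted({centre + n * f1_hz for n in range(-n_max, n_max + 1)})

f_fig3 = f_lcm([2000, 3000, 4000])                      # setting of Figure 3
f_fig2 = f_lcm([2000, 2500, 3000, 3500, 4000])          # setting of Figure 2
sidebands = discrete_frequencies([2000, 3000, 4000], f1_hz=40, K=1, n_max=2)
```

For the Figure 3 setting this gives 12 kHz, matching the text, while for the five frequencies of Figure 2 the least common multiple is far above the switching band, which is consistent with the statement that no discrete harmonics fall within that plot.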


formula  for  the  continuous  density.  Furthermore,  the 
theory  is  generalized  to  any  periodic  modulation 
function.  As  an  example,  the  third  harmonic  injection 
technique  was  analyzed,  and  based  on  extensive 
comparisons  of  predicted  spectra  with  measured 
spectra  and  as  opposed  to  all  earlier  investigations  of 
random  PWM  schemes,  the  theory  is  found  to  be  very 
accurate  for  both  single  phase  and  line-to-line  spectra. 
Figure 3. Comparisons near $f_{LCM}$ for modulation by
first and third harmonics of 40 Hz, line-to-line voltage,
switching frequencies at 2, 3, 4 kHz, m = 0.8. (a) Power
(circled, watts) and power spectral density (solid,
watts/Hz); (b) measurement using density scaling; (c)
measurement using power scaling.
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Abstract 

In  this  paper  we  give  the  formulation  of  the  power  spectral 
density  of  the  randomized  pulse  width  modulation 
(RPWM)  DC/DC  converter  based  on  both  constant  duty 
cycle  and  constant  pulse  width  schemes.  We  analyze  the 
spectral  formulas  based  on  the  constant  duty  cycle  and 
develop  a  means  for  nulling  the  power  spectral  density  at 
one  specified  frequency  and  its  harmonics.  We  revert  to 
optimization  methods  when  switching  frequency  range  is 
subject  to  practical  constraints.  Simulation  results  illustrate 
the  effectiveness  of  our  approach. 

1.  Introduction 

The  switching  which  is  needed  when  converting  a  DC 
power  source  to  a  lower  voltage  typically  causes  both 
harmonics  and  electromagnetic  emissions.  It  is  possible  to 
reduce  or  eliminate  both  of  these  by  randomizing  the 
switching, but it may also be important to null the
generated  spectrum  at  frequencies  for  which  the  load  has 
natural  resonance.  Proper  design  of  the  switching 
frequencies  and  their  probability  density  function  (PDF) 
gives  control  over  the  output  noise  power  spectral  density 
(PSD)  at  specified  frequencies  whose  values  we  want  to 
minimize.  So  the  problem  addressed  by  this  research  is  the 
analysis  and  design  of  the  power  spectral  density  of 
randomized  pulse  width  modulation  (PWM)  DC/DC 
converters  by  specification  of  an  appropriate  set  of 
switching frequencies and their probabilities. Many randomization schemes for pulse width modulation (RPWM) in power converters have been proposed, and several have been analyzed in detail in the literature. A summary of many of these, especially for DC/DC conversion, is given by Stankovic [1]. Many schemes for DC/AC conversion have also been published, but we are not concerned with those in this study. A simple result of importance for DC/DC conversion was first given and analyzed by Middleton [2] and is also discussed by Stankovic [1]. Stankovic, Verghese, and Perreault [3] addressed the DC/DC converter problem of random switching design for spectral control of harmonic powers and of cumulative power in specified bands, which is also our eventual intention. They used two switching frequencies selected at random, but with Markov statistics such that long sequences of one switching frequency were discouraged. They designed for optimal results and found the method ineffective for wideband control but successful for narrowband control. Our approach is quite different: we design only a zero-order switching frequency selection, and we work entirely from the formulation in the frequency domain rather than the time (autocorrelation) domain.
We  consider  specification  of  a  minimum  switching 
frequency  to  control  worst  case  ripple  in  the  low 
probability  case  of  continual  selection  of  the  same 
switching  frequency,  and  we  consider  specification  of  a 
maximum  to  control  switching  inefficiency  and  other 
circuit  losses.  In  addition,  we  give  new  and  powerful 
design  results  regarding  the  powers  of  the  harmonics 
related  to  the  distribution  of  the  switching  frequencies. 

We  first  give  the  formulas  of  the  two  schemes,  constant 
duty  cycle  (CDC)  and  constant  pulse  width  (CPW)  derived 
straightforwardly from the early work in [1]. Analysis of
the  formulas  leads  to  a  clear  means  of  nulling  the  power 
spectral  density  at  one  specified  frequency  and  its 
harmonics.  At  the  same  time  the  least  common  multiple 
frequency  fLCM  of  the  switching  frequencies  can  be 
designed  as  needed  and  the  powers  of  fLCM  and  its 
harmonics  are  easily  determined. 

Although the formulation shows that the constant pulse width
scheme  allows  a  spectral  zero  to  be  placed  as  desired 
within  the  constraints  of  the  minimum,  maximum  and 
average  switching  frequency  allowed,  that  scheme  does 
not  provide  control  of  the  duty  cycle  within  each  segment 
and  is  deemed  "uncontrollable"  because  with  some 
probability  a  long  string  of  pulses  with  low  or  high  duty 
cycles  may  momentarily  cause  unacceptable  voltage 
deviations  from  the  desired  output  DC  level.  Thus  we 
focus  our  analysis  and  simulations  on  the  formulas  for  the 
spectral  density  using  the  constant  duty  cycle  scheme. 

2.  Formulations 

We first summarize the equations for the power at discrete frequencies and the power spectral density (PSD) for both schemes, CDC and CPW. General derivations for the PSD are straightforward applications of formulas found in [1]. The DC power and the power at harmonics of the least common multiple $f_{LCM}$ of the set of random switching frequencies have been derived in our reports, to be published in more expanded form than allowed here. The frequency $f_{LCM}$ and its harmonics are those having finite power; this can be seen by noting which frequencies cause the denominator term $1 - E\{e^{j2\pi f T}\}$ in the PSDs of (2) and (4) below to be zero, assuming a finite set of switching frequencies is randomized.


2.1.  Formulas  for  CPW  Scheme 


For CPW every pulse has constant amplitude $A_0$ and duration $T_0$, but the average switching interval is such that the average duty cycle $\alpha = T_0/\bar{T}$ is as desired. $E$ denotes statistical average, $T$ is the random segment duration (the reciprocal of the random switching frequency), and $E\{T\} = \bar{T}$. The frequencies $f^*$ denote $f_{LCM}$ or any one of its harmonics.

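Power at discrete frequencies (watts)

$$P(f^*) = \begin{cases} P_{DC} = \left(\dfrac{A_0 T_0}{\bar{T}}\right)^2, & f^* = 0 \\[2ex] \dfrac{2A_0^2}{(\pi f^* \bar{T})^2}\,\sin^2(\pi f^* T_0), & f^* > 0 \end{cases} \qquad (1)$$

One-sided spectral density (watts/Hz)

$$S(f) = \frac{\varepsilon A_0^2 \sin^2(\pi f T_0)}{\pi^2 f^2 \bar{T}} \left[ 1 + 2\,\mathrm{Re}\left\{ \frac{E\{e^{j2\pi f T}\}}{1 - E\{e^{j2\pi f T}\}} \right\} \right], \quad \varepsilon = 1,\ f = 0;\ \ \varepsilon = 2,\ f > 0 \qquad (2)$$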
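As a rough numerical illustration of the CPW density, the following sketch evaluates a one-sided density of the form $\varepsilon A_0^2 \sin^2(\pi f T_0)/(\pi^2 f^2 \bar{T})\,[1 + 2\,\mathrm{Re}\{z/(1-z)\}]$ with $z = E\{e^{j2\pi f T}\}$ for a finite set of switching frequencies. The function names, the pulse width of 10 microseconds and the two equally likely switching frequencies are assumptions chosen only for illustration; they are not values from the paper.

```python
import cmath
import math

def renewal_factor(f, freqs, probs):
    """Bracketed factor 1 + 2*Re{z / (1 - z)} with z = E{exp(j*2*pi*f*T)}.

    T takes the values 1/f_i with probabilities probs. At f_LCM and its
    harmonics z equals 1 and the factor is undefined (division by zero).
    """
    z = sum(p * cmath.exp(2j * math.pi * f / fi) for fi, p in zip(freqs, probs))
    return 1.0 + 2.0 * (z / (1.0 - z)).real

def cpw_psd(f, a0, t0, freqs, probs):
    """One-sided CPW density in watts/Hz for f > 0 (epsilon = 2)."""
    t_mean = sum(p / fi for fi, p in zip(freqs, probs))  # E{T}
    pulse = (a0 * math.sin(math.pi * f * t0)) ** 2 / (math.pi * f) ** 2
    return 2.0 * pulse / t_mean * renewal_factor(f, freqs, probs)

# Hypothetical example: two equally likely switching frequencies.
freqs, probs = [25e3, 50e3], [0.5, 0.5]
factor = renewal_factor(62.5e3, freqs, probs)
density = cpw_psd(62.5e3, a0=1.0, t0=10e-6, freqs=freqs, probs=probs)
print(factor, density)
```

The bracketed factor equals $(1 - |z|^2)/|1 - z|^2$, so it is never negative and grows without bound as $z$ approaches 1, which is where the discrete power components appear.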


2.2. Formulas for the CDC Scheme

In this scheme the pulses have varying width, according to the width of the segment, but the duty cycle is fixed. The same notation is used as in CPW.

Power at discrete frequencies (watts)

$$P(f^*) = \begin{cases} (A_0 \alpha)^2, & f^* = 0 \\[2ex] \dfrac{2A_0^2}{(2\pi f^* \bar{T})^2}\,\left| E\{1 - e^{j2\pi \alpha f^* T}\} \right|^2, & f^* > 0 \end{cases} \qquad (3)$$

One-sided spectral density (watts/Hz)

$$S(f) = \frac{\varepsilon A_0^2}{\pi^2 f^2 \bar{T}} \left[ E\{\sin^2(\pi \alpha f T)\} + 2\,\mathrm{Re}\left\{ \frac{\left( E\{\sin(\pi \alpha f T)\, e^{j\pi f T}\} \right)^2}{1 - E\{e^{j2\pi f T}\}} \right\} \right], \quad \varepsilon = 1,\ f = 0;\ \ \varepsilon = 2,\ f > 0 \qquad (4)$$

3. Spectral Analysis and Design Techniques for CDC Scheme

Now we must choose the proper switching frequencies and their probabilities for control of the discrete frequencies having non-zero power and for minimization of the power spectrum at a desired frequency $f_0$. Both design goals are oriented toward keeping generated power away from frequencies known to be harmful either to the environment or to converter loads. It is assumed that the designs can be updated adaptively.

3.1. Nulling $E\{\sin^2(\pi \alpha f T)\}$ and $E\{\sin(\pi \alpha f T)\, e^{j\pi f T}\}$


Because a null is repeated periodically on the frequency axis, we direct our attention to the lowest possible frequency that can be nulled given the constraints on the range of switching frequencies. Both the first and second terms in (4) are nulled if, for the given $\alpha$, the random variables $T_i$ are selected such that the values of $\sin^2(\pi \alpha f T)$ and $\sin(\pi \alpha f T)\, e^{j\pi f T}$ are always zero at $f = f_0$. Thus, writing $f_0 = \beta f_s$, where $f_s$ is the average switching frequency and $\beta$ is a positive scale factor, we can see that for a null at $f_0$ (and all its harmonics, $k = 1, 2, \ldots$) the relationship that the switching frequencies $f_{s_i} = 1/T_i$ must have to $f_s$ is:

$$f_{s_i} = \frac{\alpha \beta f_s}{k_i} = \frac{\alpha f_0}{k_i}, \quad i = 1, \ldots, N_s;\ \ k_i = 1, 2, \ldots \qquad (5)$$

assuming $N_s$ is the maximum number of switching frequencies used. Because the $f_{s_i}$ are also constrained to lie in $[f_{s\min}, f_{s\max}]$ and the largest must be greater than the average, the largest $f_{s_i} = f_{s_1}$ in (5) is constrained by

$$f_s \le f_{s_1} = \alpha \beta f_s / k_1 \le f_{s\max} \qquad (5.1)$$

where $k_1$ gives the relationship (5).

In order for the average to be as desired, $f_{s_1}$ must be given a probability that makes its weight greater than or equal to that of all the other frequencies. Similarly, the smallest $f_{s_i} = f_{s_{N_s}}$ is constrained by

$$f_{s\min} \le f_{s_{N_s}} = \alpha \beta f_s / k_{N_s} \le f_s \qquad (5.2)$$


3.1.1. Limitations on $f_0$, the Null Frequency for $E\{\sin^2(\pi \alpha f T)\}$ and $E\{\sin(\pi \alpha f T)\, e^{j\pi f T}\}$

If the constraints (5)-(5.2) cannot be met, then no design can give a pure null in both terms of (4). In fact, (5.1) and (5.2) jointly give limits on the lowest frequency of null $f_0 = \beta f_s$:

$$f_{s\min} \le f_{s_{N_s}} = \frac{\alpha f_0}{k_{N_s}} \le f_s \le f_{s_1} = \frac{\alpha f_0}{k_1} \le f_{s\max}. \qquad (6)$$

Placing nulls at higher frequencies is not as difficult, because the frequencies of nulls are periodic, and we may null any frequency $k f_0$ as long as $f_0$ satisfies the above expressions. The lowest switching frequency can easily satisfy (6) if $f_{s\min}$ is not too close to $f_{s\max}$.
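The design rule can be sketched numerically as follows, assuming the nulling condition takes the form $f_{s_i} = \alpha f_0 / k_i$ with integer $k_i$. The function names and the range limits of 20 kHz and 60 kHz are hypothetical and serve only to illustrate how the admissible set is enumerated and how the first term of (4) vanishes at $f_0$.

```python
import math

def nulling_frequencies(alpha, f0, fs_min, fs_max):
    """List alpha*f0/k (k = 1, 2, ...) that fall inside [fs_min, fs_max]."""
    if fs_min <= 0:
        raise ValueError("fs_min must be positive")
    freqs = []
    k = 1
    while alpha * f0 / k >= fs_min:
        f = alpha * f0 / k
        if f <= fs_max:
            freqs.append(f)
        k += 1
    return freqs

def mean_sin2(alpha, f, freqs, probs):
    """E{sin^2(pi*alpha*f*T)} for T = 1/f_i with probabilities probs."""
    return sum(p * math.sin(math.pi * alpha * f / fi) ** 2
               for fi, p in zip(freqs, probs))

# Duty cycle 80% and f0 = 62.5 kHz as in the first simulation example;
# the range limits are assumed values.
freqs = nulling_frequencies(alpha=0.8, f0=62.5e3, fs_min=20e3, fs_max=60e3)
residual = mean_sin2(0.8, 62.5e3, freqs, [0.5, 0.5])
print(freqs, residual)
```

With these inputs the enumeration returns the two switching frequencies near 50 kHz and 25 kHz, and the residual is zero up to floating-point rounding.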

3.1.2.  Simulation  Results  and  an  exception 

We first choose a set of switching frequencies according to (5), nulling $E\{\sin^2(\pi \alpha f T)\}$ with a duty cycle of 80%, switching frequencies 25 kHz and 50 kHz, and $f_0$ = 62.5 kHz. The resulting PSD value at $f_0$ is -140.1 dB. The spectral result is shown in figure 1.


Figure 1. Results for duty cycle 80%, switching frequencies 25 kHz and 50 kHz. Dotted: theoretical PSD from the formulation. Solid: simulation. There is ideally a perfect null at 62.5 kHz.

If we choose a duty cycle equal to 50%, switching frequencies 15.6 kHz and 31.3 kHz, and $f_0$ = 62.5 kHz, then we get the result shown in figure 2. Note that the theoretical PSD value at $f_0$ = 62.5 kHz from the formula is -172.6 dB, but the simulation's value at this frequency is about -45 dB. This difference is due to a non-determinism that will always occur when the duty cycle is $\alpha = 1/2, 1/3, \ldots, 1/k$. Thus for this coincidental case we cannot use the nulling of $E\{\sin^2(\pi \alpha f T)\}$ to get a sharp null at the desired frequency $f_0$ if $f_0$ coincides with a harmonic of the least common multiple $f_{LCM}$.
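Whether a candidate null frequency falls on a harmonic of $f_{LCM}$ can be checked before simulating. The sketch below works in integer hertz and assumes the exact switching frequencies 15625 Hz and 31250 Hz, which the text rounds to 15.6 kHz and 31.3 kHz; the function names are hypothetical.

```python
from functools import reduce
from math import gcd

def lcm_frequency_hz(freqs_hz):
    """Least common multiple of integer switching frequencies, in Hz."""
    return reduce(lambda a, b: a * b // gcd(a, b), freqs_hz)

def coincides_with_lcm_harmonic(f0_hz, freqs_hz):
    """True if f0 is an integer multiple of the least common multiple."""
    return f0_hz % lcm_frequency_hz(freqs_hz) == 0

# 80% duty cycle example: f_LCM = 50 kHz, and 62.5 kHz is not a harmonic.
case_80 = coincides_with_lcm_harmonic(62500, [25000, 50000])
# 50% duty cycle example: f_LCM = 31.25 kHz, and 62.5 kHz is its 2nd harmonic.
case_50 = coincides_with_lcm_harmonic(62500, [15625, 31250])
print(case_80, case_50)
```

The first case reports no coincidence and the second reports a coincidence, which matches the behavior described for figures 1 and 2.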


Figure 2. Nulling $E\{\sin^2(\pi \alpha f T)\}$ for 50% duty cycle, switching frequencies 15.6 kHz and 31.3 kHz, $f_0$ = 62.5 kHz. Dotted: analytical. Solid: simulation.

3.2. Maximizing $|1 - E\{e^{j2\pi f T}\}|$

The magnitude of the second term in (4) at frequencies other than those nulled by (6) is dominated by the denominator expression, $1 - E\{e^{j2\pi f T}\}$, which can approach zero and cause the density not only to become large but perhaps to become infinite. To preclude that, we can force $1 - E\{e^{j2\pi f T}\}$ to have maximum magnitude at a desired frequency $f_0 = \beta f_s$ as above, except that we now denote by $f_0$ the frequency at which we would like to maximize $|1 - E\{e^{j2\pi f T}\}|$. Every exponential term can precisely equal $-1$ if $2\pi f_0 / f_{s_i} = (2k+1)\pi$, $k = 0, 1, 2, \ldots, N_s$. Thus we have that:

$$f_{s_k} = \frac{2\beta f_s}{2k-1} = \frac{2 f_0}{2k-1}, \quad k = 1, 2, \ldots, N_s \qquad (7)$$

As before, the minimum and maximum frequencies are constrained according to

$$f_s \le f_{s_1} = 2\beta f_s \le f_{s\max} \qquad (7.1)$$

$$f_{s\min} \le f_{s_{N_s}} = 2\beta f_s / (2N_s - 1) \le f_s \qquad (7.2)$$
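A short numerical check of the maximizing rule, assuming it reads $f_{s_k} = 2 f_0/(2k-1)$: each exponential $e^{j2\pi f_0/f_{s_k}}$ then equals $-1$, so the denominator magnitude reaches its largest possible value of 2 for any choice of probabilities. The function names and the equal probabilities are assumptions for illustration.

```python
import cmath
import math

def maximizing_frequencies(f0, n):
    """First n switching frequencies 2*f0/(2k - 1), k = 1..n."""
    return [2.0 * f0 / (2 * k - 1) for k in range(1, n + 1)]

def denominator_magnitude(f, freqs, probs):
    """|1 - E{exp(j*2*pi*f*T)}| for T = 1/f_i with probabilities probs."""
    expectation = sum(p * cmath.exp(2j * math.pi * f / fi)
                      for fi, p in zip(freqs, probs))
    return abs(1.0 - expectation)

candidates = maximizing_frequencies(62.5e3, 3)   # about 125, 41.7 and 25 kHz
chosen = candidates[1:]                          # the 41.7 kHz and 25 kHz pair
magnitude = denominator_magnitude(62.5e3, chosen, [0.5, 0.5])
print(candidates, magnitude)
```

Dropping the 125 kHz candidate mirrors the simulation example, where only 25 kHz and 41.7 kHz are used; the magnitude stays at 2 because every retained frequency satisfies the rule.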


3.2.1. Limitations on $f_0$, the Maximizing Frequency for $|1 - E\{e^{j2\pi f T}\}|$

The constraints imposed by the minimum, maximum and average switching frequencies on the frequency of maximum of the denominator factor $1 - E\{e^{j2\pi f T}\}$ are similar to those imposed by the term $E\{\sin^2(\pi \alpha f T)\}$:

$$f_{s\min} \le f_{s_{N_s}} = \frac{2 f_0}{2N_s - 1} \le f_s \le f_{s_1} = 2 f_0 \le f_{s\max}. \qquad (8)$$


When we want to minimize the PSD within the constraints of the average and finite range of switching frequencies, the most important factor is $1 - E\{e^{j2\pi f T}\}$. This is because when we want to null $\sin(\pi \alpha f T)$ at an arbitrary frequency there are practical limitations. We have also found that, even without the constraints on switching frequencies, $\sin(\pi \alpha f T)$ sometimes cannot simply be nulled if the frequency of the desired null coincides with a harmonic of the least common multiple $f_{LCM}$ (figure 2).

3.2.2. Contradiction: Simultaneously nulling $E\{\sin^2(\pi \alpha f T)\}$ and $E\{\sin(\pi \alpha f T)\, e^{j\pi f T}\}$ while maximizing $|1 - E\{e^{j2\pi f T}\}|$

If we combine the inequalities of the limitations, it is clear that both objectives cannot be met simultaneously, because for $0 < \alpha < 1$ we cannot have $f_{s_1}/\alpha = f_0$ and $f_{s_1}/2 = f_0$ equal to each other. Clearly an overall or global optimum must be found by functional minimization subject to constraints.

3.2.3. Simulation Results and Discussion

When we maximize $|1 - E\{e^{j2\pi f T}\}|$ to minimize the peak at $f_0$ = 62.5 kHz according to (7), the switching frequencies for maximization can be 25 kHz and 41.7 kHz. The power spectral density is shown in figure 3.

Figure 3. Maximizing $|1 - E\{e^{j2\pi f T}\}|$ to minimize the peak at $f_0$ = 62.5 kHz according to (7); switching frequencies are 25 kHz and 41.7 kHz. Dotted: analytical. Solid: simulation.

4. Optimization Method and Results

We now apply a nonlinear optimization method to find a constrained set of switching frequencies and their corresponding probabilities that will minimize the spectral power at a specified frequency. We formulate the minimization problem, apply the optimization method, and then give experimental results.

We can formulate our minimization problem from (4) as follows:
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Subject  to  the  constraints: 
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The  corresponding  original  and  optimized  power  spectral 
densities  for  5  random  frequencies  are  shown  together  for 
comparison  in  figure  4.  Note  that  the  value  at  62.5KHz 
has  dropped  about  3dB. 


5.  Conclusions 


We have analyzed the power spectral formulas and given ways to control the power at the desired frequency.

The simulation results demonstrate the effectiveness of the concept and method. When we have some constraints on the switching frequencies, we are still able to minimize the power at some chosen frequency, although we may not be able to null it. Our future work will focus on control of the power spectral density over a band of frequencies.


Figure 4. Power spectral density detail around f = 62.5 kHz. Dotted line represents a random set of 5 switching frequencies. Solid line represents the optimized solution for those frequencies.
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ABSTRACT 

The problem of spectral subtraction, to estimate the parameters of a single source in colored noise, is used to show the relationships between the likelihood formulation and spectral density estimation. Although previously reported as a filter bank processing method for spectral estimation, the normalized Capon estimate is shown to be the natural tool for source location in 1-d and 2-d scenarios when the noise background estimate is posed as a spectral subtraction problem. Several simulations with 1-d and 2-d apertures are used to show the degree of quality achieved with the proposed formulation. Also, the Periodogram test for incoherent detection is compared against the optimum test and what is referred to herein as the Capon test.


I.  INTRODUCTION 

Using  the  maximum  likelihood  formulation  for  the 
problem  of  a  line  source  embedded  in  colored  noise,  this  work 
seeks  the  relationships  between  frequency  detectors,  spectral 
density  estimates  and  spectral  subtraction. 

It seems clear that characterizing the source location, its power level and the spectral density of the noise entails estimating first the source location, second its power level and finally, by spectral subtraction, the noise spectral density. Assuming this path in the procedure, it also seems clear that a high resolution line detector is needed at the first step, and this is the reason for the interest in spectral density estimates. For the second step, a power level estimate is required, and apparently the Capon estimate has no competitor for performing such estimation. Finally, the third step reduces, again apparently, to subtracting the estimated line contribution from the data covariance matrix. Although this protocol is valid in essence, there are several issues of interest, related to the maximum likelihood formulation, that preclude an arbitrary choice of the procedures used at every step.

To go through the mentioned steps, the problem of detecting a line source in colored noise has been selected. The presentation focuses on the case of a uniform linear array (ULA), leaving the case of 2-D apertures to the simulations section.


This  work  has  been  partially  supported  by  the  European 
Commission under Project IST-1999-11729 METRA; the Spanish
Government  (CYCIT)  TIC99-0849,  and  the  Catalan  Government 
(CIRIT)  1998SGR  00081,  1999FI 00588. 


The snapshot model we are assuming is formed by a single line source impinging on a ULA of $Q$ sensors with colored noise $w_n$. It is considered that the actual complex envelope and the spatial frequency are $a_s(n)$ and $f_0$ respectively, i.e. $f_0$ is equal to the product of the inter-element distance in wavelengths by the sine of the source elevation, $d \sin(\theta_0)$. The snapshot model is given by (1), where $S_s$ is the ($Q \times 1$) steering vector at frequency $f_0$.

$$X_n = a_s(n)\, S_s + w_n \qquad (1)$$

The log-likelihood of this data snapshot is given by (2), where $a_n$, $S$ and $R_0$ denote the estimates of the complex envelope, the steering vector and the noise covariance matrix respectively.

$$\Lambda_n = \ln\left(\det(R_0^{-1})\right) - (X_n - a_n S)^H R_0^{-1} (X_n - a_n S) \qquad (2)$$

From this expression it is obvious that the power level estimate is the suitable step to carry out first, instead of the source location, since the log-likelihood is highly non-linear in the latter parameter. The ML estimate of the source complex envelope is derived from the maximization of the above expression. The estimate is given by (3.a), and (3.b) gives the corresponding power level estimate, where $N$ is the number of collected snapshots and $R$ is the sample covariance matrix of the data.

$$a_n = \frac{S^H R_0^{-1} X_n}{S^H R_0^{-1} S} \qquad (3.a)$$

$$\frac{1}{N} \sum_{n=0}^{N-1} |a_n|^2 = \frac{S^H R_0^{-1} R\, R_0^{-1} S}{\left( S^H R_0^{-1} S \right)^2} \qquad (3.b)$$

Note that (3.b) looks different from the traditional Capon estimate [1]. Usually it is argued that the product of the inverse of the noise covariance matrix by the steering vector is proportional to the vector that results from using the data covariance matrix instead of the noise covariance matrix. Nevertheless, this property depends on the estimates of the noise covariance matrix and the steering vector of the source, which at this step are not available.

II. THE LOG-LIKELIHOOD OF THE STEERING AND THE NOISE COVARIANCE


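Using (3.a) in (2) and summing over the available snapshots, the log-likelihood to be maximized is obtained:

$$\Lambda = \ln\left(\det(R_0^{-1})\right) - \mathrm{trace}\left[ R_0^{-1} \left( R - \frac{S S^H R_0^{-1} R}{\rho_0} \right) \right] \qquad (4)$$

where

$$\rho_0 = S^H R_0^{-1} S \qquad (5)$$

Re-arranging terms in (4), it can also be written as (6).

$$\Lambda = \ln\left(\det(R_0^{-1})\right) - \mathrm{trace}\left[ R_0^{-1} R \right] + \frac{S^H R_0^{-1} R\, R_0^{-1} S}{S^H R_0^{-1} S} \qquad (6)$$

The Periodogram test is given by

$$T_{period} = \frac{S^H R\, S}{S^H R_0\, S} \qquad (10)$$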


To illustrate the above comments, Figure 1 shows all the tests described, under the condition of perfect knowledge of the noise covariance matrix. The data are formed by a source located at the spatial frequency 0.1 and colored noise obtained from a Moving Average MA(3) model, with model coefficients (1, 0, -1), and an SNR equal to 8 dB. This figure reveals the claimed superiority of the Capon test over the test derived from the log-likelihood. More important is the poor performance of the Periodogram test.
III. SPECTRAL SUBTRACTION


This last formulation reveals that the optimum detection test $T_{opt}(f)$ to estimate the source location, assuming that the noise covariance is known, is a Rayleigh quotient, which is close to the power estimate (3.b) derived previously. This test always plays an important role in improving detectors and beamformers [4]. It is also consistent with the white noise case, since it then coincides precisely with the data Periodogram.

Before  going  on  with  the  procedure,  it  is  worth 
studying  the  properties  of  the  optimum  test  under  perfect 
knowledge  of  the  inverse  noise  covariance. 


An alternative to the direct maximization of the log-likelihood over the noise and steering parameters is to pose the relationship between the noise covariance matrix and the steering vector to be estimated as a spectral subtraction problem. Note that noise covariance estimation is a major issue for power level estimation [2], and spectral subtraction, despite being heuristic in many cases, is usually the suitable tool to perform it.

Under a spectral subtraction approach the noise covariance matrix is formulated as the data covariance matrix minus the source contribution.


Assuming that the inverse of the noise covariance is known, the optimum test to locate the source steering is to maximize (7).

$$T_{opt}(f) = \frac{S^H R_0^{-1} R\, R_0^{-1} S}{S^H R_0^{-1} S} \qquad (7)$$

$$R_0 = R - \beta\, S S^H \qquad (11)$$

The maximum value of $\beta$, in order to ensure that the estimated matrix is positive definite, is precisely the traditional Capon estimate. The problem is that using this estimate precludes the use of the inverse, which does not exist because the minimum eigenvalue of the estimated noise matrix is then zero.


This test is lower bounded by using the following definitions and the Schwarz inequality:

$$v = R^{1/2} R_0^{-1} S; \quad u = R^{-1/2} S$$

$$\left( S^H R_0^{-1} S \right)^2 = |u^H v|^2 \le (u^H u)(v^H v) = \left( S^H R^{-1} S \right) \left( S^H R_0^{-1} R\, R_0^{-1} S \right)$$

In consequence,

$$T_{opt}(f) = \frac{S^H R_0^{-1} R\, R_0^{-1} S}{S^H R_0^{-1} S} \ge \frac{S^H R_0^{-1} S}{S^H R^{-1} S} = T_C(f) \qquad (9)$$


This reveals that, in terms of resolution, the right term of (9) will be better than the optimum test, since both take the same value at the true steering. It is also very important to remark that the so-called classic test, formed by the quotient of the periodogram measures, does not bound the optimum test in any way. Its use is suitable only in the case of white noise; in this case the Periodogram test coincides with the optimum test, which is still bounded by the Capon test.
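The relationship between the three tests can be checked numerically on a toy case. The sketch below assumes a two-sensor array, a hypothetical noise covariance matrix, and an exactly known data covariance $R = R_0 + \alpha_s S S^H$ with the source at spatial frequency 0.1; it evaluates the optimum test of (7), the Capon test on the right of (9), and the Periodogram quotient of (10). All names and numbers are illustrative assumptions.

```python
import cmath

def steering(f):
    """Steering vector of a 2-sensor ULA at spatial frequency f."""
    return [1.0 + 0j, cmath.exp(2j * cmath.pi * f)]

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def quad(u, m, v):
    """Real part of the form u^H m v."""
    w = matvec(m, v)
    return (u[0].conjugate() * w[0] + u[1].conjugate() * w[1]).real

def detection_tests(f, r_data, r_noise):
    """Return (optimum, Capon, Periodogram) test values at frequency f."""
    s = steering(f)
    r0_inv = inv2(r_noise)
    w = matvec(r0_inv, s)                       # R0^{-1} S
    rho0 = quad(s, r0_inv, s)                   # S^H R0^{-1} S
    t_opt = quad(w, r_data, w) / rho0
    t_capon = rho0 / quad(s, inv2(r_data), s)
    t_period = quad(s, r_data, s) / quad(s, r_noise, s)
    return t_opt, t_capon, t_period

# Hypothetical colored-noise covariance and source parameters.
r_noise = [[2.0 + 0j, -0.5 + 0j], [-0.5 + 0j, 2.0 + 0j]]
power, f_true = 3.0, 0.1
s0 = steering(f_true)
r_data = [[r_noise[i][j] + power * s0[i] * s0[j].conjugate() for j in range(2)]
          for i in range(2)]

results = {f: detection_tests(f, r_data, r_noise) for f in (0.1, 0.2, 0.35)}
for f, values in results.items():
    print(f, values)
```

At the true steering the optimum and Capon values coincide, and elsewhere the Capon value does not exceed the optimum one, which is the sense in which the Capon test is the sharper of the two.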


Figure  1.  Optimum  (-),  Capon  (-.-)  and  Periodogram  test  (-). 

The second attempt is more suitable and directly addresses the estimate of the inverse, since it is the only one that is required in the formulation of the log-likelihood when using the appropriate test for source location. This estimate is given in (13), where the second term is the rank-one contribution of the vector (norm one) that nulls the contribution of the last two terms of (6). Again, the parameter $\gamma$ has to be bounded in order to preserve the positive definite character of the estimate.

$$R_0^{-1} = R^{-1} + \gamma \left[ \frac{R^{-1} S S^H R^{-1}}{S^H R^{-2} S} \right] \qquad (13)$$


This  bound  is  given  in  (14),  where  it  is  clear  that  it  is  less 
restrictive  than  the  case  of  modeling  directly  the  noise  matrix. 


Y 


J _ 

.R~l.S 


(14) 


At the same time, the log-likelihood for this noise estimate is equal to (15); in consequence, $\gamma$ has to be selected in order to maximize the determinant of the inverse noise covariance matrix.

$$\Lambda = \ln\left(\det(R_0^{-1})\right) - (Q - 1); \quad \forall \gamma \qquad (15)$$


Finally,  with  this  choice  for  the  parameter,  the  log-likelihood, 
the  optimum  test,  described  in  the  previous  section,  and  the  so- 
called  normalized  Capon  estimate  [3]  are  easily  related  as 
follows: 


$$R_0^{-1} = R^{-1} + K_0(S) \left[ R^{-1} S S^H R^{-1} \right] \qquad (17)$$

and assuming that the source is at this steering with power level $\alpha_s$, the correct value of $K_0$ is (18), where $\rho$ is the inverse of the power level from the Capon estimate.

$$K_0(S) = \frac{\alpha_s}{1 - \rho\, \alpha_s} \quad \text{with} \quad \rho = S^H R^{-1} S \qquad (18)$$
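The rank-one update of the inverse can be verified numerically. The sketch below assumes that the noise covariance is the data covariance minus $\alpha_s S S^H$ and that the scalar takes the matrix-inversion-lemma form $K_0 = \alpha_s/(1 - \rho\,\alpha_s)$ with $\rho = S^H R^{-1} S$; it then compares the direct inverse with the updated one. The 2x2 data covariance, the steering frequency and the power level are hypothetical values.

```python
import cmath

def steering(f):
    """Steering vector of a 2-sensor ULA at spatial frequency f."""
    return [1.0 + 0j, cmath.exp(2j * cmath.pi * f)]

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

# Hypothetical Hermitian positive definite data covariance (eigenvalues 3 and 6).
r = [[5.0 + 0j, 1.0 + 1.0j], [1.0 - 1.0j, 4.0 + 0j]]
s = steering(0.1)
alpha_s = 0.5                                   # assumed source power, below 1/rho

r_inv = inv2(r)
w = matvec(r_inv, s)                            # R^{-1} S
rho = (s[0].conjugate() * w[0] + s[1].conjugate() * w[1]).real
k0 = alpha_s / (1.0 - rho * alpha_s)

# Direct inverse of the subtracted matrix R - alpha_s * S S^H.
r_noise = [[r[i][j] - alpha_s * s[i] * s[j].conjugate() for j in range(2)]
           for i in range(2)]
direct = inv2(r_noise)

# Rank-one update: R^{-1} + K0 * (R^{-1} S)(R^{-1} S)^H.
updated = [[r_inv[i][j] + k0 * w[i] * w[j].conjugate() for j in range(2)]
           for i in range(2)]

max_error = max(abs(direct[i][j] - updated[i][j])
                for i in range(2) for j in range(2))
print(rho, k0, max_error)
```

The two inverses agree to rounding error as long as $\alpha_s$ stays below $1/\rho$, the Capon power level; at that limit the subtracted matrix becomes singular.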




Figure  2.  (Left).  Capon  Noise  spectral  density  estimate.  (Right). 
Log-Likelihood 
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(16) 


In  summary,  viewing  the  problem  of  the  noise 
covariance  matrix  estimation  as  a  problem  of  spectral 
subtraction,  carried  over  the  inverse  of  the  data  covariance 
matrix,  reveals  that  the  optimum  frequency  detector  is  not  the 
classical  minimum  variance  beamformer.  It  is  the  so-called 
normalized  Capon  spectral  estimate  which  provides  the  optimum 
test  for  frequency  detection. 


Since  the  power  level  1/p  contains  both  the  signal  and  the  noise 
contributions  and,  in  addition,  we  assume  that  the  noise  power 
can  be  formulated  by  a  white  density  N0  and  a  shaping 
bandwidth  B(S),  then 


—  =as+a0„=as+N0.B(s)  (19) 

P 


In consequence, the product $K_0(S)\,\rho$ can be formulated as (20), where $\psi(S)$ is the spectral density.


K0(S).p 
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s  _ 


A0i©- 
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(20) 


It may be argued that in the above formulation the parameter $\gamma$ may also depend on the steering vector. In this respect, the next section will describe some empirical support for the choice of this parameter, setting it to a constant value. Note that this is equivalent to setting a constant value for the trace of the inverse noise covariance matrix estimate, independently of the steering selected.


Furthermore, using (20) in the corresponding log-likelihood results in (21), which proves the efficiency of, and the relationship between, line detectors or spectral density estimates and the maximization of the log-likelihood. At the same time, taking into account that the normalized estimate provides a spectral density ensures that the proper choice for the parameter $\gamma$ in the previous section is a constant independent of the steering.


Before closing this section, Figure 2 shows, on the right, the likelihood using (16) and, on the left, the noise spectral estimate using (13) and the Capon power level estimate.


A  - Q  + 1  =  Ln\\  +  K0 (s)p ]  =  In 


(21) 


IV  SPECTRAL  ESTIMATION. 


V.  SIMULATIONS 


Rewriting  again  the  noise  covariance  estimate  leaving 
unknown  the  parameter  K0(§), 


In  order  to  show  that  the  framework  described 
previously  is  also  valid  for  2-D  problems,  the  hexagonal  aperture 
depicted  in  Figure  3  has  been  selected  for  this  section. 
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2D  Hexagonal  aperture 
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Figure  3.  Selected  aperture. 

Periodogram 


Figure  4.  The  noise  Periodogram.  Polar  slowness-azimuth  plot. 

NMLM 


Figure  5.  The  Normalized  estimate,  proportional  to  the  log- 
likelihood,  for  the  scenario  defined  in  the  text. 

The noise was spatially colored, with its Periodogram estimate as depicted in Figure 4. The representation uses the slowness-azimuth plot, with the south-to-north axis coincident with the ordinate axis of the plot.

The source was located at 20° of elevation and 80° of azimuth with an SNR of 0 dB. Figure 5 shows the normalized estimate, proportional to the likelihood. The estimated location of the source is 80.83° and 20.18°. It is important to remark that the normalized estimate (NMLM) performs like a high resolution procedure, and its accuracy in the source location requires a high-density grid to scan for the maximum.

The Capon estimate for the power level of the background noise, with the spectral subtraction indicated previously, can be viewed in Figure 6, where the similarity with the original one is evident.


subtraction  procedure  described  in  the  text. 

VI.  CONCLUSIONS 

The relationship between spectral density estimators and source detection in colored noise has been shown. At the same time, the normalized Capon estimate has been shown to be the natural 1-d or 2-d spectral density estimate. In fact, this density estimate, for any 2-d scenario, is superior to the available procedures of Periodogram (low resolution) and MUSIC (high complexity and order uncertainty).
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ABSTRACT 

In  a  statistical  signal  processing  parameter  estimation  problem, 
ambiguities  are  generated  when  there  is  not  a  one  to  one  mapping 
between  the  object  space  and  the  measurement  space.  In  this 
paper,  the  ambiguity  problem  is  investigated  using  differential 
geometry  concepts  and  a  theoretical  framework  is  proposed  for 
the  classification,  identification  and  calculation  of  ambiguities, 
associated  with  two  application  areas,  that  is  in  the  array 
processing  and  in  the  harmonic  retrieval  problem. 

1.  INTRODUCTION 

Much recent effort has been devoted to the study of ambiguities,
a well-known research problem in parameter estimation
applications. However, existing investigations have been
restricted to the array processing area and are mainly concerned
with four aspects of ambiguities. The first is the identification of
array geometries free of ambiguities of up to a certain rank ([1],
[2]). The second is concerned with the classification and
estimation of manifold ambiguities for both linear and planar
array geometries ([3], [4]), while the third studies the effects of
sensor failure on the ambiguous behaviour of linear arrays [5].
The final one is related to resolving manifold ambiguities for
non-uniform linear array geometries [6]. These four aspects also
constitute four independent lines of thought.

In this paper, the ambiguity problem is investigated from the
perspective of a general parameter estimation problem and a novel
framework is proposed. This framework is appropriate for any
parameter estimation problem provided that the parameter of
interest $p \in \Omega$ is mapped into a vector $a(p) \in \mathbb{C}^N$
(i.e. $p \mapsto a(p)$), with the locus of $a(p)$, $\forall p \in \Omega$,
being a curve having a hyperhelical shape. The
modelling,  which  is  based  on  differential  geometry  properties  of 
hyperhelical  curves,  is  presented  in  Section  2,  along  with  some 
important  concepts  and  definitions.  In  the  same  section,  by 
partitioning  a  hyperhelical  curve  into  uniform  and  non  uniform 
segments,  two  classes  of  ambiguities  are  identified  and  an 
algorithm  for  estimating  ambiguous  sets  of  parameters  is 
proposed  in  a  general  framework.  Then  in  Sections  3  and  4  the 
application  of  this  general  framework  to  handle  the  array 
processing  and  the  harmonic  retrieval  problems,  respectively,  is 
presented  along  with  two  representative  examples.  Finally,  in 
Section  5  the  paper  is  concluded. 


2.  THEORETICAL  FRAMEWORK  FOR 
THE  ESTIMATION  OF  AMBIGUITIES 


Let $x(t)$ be the observation signal in a parameter estimation
problem, where the parameter of interest $p \in \Omega$ is mapped into a
vector $a(p)$, having the following general form:

$$a(p) = \exp\big(-j\pi(v\,h(p) + u)\big) \tag{1}$$

where $v$, $u$ are constant $N$-dimensional real vectors and $h(p)$ is a
real function satisfying the following condition:


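$$\left|\frac{dh(p)}{dp}\right| = \frac{dh(p)}{dp} \quad \forall p \in \Omega
\qquad \text{or} \qquad
\left|\frac{dh(p)}{dp}\right| = -\frac{dh(p)}{dp} \quad \forall p \in \Omega \tag{2}$$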


Furthermore, if the observation signal $x(t)$ can be modelled as a
function of $a(p)$, as follows:

$$x(t) = \sum_{i=1}^{M} a(p_i)\, m_i(t) + n(t) \tag{3}$$


or, in a matrix format,

$$x(t) = A(p)\, m(t) + n(t) \tag{4}$$

then the ambiguity problem arises when two or more vectors $a(p)$
are linearly dependent, leading to a rank deficient matrix $A(p)$. In
Equation (4), $m(t) = [m_1(t), m_2(t), \ldots, m_M(t)]^T$ is a vector
signal, $n(t)$ denotes the noise effects, and
$A(p) = [a(p_1), a(p_2), \ldots, a(p_M)]$.


It can be proven that the locus of the vector $a(p)$ given by
Equation (1), over the parameter space $\Omega$ ($\forall p \in \Omega$), is a curve in
$N$-dimensional complex space, having a hyperhelical shape. The
advantages of having hyperhelical curves are numerous. The most
important is that their shape and properties can be described by a
set of constant curvatures which can be analytically estimated [7].


Since the locus of $a(p)$ is a curve embedded in an $N$-dimensional
complex space, the arc length $s$, which represents the actual
length of a segment of the curve, is a much more natural way of
parametrising the curve than $p$. If $a(p)$ is described by Equation
(1) then the arc length $s$ is related to the parameter of interest $p$
via the following expression:

$$s(p) = \pi \|v\| \big(h(p) - h(0)\big) \tag{5}$$

Note that the total length $l_v$ of the curve is called the manifold
length.
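
Equation (5) can be checked directly from Equation (1). The derivation below is only a sketch: it assumes that $h(p)$ is differentiable and non-decreasing over the parameter space, so that $|dh/dp| = dh/dp$, and the integration variable $\xi$ is introduced here purely for illustration.

```latex
\frac{d a_i(p)}{dp} = -j\pi\, v_i\, \frac{dh(p)}{dp}\, a_i(p),
\qquad |a_i(p)| = 1
\;\;\Longrightarrow\;\;
\left\| \frac{d a(p)}{dp} \right\| = \pi \|v\| \left| \frac{dh(p)}{dp} \right|
\\[6pt]
s(p) = \int_{0}^{p} \left\| \frac{d a(\xi)}{d\xi} \right\| d\xi
     = \pi \|v\| \int_{0}^{p} \frac{dh(\xi)}{d\xi}\, d\xi
     = \pi \|v\| \big( h(p) - h(0) \big)
```

Because every element of $a(p)$ has unit modulus, the speed along the curve depends on $p$ only through $dh/dp$, which is why the arc length is an affine function of $h(p)$.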
Although the number of ambiguous sets is infinite, it can be
proven  that  if  one  ambiguous  set  of  arc  lengths  is  identified,  then 
by  simple  rotation,  an  infinite  number  of  ambiguous  sets  can  be 
generated.  For  that  reason  the  concept  of  the  ambiguous 
generator  set  (representing  all  these  rotated  ambiguous  sets)  and 
its  corresponding  rank,  called  rank  of  ambiguity  were  introduced 
in  [3]  as  follows: 

Ambiguous generator set:

An ordered set $s = [0, s_1, \ldots, s_{M-1}]^T$ of $M$ arc lengths, where
$2 \le M \le N$, is said to be an ambiguous generator set of arc
lengths if and only if:

a) All the elements of the set but the first element are
non-zero,

b) The rank of the $N \times M$ matrix $A(s)$ with columns the
manifold vectors associated with the elements of the set is
less than $M$, i.e. $\mathrm{rank}(A(s)) = \rho < M$, and

c) For any subset $s_j$ of $k$ elements of $s$ with $\rho \le k < M$, the
rank of $A(s_j)$ is equal to $\rho$.

Rank of Ambiguity: The rank of the matrix $A(s)$ is called the rank of
ambiguity $\rho$ of the set $s$.

The ambiguous generator sets are divided into two different classes
according to the way the hyperhelical curve is partitioned: the
uniform class and the non-uniform class. However, the
non-uniform class exists if and only if the vector $v$ is a symmetric
vector, i.e. if the elements of $v$ are symmetric with respect to their
centroid, i.e. $\mathrm{sum}(y^i) = 0$, $\forall i$ odd, where $y = v + b\mathbf{1}$, with $b$
a real number, $\mathbf{1}$ the vector of ones and $y^i$ the element-wise $i$th power.
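
The symmetry condition can be tested numerically. The following is a minimal sketch, not part of the original algorithm: the function names `symmetry_shift` and `odd_power_sums` and the tolerance are illustrative assumptions. It uses the fact that the first-order condition forces $b$ to be minus the mean of $v$, and that all odd power sums vanish when the shifted elements pair up as mirror images about zero. The sample vector is the sampling-time vector that appears in Example 2.

```python
def symmetry_shift(v, tol=1e-9):
    """Return (is_symmetric, b) for a real vector v, where y = v + b*1 has zero centroid."""
    n = len(v)
    b = -sum(v) / n                      # the only shift for which sum(y**1) vanishes
    y = sorted(x + b for x in v)
    # y is symmetric about zero when each element is mirrored by its negative
    is_sym = all(abs(y[i] + y[n - 1 - i]) <= tol for i in range(n))
    return is_sym, b


def odd_power_sums(v, b, max_power):
    """Sums of the odd element-wise powers of y = v + b*1, for powers 1, 3, ..., max_power."""
    return [sum((x + b) ** i for x in v) for i in range(1, max_power + 1, 2)]


flag, b = symmetry_shift([0.0, 1.3, 2.5, 3.8])
sums = odd_power_sums([0.0, 1.3, 2.5, 3.8], b, 5)
print(flag, round(b, 6), [round(s, 9) for s in sums])
```

For a vector whose elements do not pair up about the centroid, for instance `[0.0, 1.0, 3.0]`, the same check reports that the vector is not symmetric, so only the uniform class would need to be considered.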

In the case of the uniform class of ambiguities, a set of arc
lengths $s_{ij}$ (that partitions the hyperhelix uniformly) can be
formed by the following equation:

$$s_{ij} = g \left[ 0,\ \frac{l_v}{|v_i - v_j|},\ \frac{2\,l_v}{|v_i - v_j|},\ \ldots,\ \frac{c\,l_v}{|v_i - v_j|} \right]^T \tag{6}$$

where $v_i$, $v_j$ are elements of the vector $v$, $\forall i, j$, $c$ is an integer
number and

$$g = \frac{2}{h(p_{\max}) - h(0)} \tag{7}$$

The existing sets of the above form provide the corresponding
ambiguous generator sets, $\forall i, j$.

In the case of the non-uniform class of ambiguities, the
hyperhelical curve is partitioned into a number of non-uniform
segments according to the roots of

$$\mathrm{Tr}\{ C\, \mathrm{expm}(s\,C) \} = 0 \quad \forall s \in [0, l_v) \tag{8}$$

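where $C$ is a $d \times d$ matrix, known as the Cartan matrix and
defined as:

$$C \overset{\mathrm{def}}{=} \begin{bmatrix}
0 & -\kappa_1 & 0 & \cdots & 0 & 0 \\
\kappa_1 & 0 & -\kappa_2 & \cdots & 0 & 0 \\
0 & \kappa_2 & 0 & \ddots & 0 & 0 \\
\vdots & \vdots & \ddots & \ddots & -\kappa_{d-2} & 0 \\
0 & 0 & 0 & \kappa_{d-2} & 0 & -\kappa_{d-1} \\
0 & 0 & 0 & 0 & \kappa_{d-1} & 0
\end{bmatrix}$$

with $\kappa_i$ being the $i$th curvature [7], and

$$d = \begin{cases} 2N - k & \text{if no element of } v \text{ is at the centroid of } v \\ 2N - k - 1 & \text{otherwise} \end{cases}$$

with $k$ representing the number of elements of $v$ which have
symmetrical pairs with respect to its centroid.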

Based on Equations (6) and (8), an algorithm for estimating the
ambiguous generator sets and the associated rank of ambiguity is
presented below in a compact step format:

Estimation Algorithm:

STEP 1: Calculate the length $l_v$ of the hyperhelical curve and the
difference vector, which is the Kronecker difference $v \ominus v$ with all
the elements that are smaller than one eliminated. Also, if the
vector $v$ is symmetric, calculate the Cartan matrix $C$.

STEP 2: For each of the elements of the difference vector, create the
corresponding set $s_{ij}$, by partitioning the hyperhelix uniformly,
using Equation (6). These sets will provide the ambiguous
generator sets belonging to the uniform class. If the vector $v$ is
also symmetric then calculate the roots of Equation (8) and form
an extra set. This set will provide the ambiguous generator
sets belonging to the non-uniform class.

STEP 3: For each of the sets formed in Step 2, identify the
ambiguous generator sets based on the following rules:

rule-a) If $s_{ij}$ is unique, then $\binom{c}{N-1}$ ambiguous generator
sets can be produced which are all the possible subsets of
$N$ elements of the set with their first element zero and
$N - 1$ non-zero elements. Their rank of ambiguity is
equal to $N - 1$.

rule-b) If $s_{ij}$ is not unique, then all subsets of $s_{ij}$ with
their first element 0 and with length 2, 3, ... up to
$\min(N, k + 1)$ must be considered. These subsets are
classified as ambiguous generator sets if the three
conditions of the ambiguous generator set definition are
satisfied [3]. Furthermore, for each ambiguous generator
set $s_{ij}$, the rank of ambiguity is determined.

3.  ARRAY  PROCESSING 

In an azimuth direction finding system employing planar arrays,
the signal $x(t)$ is modelled as in Equation (3), with $m(t)$ being a
baseband signal-vector and $n(t)$ the noise vector. The parameters
of interest ($\theta$ azimuth, $\phi$ elevation) are mapped to the array
manifold vector as follows:

$$a(\theta, \phi) = \exp\big( -j\pi (r_x \cos\theta + r_y \sin\theta) \cos\phi \big) \tag{10}$$

where $[r_x, r_y, 0]$ denote the locations of the sensors in
half-wavelengths and $\theta \in [0, 2\pi)$, $\phi \in [0, \pi/2)$.

If Equation (10) is compared with Equation (1), it can be seen
that the $\phi$-curves ($\theta$ = constant) are hyperhelical curves with
$p = \phi$, $v = r_x \cos\theta + r_y \sin\theta$ and $u = 0$. However, $\theta$-curves
($\phi$ = constant) are not hyperhelices. Therefore ambiguities
associated only with the $\phi$-curves (i.e. elevation angles) can be
estimated by using the proposed approach. Furthermore, based on
cone-angle parametrisation $(\alpha, \beta)$ [8], which is an alternative
parametrisation to $(\theta, \phi)$, the array manifold vector can be
rewritten as follows:

$$a = \exp\Big( -j\pi \big( R(\Theta) \cos\alpha + R(\Theta + \tfrac{\pi}{2}) \cos\beta \big) \Big) \tag{11}$$

$\forall \alpha, \beta \in [0°, 180°)$, where $R(\Theta) = r_x \cos\Theta + r_y \sin\Theta$ and
$\Theta$ is the rotation of the x-y frame, matching Equation (1), and
thus both $\alpha$-curves ($\beta$ = constant) and $\beta$-curves ($\alpha$ = constant)
are hyperhelical curves. For instance, for the $\alpha$-curves, $p = \alpha$,
$v = R(\Theta)$ and $u = R(\Theta + \tfrac{\pi}{2}) \cos\beta$.


From the previous discussion and modelling it is obvious that
ambiguities associated with the $\phi$, $\alpha$ and $\beta$-curves can be
estimated using the proposed approach, while the case of constant
elevation and different azimuths ($\theta$-curves) remains an open
problem.


Example 1: Consider a planar array with sensor locations given
by the following matrix, in half-wavelengths:

$$[r_x, r_y, 0] = \begin{bmatrix}
-1.7 & -1.5 & 0 & 0 & 1.5 & 1.7 \\
2.2 & -2.5 & -2 & 2 & 2.5 & -2.2 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}^T$$

For the $\phi$-curve of the array manifold corresponding to $\theta = 85°$
(say), the manifold length $l_v(85°)$ is equal to 34.4330. If the
proposed method described in the previous section is applied to
calculate the ambiguous generator sets associated with this
$\phi$-curve, then the following ambiguous generator sets in arc lengths
are estimated:

$$E = \begin{bmatrix}
0, & 6.5681, & 13.1362, & 19.7043, & 26.2725, & 32.8406 \\
0, & 7.3816, & 14.7633, & 22.1449, & 29.5265, & 0 \\
0, & 7.4633, & 14.9267, & 22.3900, & 29.8534, & 0 \\
0, & 8.5318, & 17.0635, & 25.5953, & 34.1271, & 0
\end{bmatrix}$$

with corresponding rank of ambiguity $\rho = [5, 4, 4, 4]^T$.
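
The arc lengths listed in Example 1 can be reproduced numerically. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the manifold length is taken as $2\pi\|v\|$ because that value reproduces the quoted 34.4330, and each set is generated with a spacing of $l_v / |v_i - v_j|$, which matches the listed numbers. The function name `uniform_set` and the zero-based sensor indices are hypothetical.

```python
import math

# Sensor locations in half-wavelengths, taken from Example 1
rx = [-1.7, -1.5, 0.0, 0.0, 1.5, 1.7]
ry = [2.2, -2.5, -2.0, 2.0, 2.5, -2.2]
theta = math.radians(85.0)

# Projection v = rx*cos(theta) + ry*sin(theta) for the curve with theta fixed
v = [x * math.cos(theta) + y * math.sin(theta) for x, y in zip(rx, ry)]
norm_v = math.sqrt(sum(c * c for c in v))

# Assumption: 2*pi*||v|| is used because it reproduces the quoted manifold length
manifold_length = 2.0 * math.pi * norm_v


def uniform_set(i, j):
    """Arc lengths 0, d, 2d, ... below the manifold length, with d = l_v / |v_i - v_j|."""
    step = manifold_length / abs(v[i] - v[j])
    count = int(manifold_length / step - 1e-12) + 1   # multiples strictly below l_v
    return [round(c * step, 4) for c in range(count)]


first_row = uniform_set(1, 4)   # sensor pair with the largest difference |v_i - v_j|
last_row = uniform_set(0, 2)

print(round(manifold_length, 4))
print(first_row)
print(last_row)
```

The first call yields six arc lengths and the second yields five, which agrees with the zero padding in the last column of the corresponding rows of the matrix above.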


Furthermore, if the ambiguous generator sets of the above array
are estimated for every $\theta$, then ambiguous generator lines will be
formed. Figure 1 shows the set of ambiguous generator lines
associated with the first row of the matrix $E$. In the same figure
the locus of the manifold lengths of every $\phi$-curve ($l_v(\theta)$, $\forall \theta$) is
also shown as a dashed line. The intersection of a line from the
origin with a set of ambiguous generator lines provides an
ambiguous generator set of directions and in Figure 1 the
symbols (•) show an ambiguous generator set for $\theta = 85°$ (first
row of $E$).



Figure  1:  The  set  of  ambiguous  generator  lines  of  rank-5 
associated  with  the  first  row  of  E. 


4.  HARMONIC  RETRIEVAL  PROBLEM 

Consider a signal $x(t)$ which is a sum of $M$ complex sinusoids,
with unknown amplitudes $m = [m_1, \ldots, m_i, \ldots, m_M]^T$ and
unknown frequencies $f = [f_1, \ldots, f_i, \ldots, f_M]^T$ which are to be
estimated. The frequencies are normalised with respect to a
known maximum frequency $f_s$ which implies that:

$$0 \le f_i < 1 \quad \forall i \tag{12}$$

The signal is assumed to be contaminated with additive white
Gaussian noise, $n(t)$.

Let us also consider that over an observation interval $T_{obs}$, the
signal $x(t)$ is sampled at a non-uniform rate, with the number of
samples, $N$, satisfying the following condition:

$$M < N < \left\lfloor \frac{T_{obs}}{T_s} \right\rfloor \tag{13}$$

where $T_s$ is defined as $1/f_s$. This implies that the signal $x(t)$ is
sampled at times $t_1, \ldots, t_N \in \mathbb{R}$, which are normalised with
respect to $T_s$. By defining the $N \times 1$ vector
$t = [t_1, \ldots, t_N]^T \in \mathbb{R}^N$, which for uniform sampling at the
Nyquist rate becomes $[1, 2, \ldots, N]^T$, the data sequence over the
observation interval $T_{obs}$ can be modelled as follows:

$$x = \sum_{i=1}^{M} m_i \exp(j 2\pi t f_i) + n \tag{14}$$

or in matrix format

$$x = A(f)\, m + n \tag{15}$$

where $A(f) = \exp(j 2\pi t f^T)$ is an $N \times M$ matrix and $\exp(\cdot)$
denotes element by element exponential.

Let us now define as the frequency manifold vector the following
$N \times 1$ vector:

$$a(f) = \exp(j 2\pi t f) \quad \forall f \in [0, 1) \tag{16}$$

This vector is of the form of Equation (1) with $p = f$, $v = t$ and
$u = 0$, and its locus is a hyperhelical curve. Thus the proposed
algorithm of Section 2 can be employed to estimate the sets of
ambiguous generator frequencies, as can be seen by the following
example.

Example 2: Consider a signal which is the sum of three sinusoids
of normalised unknown frequencies 0.3632, 0.6263 and 0.8895,
plus additive white Gaussian noise. If the signal is sampled over a
normalised observation interval at times $t = [0, 1.3, 2.5, 3.8]^T$,
then three ambiguous generator sets of frequencies (and their
associated rank of ambiguity, listed to the right of each row) can
be found by the proposed algorithm. These are:

$$F = \begin{bmatrix}
0, & 0.2632, & 0.5263, & 0.7895 \\
0, & 0.4000, & 0.8000, & 0 \\
0, & 0.2868, & 0.5008, & 0.7934
\end{bmatrix}
\quad
\begin{matrix} 3 \\ 2 \\ 3 \end{matrix}$$

with the manifold length of the associated hyperhelix being equal
to:

$$l_v = 2\pi \|t\| = 29.7242 \tag{18}$$
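
The manifold length in Equation (18) and the two uniform-class rows of $F$ can be checked with a few lines of code. This is a sketch under an assumption that matches the listed values: each uniform set is generated with a frequency spacing of $1 / |t_i - t_j|$. The function name `uniform_frequency_set` and the zero-based indices are hypothetical.

```python
import math

t = [0.0, 1.3, 2.5, 3.8]   # normalised sampling times from Example 2

# Manifold length as in Equation (18): 2*pi*||t||
manifold_length = 2.0 * math.pi * math.sqrt(sum(x * x for x in t))


def uniform_frequency_set(i, j):
    """Frequencies 0, d, 2d, ... inside [0, 1), with spacing d = 1 / |t_i - t_j|."""
    step = 1.0 / abs(t[i] - t[j])
    count = int(1.0 / step - 1e-12) + 1   # multiples strictly below 1
    return [round(c * step, 4) for c in range(count)]


row_1 = uniform_frequency_set(0, 3)   # |t_i - t_j| = 3.8
row_2 = uniform_frequency_set(0, 2)   # |t_i - t_j| = 2.5

print(round(manifold_length, 4), row_1, row_2)
```

The difference 2.5 occurs for two different sample pairs, whereas 3.8 occurs only once, which is consistent with the second row having a lower rank of ambiguity than the first.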

Note that because the vector $t$ is a symmetric vector (i.e. $b = -1.9$
and $\mathrm{sum}((t + b\mathbf{1})^i) = 0$, $\forall i$ odd), both classes of ambiguities exist
and in particular the first two rows (sets) of $F$ belong to the
'uniform' class while the last row of $F$ belongs to the 'non-uniform'
class of ambiguities. If the MUSIC algorithm is used to estimate the
unknown frequencies, four frequencies will be provided rather
than three and this is illustrated in Figure 2. The estimated
frequencies are $f = [0.1, 0.3632, 0.6263, 0.8895]$. Subtracting 0.1
from the elements of $f$ gives the first row of the matrix $F$,
indicating that the estimated frequencies are 'ambiguous'.
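
The ambiguity in this example can be confirmed by computing the rank of the manifold matrix directly. The sketch below is illustrative only: it uses the unrounded frequencies $0.1 + k/3.8$, which the quoted four-decimal values approximate, and a small Gaussian-elimination rank routine. The names `manifold_vector` and `numerical_rank` and the tolerance are assumptions made for this illustration.

```python
import cmath


def manifold_vector(t, f):
    """Frequency manifold vector exp(j*2*pi*t*f), evaluated element by element."""
    return [cmath.exp(2j * cmath.pi * tn * f) for tn in t]


def numerical_rank(columns, tol=1e-9):
    """Rank of the matrix with the given columns, via elimination with partial pivoting."""
    rows = [list(r) for r in zip(*columns)]   # row-major copy of the matrix
    n_rows, n_cols = len(rows), len(rows[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue                          # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, n_rows):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


t = [0.0, 1.3, 2.5, 3.8]
ambiguous = [0.1 + k / 3.8 for k in range(4)]   # approx. 0.1, 0.3632, 0.6263, 0.8895

rank_all = numerical_rank([manifold_vector(t, f) for f in ambiguous])
rank_three = numerical_rank([manifold_vector(t, f) for f in ambiguous[:3]])
print(rank_all, rank_three)
```

The four manifold vectors span only a three-dimensional subspace, while any three of them remain linearly independent, which is the behaviour described by conditions b) and c) of the ambiguous generator set definition.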


Figure 2: MUSIC spectrum for the set of samples of Example 2.
The true frequencies are [0.3632, 0.6263, 0.8895] and the
frequencies estimated by MUSIC are [0.1, 0.3632, 0.6263,
0.8895].

From the above example it can be seen that if the unknown
frequencies do not correspond to any of the ambiguous generator
sets (rows of matrix $F$), then they can be estimated
unambiguously, even if the sampling rate is lower than Nyquist. It
is also obvious that by using any set of $N$ non-uniform samples
which provides a number of ambiguous generator sets with
minimum rank of ambiguity $\rho$, we can unambiguously resolve any
$\rho - 1$ frequencies. Note that the maximum number of
unambiguously estimated frequencies for a set of $N$ samples is
achieved when the minimum rank of ambiguity $\rho$ is equal to
$N - 1$.

5.  CONCLUSIONS 

In  this  paper,  the  ambiguity  problem  has  been  investigated  and  a 
generalised  framework  has  been  proposed  for  calculating  the 
ambiguous  sets  of  parameters,  based  on  the  hyperhelical 
parametrisation  of  the  manifold  vectors.  The  proposed  framework 
was  supported  by  two  representative  examples,  one  associated 
with  the  harmonic  retrieval  problem  and  the  other  with  the  array 
processing  area. 
ABSTRACT


We have examined the bias and variance properties of
a recently suggested class of multiwindow estimators
for autocorrelation functions (ACF). The derived exact
expression for the bias is valid for any amplitude
distribution, while the derived exact result for the
variance is valid for zero-mean Gaussian processes. We
show that the multiwindow ACF estimator has undesirable
bias properties and inferior variance properties
compared to the standard ACF estimator. The reason
is that the correlation properties of the windows
contribute directly to the ACF estimator and its
statistical moments. The lesson to be learned is that what
is good for spectral estimators is not necessarily good
for correlation estimators.

1. INTRODUCTION

Applications often require accurate estimates of
the Autocorrelation Function (ACF) of some underlying
stochastic process. It is in general a
difficult task to estimate the ACF from data, because
often only short sampled data segments are available.
This introduces a certain estimation bias and variance
in the estimates. A natural goal is to reduce the bias
and/or variance as much as possible.

In this paper, we examine the bias and variance
properties of a recently suggested class of ACF estimators
[2, 3] that is based on Thomson's multiwindow spectral
estimators [7]. The statistical properties of the novel
estimators will be compared to those of a classical ACF
estimator.

The basic definition of an ACF for a wide-sense
stationary stochastic process $X(t)$ is (e.g., Ref. [4])

$$R_{xx}(\tau) = E[X(t) X(t + \tau)], \tag{1}$$

where $E[\cdot]$ is the expectation operator, and $\tau$ is the
time lag.

The Power Spectral Density (PSD) $S_{xx}(f)$ of a
wide-sense stationary process $X(t)$ is related to the
ACF through the Wiener-Khinchine theorem as

$$S_{xx}(f) = \mathcal{F}\{R_{xx}(\tau)\}, \tag{2}$$

where $\mathcal{F}\{\cdot\}$ is the Fourier transform.


2. MULTIWINDOW PSD ESTIMATORS

The fairly recent multiwindow (MW) non-parametric
estimator for power spectral densities (PSD) [7, 4, 5]
can be seen as a variation or an extension of the
windowed periodogram technique. In this method, one
applies a sequence of orthogonal data windows that
obey some optimality criterion, to form a sequence of
direct windowed PSD estimates. The windowing reduces
the spectral leakage, as is well-known from classical
spectral estimation. By forming a weighted average
of the individual spectral estimates, we are
simultaneously able to reduce the estimation variance.


2.1.  Discrete  Prolate  Spheroidal  Sequences 

Thomson [1982] proposed to apply some stringent
optimality criteria when selecting data tapers. He
suggested considering tapers that maximize the "spectral
concentration", or the energy contained in the mainlobe
relative to the total energy of the taper. One therefore
seeks the taper $v[n]$ with a discrete Fourier transform
$V(f)$ that maximizes the window energy ratio

$$\lambda = \int_{-f_B}^{f_B} |V(f)|^2 \, df \tag{3}$$

where $f_B$ is the wanted resolution half-bandwidth (a
design parameter) of the taper. An ideal taper would
therefore have $\lambda \approx 1$ and $f_B$ as small as possible (but
note that $f_B > 1/N$). (Note also that we use $\Delta t = 1$
in this chapter to simplify the notation.)

Expressing $V(f)$ by its discrete Fourier transform,
$V(f) = \sum_{n=0}^{N-1} v[n] \exp(-j 2\pi f n)$, and maximizing the
above functional with respect to $v[n]$, Slepian [1978]
showed that the optimal taper
$v = [v[0], v[1], \ldots, v[N-1]]^T$ obeys the eigenvalue equation

$$A v = \lambda v \tag{4}$$

where the matrix $A$ has elements
$[A]_{nm} = \sin[2\pi f_B (n - m)] / [\pi (n - m)]$, for $n, m = 0, 1, \ldots, N - 1$.
Note that (4) is an $N$-dimensional eigenvector/eigenvalue
problem, thus giving $N$ eigenvector/eigenvalue pairs,
$(v_k, \lambda_k)$, where $k = 0, 1, \ldots, N - 1$. The interpretation is
thus that we obtain a sequence of orthogonal tapers
(eigenvectors), $v_k$, each with a corresponding spectral
concentration measure $\lambda_k$. The first taper $v_0$ has a spectral
concentration $\lambda_0$. Then, $v_1$ maximizes the ratio in (3)
subject to being orthogonal to $v_0$, and with $\lambda_1 < \lambda_0$.
Continuing, we can thus form up to $N$ orthogonal tapers
$v_0, v_1, \ldots, v_{N-1}$, with
$0 < \lambda_{N-1} < \lambda_{N-2} < \cdots < \lambda_0 < 1$. Only tapers with
$\lambda_k \approx 1$ can be applied, since $\lambda_k \ll 1$ implies a large
undesirable leakage. It is usually safe to apply $K = 2 N f_B$
tapers [Percival and Walden, 1993, pp. 334-335].

There are many ways to form weighted averages over
the windowed data. We may therefore write a general
MW PSD estimate as

$$\hat{S}_{MW}(f) = \sum_{k=0}^{K-1} \alpha_k \hat{S}^{(k)}_{MW}(f) \tag{5}$$

where the "eigenspectrum" of order $k$ is defined by

$$\hat{S}^{(k)}_{MW}(f) = \left| \sum_{n=0}^{N-1} v_k[n]\, x[n] \exp(-j 2\pi f n) \right|^2, \quad |f| \le 1/2 \tag{6}$$

where $v_k[n]$ denotes the elements of the DPSS taper of
order $k$, and $\alpha_k$ is a weight factor for eigenspectrum no. $k$.

The three "standard" weight coefficients are (i) Uniform
weighting, $\alpha_k = 1/K$, $k = 0, \ldots, K - 1$, (ii) Eigenvalue
weighting, $\alpha_k = \lambda_k / \sum_{k'=0}^{K-1} \lambda_{k'}$, and (iii) Inverse
eigenvalue weighting, $\alpha_k = 1 / \big( \lambda_k \sum_{k'=0}^{K-1} \lambda_{k'}^{-1} \big)$.

3. MULTIWINDOW ACF ESTIMATORS

The rationale behind the MW-ACF estimator is the
following. If a spectral estimate has been derived from
a high-quality estimator like the MW spectral estimator,
a direct application of the Wiener-Khinchine theorem
should produce a high-quality ACF estimator. Thus,
the MW-ACF estimator is given by

$$\hat{R}_{MW}[m] = \mathcal{F}^{-1}\{ \hat{S}_{MW}(f) \} \tag{7}$$

where $\mathcal{F}$ denotes the Fourier transform, a hat denotes
an estimator, and subscript MW denotes multiwindow.

Assume that the data available from a single realization
$x(t)$ of a process $X(t)$ are

$$x[n] = x(n \Delta t); \quad n = 0, 1, \ldots, N - 1 \tag{8}$$

where $\Delta t$ is the sampling interval.

It is easy to show that Eq. (7) leads to an estimator
of the form [2, 3]

$$\hat{R}_{MW}[m] = \frac{1}{K} \sum_{k=0}^{K-1} \hat{R}^{(k)}_{MW}[m] \tag{9}$$

where

$$\hat{R}^{(k)}_{MW}[m] = \sum_{n=0}^{N-1-|m|} v_k[n]\, v_k[n + |m|]\, x[n]\, x[n + |m|] \tag{10}$$

Here, $x[n]$ is the datum at time step $n$, $v_k[n]$; $n = 0, 1, \ldots, N - 1$
are the components of taper no. $k$, and $K$
is the number of tapers applied in the formation of the
ACF-estimate. Usually, one chooses $K \ll N$ to avoid
excessive leakage from the tapers.

The tapers $v_k[n]$ that maximize the energy contained
in the main lobe, subject to a designer-specified
half-bandwidth, are the so-called Slepian sequences, or
Discrete Prolate Spheroidal Sequences (DPSS) [6, 7].
These tapers cannot be written in a closed form, but
are rather defined as a solution of an eigenvalue/eigenvector
problem [6, 7, 4]. Recently, a simpler set of
orthonormal tapers was introduced by [5]. These tapers
are commonly referred to as "sinusoidal tapers" due to
their mathematical definition. The sinusoidal tapers
are approximations to tapers that minimize the local
bias, subject to being orthonormal in sample space.

Note that the classical biased ACF-estimator (the
"standard" ACF-estimator [4]) is derived from (9) simply
by choosing $K = 1$ and $v_0[n] = 1/\sqrt{N}$; $n = 0, 1, \ldots, N - 1$.
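
The reduction to the classical estimator is easy to verify numerically. The following is a minimal sketch of Eqs. (9) and (10), assuming a generic weight list in place of the factor $1/K$ so that other weightings can be tried; the function names and the short data record are illustrative and not taken from the paper.

```python
import math


def mw_acf(x, tapers, weights, max_lag):
    """Weighted multiwindow ACF estimate: sum_k weights[k] * R_k[m], with R_k[m] as in Eq. (10)."""
    n_samples = len(x)
    acf = []
    for m in range(max_lag + 1):
        total = 0.0
        for v, a in zip(tapers, weights):
            # Windowed lag product for taper v at lag m
            r_k = sum(v[n] * v[n + m] * x[n] * x[n + m] for n in range(n_samples - m))
            total += a * r_k
        acf.append(total)
    return acf


def classical_biased_acf(x, max_lag):
    """Classical biased estimator: (1/N) * sum_n x[n] x[n+m]."""
    n_samples = len(x)
    return [sum(x[n] * x[n + m] for n in range(n_samples - m)) / n_samples
            for m in range(max_lag + 1)]


x = [1.0, -0.5, 0.25, 2.0, -1.0, 0.5, 0.0, 1.5]   # arbitrary illustrative data
N = len(x)
flat_taper = [1.0 / math.sqrt(N)] * N             # v_0[n] = 1/sqrt(N)

mw = mw_acf(x, [flat_taper], [1.0], max_lag=3)
classic = classical_biased_acf(x, max_lag=3)
print(mw)
print(classic)
```

With uniform weighting over $K$ tapers, the weight list is simply `[1.0 / K] * K`, which corresponds to the form of Eq. (9).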


3.1.  Expectation  Value 

It is straightforward to evaluate the expectation value
of the multiwindow ACF estimator. We found that

$$\frac{E\{ \hat{R}_{MW}[m] \}}{R_{xx}[m]} = Q[m] \tag{11}$$

where

$$Q[m] = \sum_{k=0}^{K-1} \alpha_k \rho_k[m]$$

where $\rho_k[m] = \sum_{n=0}^{N-1-|m|} v_k[n]\, v_k[n + |m|]$ is the
deterministic correlation function for data window no. $k$.



Figure 1: The window autocorrelation functions $\rho_k[m]$
for window orders $k = 0$: short dashes; $k = 1$: dash-dot;
$k = 2$: dash-dot-dot-dot; and $k = 3$: long dashes.

Thus, the expectation value of the MW-ACF estimator
is governed by a weighted average of the correlation
functions of the $K$ individual taper sequences. Note
that this result is exact, and that no assumptions were
made about the amplitude distribution.
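
The behaviour of $Q[m]$ can be explored with a short script. The sketch below is not the paper's computation: instead of DPSS tapers, which require an eigenvalue solver, it uses the sinusoidal tapers in their commonly quoted form $v_k[n] = \sqrt{2/(N+1)}\, \sin(\pi (k+1)(n+1)/(N+1))$, a formula that is not given in this text and is assumed here. The function names are illustrative.

```python
import math


def sine_tapers(n_samples, n_tapers):
    """Orthonormal sinusoidal tapers (assumed stand-in for the DPSS tapers)."""
    scale = math.sqrt(2.0 / (n_samples + 1))
    return [[scale * math.sin(math.pi * (k + 1) * (n + 1) / (n_samples + 1))
             for n in range(n_samples)]
            for k in range(n_tapers)]


def window_correlation(v, m):
    """rho_k[m] = sum over n of v[n] * v[n + |m|]."""
    m = abs(m)
    return sum(v[n] * v[n + m] for n in range(len(v) - m))


def q_function(tapers, weights, max_lag):
    """Q[m] = sum_k weights[k] * rho_k[m] for m = 0, ..., max_lag."""
    return [sum(a * window_correlation(v, m) for v, a in zip(tapers, weights))
            for m in range(max_lag + 1)]


N = 50
q_classical = q_function([[1.0 / math.sqrt(N)] * N], [1.0], N - 1)
q_two = q_function(sine_tapers(N, 2), [0.5, 0.5], N - 1)

print(q_classical[:3])
print(min(q_two))
```

For the flat taper the script returns the familiar triangular factor $1 - |m|/N$ of the classical biased estimator, whereas with two uniformly weighted tapers $Q[m]$ changes sign at intermediate lags, so the expected value of the estimate there has the wrong sign.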

The result in (11) is very important, because it shows
that any of the three standard weightings will cause severe
problems for the estimator. For $K > 1$, we will
end up with lag-regions where $Q[m] < 0$, and the estimate
will in general be severely biased. It is easy
to understand why this is so by examining the window
correlation functions order by order. In Fig. 1 we
show the four lowest order window correlation functions
$\rho_k[m]$ for $k = 0, 1, 2, 3$ for DPSS windows with $N = 50$
and $N f_B = 3$. We see that all correlation functions
are decaying, and that the higher the order, the more
oscillations are evident.

In Fig. 2 we show the quantity $Q[m]$ (see Eq. (11)),
which is the expectation value normalized by the true
ACF for each lag. We have shown $Q[m]$ for $K = 1, 2, 3$,
and 4 tapers using a uniform weighting of the individual
windowed ACF estimates. The full line is the result for
the classical (biased) ACF-estimator for comparison,
the short-dashed curve is the MW-ACF case for $K = 1$,
the dash-dot curve is $K = 2$, the dash-dot-dot-dot
curve is $K = 3$, and the long-dashed curve is $K = 4$.
Note that $Q[m]$ with uniform weighting in general
exhibits $K - 1$ zeros, where $K$ is the number of tapers.

Figure 2: The function $Q[m]$. Classical ACF-estimator:
full line; MW-ACF with $K = 1$: short dashes; $K = 2$:
dash-dot; $K = 3$: dash-dot-dot-dot; and $K = 4$: long
dashes.

It is important to notice that the MW-ACF estimators
in general introduce a significant bias. When
$K > 1$, we see that there exist lag-ranges where the
expected value of the estimator has an incorrect sign. In
general, the range of lags where one has some degree
of confidence in the expectation value of the MW-ACF
estimator diminishes as $K$ increases.

3.2.  Variance 

In general, it is impossible to evaluate the variance of
the MW-ACF estimator. This is because the result will
depend explicitly on the probability density function
of the process amplitude. By making certain standard
assumptions, however, we are able to derive some
expressions that shed some light on the variability of the
MW-ACF estimator.

By assuming the process to be a zero-mean real-valued
Gaussian stochastic process, it is possible to
show that

$$\mathrm{var}\{ \hat{R}_{MW}[m] \} = \sum_{k=0}^{K-1} \sum_{l=0}^{K-1} \alpha_k \alpha_l \{ F(k, l; m) + G(k, l; m) \} \tag{12}$$
where

$$F(k, l; m) = \sum_{n=0}^{N-1-|m|} \sum_{n'=0}^{N-1-|m|} v_k[n]\, v_k[n + |m|]\, v_l[n']\, v_l[n' + |m|]\, R_{xx}^2[n' - n]$$

and

$$G(k, l; m) = \sum_{n=0}^{N-1-|m|} \sum_{n'=0}^{N-1-|m|} v_k[n]\, v_k[n + |m|]\, v_l[n']\, v_l[n' + |m|]\, R_{xx}[n' - n - |m|]\, R_{xx}[n' - n + |m|]$$

The variance of the MW-ACF estimator is thus governed
by the fourth-order properties of the taper sequences,
but it also depends explicitly on the fourth-order
properties of the true ACF of the process. Eq.
(12) may of course be evaluated numerically for a given
true ACF, but as it stands, this expression may seem
of little use.

To simplify further, we now assume that the process
is white and has a variance $\sigma_x^2$. Under these assumptions,
we find that the variance has the form

$$\frac{\mathrm{var}\{ \hat{R}_{MW}[m] \}}{\sigma_x^4} = \sum_{k=0}^{K-1} \sum_{l=0}^{K-1} \alpha_k \alpha_l\, \rho_{k,l}[m]; \quad m \ne 0 \tag{13}$$

where

$$\rho_{k,l}[m] = \sum_{n=0}^{N-1-|m|} v_k[n]\, v_l[n]\, v_k[n + |m|]\, v_l[n + |m|]$$

is a fourth order window correlation function involving
the windows at two different orders $k$ and $l$.

3.3. Related Quadratic Error Measures

A related important quadratic error measure that
combines the bias and variance is the mean-squared
error (MSE)

$$\mathrm{mse}\{ \hat{R}_{MW}[m] \} = \mathrm{var}\{ \hat{R}_{MW}[m] \} + B^2\{ \hat{R}_{MW}[m] \} \tag{14}$$

where the bias is $B = (Q[m] - 1)\, R_{xx}[m]$.

The cumulative MSE up to lag $m$,

$$\mathrm{cmse}\{ \hat{R}_{MW}[m] \} = \sum_{l=0}^{m} \mathrm{mse}\{ \hat{R}_{MW}[l] \}, \tag{15}$$

may also be used to quantify the performance of the
estimator.

4. NUMERICAL EXAMPLES

In the following numerical examples, we have applied
the Discrete Prolate Spheroidal Sequences (DPSS) [6]
as data windows. These windows maximize the window
energy in the main lobe whose bandwidth is a
user-specified parameter. The DPSSs are not expressible
in closed form; rather, they are the solution of the
eigenproblem

$$A v = \lambda v \tag{16}$$

where the elements of the matrix $A$ are given by
$A_{mn} = \sin[2\pi f_B (m - n)] / [\pi (m - n)]$, $m, n = 0, 1, \ldots, N - 1$,
and $f_B$ is the desired resolution half-bandwidth of the
tapers. All examples shown are for data sets of length
$N = 50$, and a bandwidth parameter of $f_B = 3/N$.
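
Before turning to colored processes, the white-noise expression in Eq. (13) can be evaluated directly for $N = 50$. The sketch below is an illustration under stated assumptions: it uses the flat taper of the classical estimator and, as a stand-in for the DPSS tapers used in the paper, two sinusoidal tapers in their commonly quoted form (a formula not given in this text). All function names are hypothetical.

```python
import math


def fourth_order_correlation(vk, vl, m):
    """rho_{k,l}[m] = sum over n of vk[n] * vl[n] * vk[n+|m|] * vl[n+|m|]."""
    m = abs(m)
    return sum(vk[n] * vl[n] * vk[n + m] * vl[n + m] for n in range(len(vk) - m))


def normalized_white_noise_variance(tapers, weights, m):
    """var / sigma_x^4 for white noise at a non-zero lag m, following Eq. (13)."""
    if m == 0:
        raise ValueError("Eq. (13) applies to non-zero lags only")
    return sum(ak * al * fourth_order_correlation(vk, vl, m)
               for vk, ak in zip(tapers, weights)
               for vl, al in zip(tapers, weights))


N = 50
flat = [1.0 / math.sqrt(N)] * N
# Assumed sinusoidal tapers, used here instead of DPSS tapers
scale = math.sqrt(2.0 / (N + 1))
sine = [[scale * math.sin(math.pi * (k + 1) * (n + 1) / (N + 1)) for n in range(N)]
        for k in range(2)]

var_classical = [normalized_white_noise_variance([flat], [1.0], m) for m in range(1, N)]
var_mw_lag1 = normalized_white_noise_variance(sine, [0.5, 0.5], 1)

print(var_classical[0], var_mw_lag1)
```

For the flat taper the result is $(N - |m|)/N^2$, the usual white-noise variance of the biased estimator, which gives a simple reference value against which tapered estimators can be compared lag by lag.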

4.1.  Autoregressive  process  of  order  one 

Autoregressive processes of order one (AR(1)) are
Gaussian, and have an ACF given by

$$R_{xx}[m] = \frac{\sigma^2}{1 - a_1^2} (-a_1)^{|m|} \tag{17}$$

where $a_1$ is the AR-parameter, and $\sigma^2$ is the variance of
the driving zero-mean Gaussian noise. In the example
to follow, we have chosen the parameters $a_1 = -0.5$
and $\sigma^2 = 1$.
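
For the chosen parameters, the true ACF and the exact bias of the classical estimator follow directly from Eq. (17) and the bias expression $(Q[m] - 1) R_{xx}[m]$. The sketch below assumes $Q[m] = 1 - |m|/N$ for the classical biased estimator, which follows from the flat taper $v_0[n] = 1/\sqrt{N}$; the function names are illustrative.

```python
def ar1_acf(m, a1, noise_var):
    """True AR(1) autocorrelation, Eq. (17): noise_var / (1 - a1^2) * (-a1)^|m|."""
    return noise_var / (1.0 - a1 ** 2) * (-a1) ** abs(m)


def classical_bias(m, n_samples, a1, noise_var):
    """Exact bias (Q[m] - 1) * R_xx[m], with Q[m] = 1 - |m|/N for the flat taper."""
    q = 1.0 - abs(m) / n_samples
    return (q - 1.0) * ar1_acf(m, a1, noise_var)


acf = [ar1_acf(m, -0.5, 1.0) for m in range(4)]
bias = [classical_bias(m, 50, -0.5, 1.0) for m in range(4)]
print(acf)
print(bias)
```

With $a_1 = -0.5$ the ACF is positive and halves at every lag, so the classical bias is negative and small at the short lags where the ACF is largest.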

In Fig. 3 we show the exact bias, variance,
mean-squared error, and cumulative MSE for four different
MW-ACF estimators, compared to the exact results for
the classical ACF-estimator. The different line-styles
have the same meaning as in Fig. 1.

We see that for $K = 1, 2, 3$, the peak value of the
bias is lower for the MW-ACF than it is for the classical
ACF estimator, whereas $K = 4$ has a maximum
bias that is larger than that of the classical estimator.
It is very important to notice that the classical ACF
estimator has a variance, MSE, and cumulative MSE
that are lower than those of the MW-ACF estimators
for the small time lags. Beyond some crossover
lag, however, the variance, MSE, and cumulative MSE
of the MW-ACF decrease drastically, and stay far
below the corresponding values for the classical ACF
estimator.

The authors of [3] estimated the cumulative MSE for AR(2)
and MA(2) data by means of a Monte Carlo simulation.
Their simulation results are consistent with our exact
results for the MW-ACF estimator. They did not, however,
compare their results to those of the classical ACF
estimator. We have found that, for their examples as well,
the classical ACF estimator outperforms the MW-ACF
estimator for small time lags.
Figure 3: Exact bias, variance, MSE, and cumulative
MSE for MW-ACF estimators and the classical ACF
estimator for an AR(1)-process. Line-styles as in Fig. 1.

5.  CONCLUSION 

We derived an exact expression for the expectation
value of the multiwindow ACF (MW-ACF) estimator,
valid for any amplitude distribution. Furthermore, we
found two useful approximations for the estimator variance
in the case of zero-mean Gaussian data, one for
colored processes and one for white noise. By comparing
the bias, variance, mean-squared error (MSE),
and cumulative MSE of the MW-ACF estimators with
those of the classical estimator, we see that there is a
trade-off between the variance reduction and the bias
introduced by the tapering. For small time lags, the
MW-ACF always exhibits a larger variance, MSE, and
cumulative MSE than the classical estimator.
For the larger lags, the MW-ACF has far lower variance,
MSE, and cumulative MSE, but the expected
value of the estimator can in turn be unacceptably
large. In general, the MW-ACF estimator even induces
an incorrect sign of the expected value for certain
time-lag intervals.

It  is  evident  that  only  for  very  limited  lag-ranges  will 
the  MW-ACF  estimator  be  able  to  outperform  the  clas¬ 
sical  ACF  estimator.  We  must  therefore  conclude  that 
the  statistical  properties  of  the  MW-ACF  estimator  are 
such  that  this  estimator  will  be  of  limited  importance 
for  solving  real  world  ACF  estimation  problems. 

The reason for this peculiar behavior is that the correlation
properties of the multiple windows become
important for all moments of the MW-ACF estimator.

The lesson to be learned from this is that what is
good for spectral estimation is not necessarily good for
correlation estimation, despite the existence of the
Wiener-Khinchine theorem.
ABSTRACT

This paper considers the problem of estimating
the parameters of complex-valued sinusoidal
signals observed in colored noise. This problem
is a special case of the general problem of
estimating the parameters of a complex-valued
homogeneous random field with mixed spectral
distribution from a single observed realization
of it. The large sample properties of the least
squares estimator of the exponentials’ parameters
are derived, making no assumptions as
to the probability distribution of the observed
field. It is shown that the least squares estimator
is asymptotically unbiased. A simple
expression for the estimator asymptotic covariance
matrix is derived. The derivation shows
that, asymptotically, the least squares estimation
of the parameters of each exponential is
decoupled from the estimation of the parameters
of the other exponentials. Assuming the
observed field is a realization of a Gaussian random
field, it is further demonstrated that the
asymptotic error covariance matrix of the least
squares estimate attains the Cramer-Rao bound,
even for modest dimensions of the observed field
and low signal to noise ratios.

1.  INTRODUCTION 

From the 2-D Wold-like decomposition we have that
any 2-D regular and homogeneous discrete random field
can be represented as a sum of two mutually orthogonal
components: a purely-indeterministic field and a
deterministic one. The purely-indeterministic component
has a unique white innovations driven moving average
representation. The deterministic component is
further orthogonally decomposed into a harmonic field
and a countable number of mutually orthogonal evanescent
fields. In this paper we consider the problem of
least squares estimation of the parameters of the harmonic
component of the field in the presence of the
purely-indeterministic component. More specifically,
using the results of [2], [3] we evaluate the asymptotic
error covariance matrix of the least squares estimator
of the harmonic field parameters, from noisy observations
of the field. The colored observation noise is due
to the purely-indeterministic component. This derivation
makes no assumptions regarding the probability
distribution of the observed field.

This work was supported in part by the Israel Ministry of
Science under Grant 1233198.

2.  PROBLEM  DEFINITION 

Let $\{y(m,n)\}$, $(m,n) \in U$, where
$U = \{(i,j) \mid 0 \le i \le M-1,\ 0 \le j \le N-1\}$, be the observed
2-D complex valued random field such that

y(m,  n)  =  h(m,  n)  +  e(m,  n)  (1) 

and 

$$h(m,n) = \sum_{p=1}^{P} a_p\, e^{i(\omega_p m + \nu_p n + \varphi_p)}. \tag{2}$$

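Let $\theta$ denote the parameter vector of the harmonic
field, i.e.,

$$\theta = [\,a_1\ \varphi_1\ \omega_1\ \nu_1\ \cdots\ a_P\ \varphi_P\ \omega_P\ \nu_P\,]^T \tag{3}$$

where $a_k > 0$; $\varphi_k, \omega_k, \nu_k \in [-\pi, \pi)$; $\omega_k \neq \omega_j$ and
$\nu_p \neq \nu_q$ for $k \neq j$, $p \neq q$.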

Assumption 1: The purely-indeterministic component
$\{e(m,n)\}$ is a circular, zero mean, wide sense
homogeneous field, with a positive and piecewise continuous
spectral density $\phi(\omega,\nu)$, such that the possible
discontinuities of $\phi(\omega,\nu)$ do not coincide with any
$\{(\omega_p,\nu_p)\}_{p=1}^{P}$.

Assumption 2: The number P of harmonic components
is known a priori.
Let y, h, e denote the observation, harmonic component,
and purely-indeterministic component column
vectors, respectively, where

$$y = [\,y(0,0), \ldots, y(M-1,0),\ y(0,1), \ldots, y(M-1,1),\ \ldots,\ y(0,N-1), \ldots, y(M-1,N-1)\,]^T, \tag{4}$$

and  h,  e  are  similarly  defined.  Let  ,  denote  the  co- 
variance  matrix  of  e  and  hence  of  y  as  well. 


3.  THE  REGRESSION  SPECTRUM 

Define the $4P \times 4P$ normalization matrix

$$D_{M,N} = \mathrm{diag}\{D, D, \ldots, D\}, \tag{5}$$

$$D = \mathrm{diag}\{(MN)^{1/2},\ (MN)^{1/2},\ (M^3N)^{1/2},\ (MN^3)^{1/2}\}. \tag{6}$$

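Define also the mean gradient vector with respect to
the parameter vector $\theta$

$$\Psi(m,n) = \frac{\partial h(m,n)}{\partial \theta} \tag{7}$$

and let

$$\frac{\partial h}{\partial \theta^T} = \begin{bmatrix} \Psi^T(0,0) \\ \Psi^T(1,0) \\ \vdots \\ \Psi^T(M-1,0) \\ \vdots \\ \Psi^T(M-1,N-1) \end{bmatrix} \tag{8}$$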

Next, consider the sequence of matrices

$$R_{k,l}^{M,N} = D_{M,N}^{-1} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \Psi(m+k,\, n+l)\, \Psi^H(m,n)\, D_{M,N}^{-1} \tag{9}$$

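Since

$$\lim_{N\to\infty} \frac{1}{N^{k+1}} \sum_{n=0}^{N-1} n^k e^{i\rho n} = \begin{cases} \dfrac{1}{k+1}, & \rho = 0 \\ 0, & \rho \neq 0 \end{cases} \tag{10}$$

it can be shown using some straightforward arithmetic
that as M and N tend to infinity the sequence $R_{k,l}^{M,N}$
tends to a limit given by

$$R_{k,l} = \mathrm{diag}\big(\{e^{i(\omega_p k + \nu_p l)} B_p\}_{p=1}^{P}\big) \tag{11}$$

where

$$B_p = \begin{bmatrix} 1 & -i a_p & -\dfrac{i a_p}{2} & -\dfrac{i a_p}{2} \\ i a_p & a_p^2 & \dfrac{a_p^2}{2} & \dfrac{a_p^2}{2} \\ \dfrac{i a_p}{2} & \dfrac{a_p^2}{2} & \dfrac{a_p^2}{3} & \dfrac{a_p^2}{4} \\ \dfrac{i a_p}{2} & \dfrac{a_p^2}{2} & \dfrac{a_p^2}{4} & \dfrac{a_p^2}{3} \end{bmatrix} \tag{12}$$

and

$$R_{0,0} = \lim_{M,N\to\infty} R_{0,0}^{M,N} = \mathrm{diag}\big(\{B_p\}_{p=1}^{P}\big) \tag{13}$$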

Note that in the terms of [1], [3], $R_{k,l}$ is a regression
correlation matrix.
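
The limit used in this derivation, namely that N^-(k+1) times the sum of n^k exp(i*rho*n) over n = 0, ..., N-1 tends to 1/(k+1) when rho = 0 and to 0 otherwise, can be checked numerically for finite N. The sketch below is only an illustration; the function name `normalized_sum` and the chosen values of N and rho are assumptions, not taken from the paper.

```python
import numpy as np

def normalized_sum(k, rho, n_terms):
    """Finite-N value of N^-(k+1) * sum_{n=0}^{N-1} n^k * exp(i*rho*n)."""
    n = np.arange(n_terms, dtype=float)
    return np.sum(n ** k * np.exp(1j * rho * n)) / float(n_terms) ** (k + 1)

# rho = 0: the value approaches 1/(k+1) as N grows.
on_frequency = [normalized_sum(k, 0.0, 10_000) for k in (0, 1, 2)]

# rho != 0: the value decays towards 0 as N grows.
off_frequency = [normalized_sum(k, 0.7, 10_000) for k in (0, 1, 2)]
```

This is why the cross terms between distinct exponentials vanish in the limit, leaving a block-diagonal regression correlation matrix.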

It can be shown [3] that $R_{k,l}$ is a double-index
positive semi-definite sequence. We therefore conclude,
using the theorem of Herglotz, Bochner and Weil and
following similar arguments to those in [1] p. 45, that
$R_{k,l}$ has a spectral representation of the form

$$R_{k,l} = \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} e^{i(k\omega + l\nu)}\, dM(\omega,\nu) \tag{14}$$

where $M(\omega,\nu)$ is a matrix valued function of $\omega$ and
$\nu$ taking as values Hermitian $4P \times 4P$ positive semi-definite
matrices whose elements are functions of bounded
variation, while the functions on the diagonal are
non-decreasing. For convenience, we define the regression
“spectral density”, $m(\omega,\nu)$,

$$m(\omega,\nu) = 4\pi^2\, \mathrm{diag}\big(\{\delta_{\omega_p,\nu_p}(\omega,\nu)\, B_p\}_{p=1}^{P}\big) \tag{15}$$

where $\delta_{\omega_p,\nu_p}(\omega,\nu)$ denotes the Dirac measure
concentrated on $(\omega_p,\nu_p)$.


4. LEAST SQUARES ESTIMATION OF
THE EXPONENTIALS’ PARAMETERS


Let 

fL(0)  =  7jly-h(0)]"ly-h(0)}  (16) 

be the quadratic objective function to be minimized
with respect to the parameter vector $\theta$.

Assuming  ft  is  sufficiently  smooth,  and  employing 
a  second  order  Taylor  series  expansion  (see  [3]  for  the 
details)  we  obtain 


0-O^H 


-l&fL 

o  do 


(17) 


where $H_\theta$ is the Hessian matrix evaluated at $\theta$. Using
(17), it is shown in [3] that $\hat{\theta}$ is an asymptotically
unbiased estimate of $\theta$. The normalized asymptotic error
covariance matrix is then given using (17) by


covO  =  i  [5R  (Ro,o)]  1  • 

(  lim  *  )d^}[5J(R0,o)]”1(18) 

where we have used the symmetry of $\Re(R_{0,0})$, the
existence of its inverse, and the circularity of $\{e(m,n)\}$.

In [3] we show that under the conditions of Assumption 1


liF1  m\n 


M,7V— >oo 


K  [  [  4>(u,v)d M(w,v)  (19) 

"  J  —7T  J  —7 T 


47T2 

Substituting  (19)  into  (18)  we  have 


cov  9  =  -  [5ft  (R0,o)] 


-l 


47T2 


5 ft 


7T  7r 

J  J  4>(uj,  i/)rfM(w,  v) 


[5ft(R0,0)]-1.  (20) 


Substituting  (13)  and  (15)  into  (20)  we  have 


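$$\mathrm{cov}\,\hat{\theta} = \mathrm{diag}\big(\{C_p\}_{p=1}^{P}\big) \tag{21}$$

where

$$C_p = \begin{bmatrix} \phi(\omega_p,\nu_p) & 0 & 0 & 0 \\ 0 & \dfrac{7\phi(\omega_p,\nu_p)}{a_p^2} & -\dfrac{6\phi(\omega_p,\nu_p)}{a_p^2} & -\dfrac{6\phi(\omega_p,\nu_p)}{a_p^2} \\ 0 & -\dfrac{6\phi(\omega_p,\nu_p)}{a_p^2} & \dfrac{12\phi(\omega_p,\nu_p)}{a_p^2} & 0 \\ 0 & -\dfrac{6\phi(\omega_p,\nu_p)}{a_p^2} & 0 & \dfrac{12\phi(\omega_p,\nu_p)}{a_p^2} \end{bmatrix} \tag{22}$$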


We therefore conclude that, asymptotically, the least
squares estimation of the parameters of each exponential
is decoupled from the estimation of the parameters
of the other exponentials. Moreover, the error variance
in estimating the amplitude parameter of each
exponential is decoupled and independent of all other
model parameters. It is a function only of the colored
noise spectral density at the exponential’s frequency.
Also, for each exponential the least squares estimation
of its two frequency parameters $\omega_p$ and $\nu_p$ is
asymptotically decoupled. Finally, it should be emphasized
that this derivation of the large sample properties of
the least squares estimator is independent of the
probability distribution function of the observed field.
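
The decoupling described above can be illustrated numerically. The sketch below assumes that each per-exponential regression block has the form obtained by differentiating a single exponential a*exp(i(omega*m + nu*n + phi)) with respect to (a, phi, omega, nu) and normalizing as in Section 3; the function names are hypothetical. Inverting the real part of that block reproduces the zero pattern of the covariance: the amplitude is uncoupled from the other parameters, and the two frequencies are uncoupled from each other.

```python
import numpy as np

def regression_block(a):
    """Assumed limiting 4x4 block for one exponential, ordered (a, phi, omega, nu)."""
    return np.array([
        [1.0,        -1j * a,     -1j * a / 2, -1j * a / 2],
        [1j * a,      a ** 2,      a ** 2 / 2,  a ** 2 / 2],
        [1j * a / 2,  a ** 2 / 2,  a ** 2 / 3,  a ** 2 / 4],
        [1j * a / 2,  a ** 2 / 2,  a ** 2 / 4,  a ** 2 / 3],
    ])

def covariance_shape(a):
    """Inverse of the real part of the block; proportional to the covariance block."""
    return np.linalg.inv(regression_block(a).real)

shape = covariance_shape(2.0)
```

For amplitude a the result is diag-like in the first row and column, and the remaining 3x3 part equals (1/a^2) times [[7, -6, -6], [-6, 12, 0], [-6, 0, 12]].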


5.  ASYMPTOTIC  EFFICIENCY  OF  THE 
LEAST  SQUARES  ESTIMATOR 

The Cramer-Rao bound (CRB) provides a lower bound
on the error variance in estimating the model parameters
for any unbiased estimator of these parameters.
Since the LS estimator of the harmonic component
parameters was shown to be asymptotically unbiased we
investigate in this section its statistical efficiency.
Assuming the observed field is Gaussian, we investigate
the asymptotic performance of the LS estimator of the
exponentials’ parameters, in comparison with the
corresponding exact CRB.

Figure 1: The asymptotic error variance of the LS
estimate of the amplitude, phase, and spatial frequency
as a function of SNR (dashed line), compared
with the corresponding exact CRB (solid line).

In the first example we investigate the performance
as a function of the local signal to noise ratio. The
local SNR for the kth exponential is defined as

$$\mathrm{SNR}_k = 10 \log \frac{a_k^2}{\phi(\omega_k,\nu_k)} \tag{23}$$

In this example the purely-indeterministic component
of the field is a NSHP MA field with support $S_{1,1}$.
The MA model parameters are $b(0,1) = -0.9e^{i0.25\pi}$,
$b(1,-1) = 0.1e^{i0.4\pi}$, $b(1,0) = -0.5e^{i0.8\pi}$,
$b(1,1) = 0.4e^{-i0.2\pi}$. The driving noise of the MA model is a
zero mean, circular, white Gaussian noise field with
independent real and imaginary components, each with
a unit variance. The harmonic component of the field
comprises a single exponential with frequency
$(\omega_1,\nu_1) = (0.6\pi, 0.8\pi)$. Its amplitude varies to provide
the desired range of SNR values. The dimensions of the
observed field are 20 x 20.
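
To make the local SNR definition concrete, the following sketch evaluates the spectral density of a 2-D MA field at a given frequency pair and converts an amplitude into a local SNR in dB. Several things here are assumptions for illustration only: the MA convention e(m,n) = u(m,n) + sum of b(k,l) u(m-k,n-l), the normalization phi = sigma2 * |transfer|^2, the function names, and the placeholder coefficient used below, which is not one of the paper's values.

```python
import numpy as np

def ma_spectral_density(coeffs, sigma2, omega, nu):
    """Assumed form: phi = sigma2 * |1 + sum b(k,l) exp(-i(k*omega + l*nu))|^2."""
    transfer = 1.0 + sum(b * np.exp(-1j * (k * omega + l * nu))
                         for (k, l), b in coeffs.items())
    return sigma2 * np.abs(transfer) ** 2

def local_snr_db(amplitude, coeffs, sigma2, omega, nu):
    """Local SNR in dB: 10*log10(amplitude^2 / phi(omega, nu))."""
    phi = ma_spectral_density(coeffs, sigma2, omega, nu)
    return 10.0 * np.log10(amplitude ** 2 / phi)

# Placeholder MA coefficient (hypothetical), driving-noise variance 2
# (unit variance in each of the real and imaginary parts).
example_coeffs = {(0, 1): 0.5}
snr = local_snr_db(1.0, example_coeffs, 2.0, 0.6 * np.pi, 0.8 * np.pi)
```

Sweeping `amplitude` while holding the noise model fixed gives a range of local SNR values of the kind used on the horizontal axis of such an experiment.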

The results of this example, Fig. 1, indicate that
even for modest dimensions of the observed field, and
for a wide range of SNR values, the “asymptotic” error
variances of the LS estimates of the amplitude, phase,
and spatial frequency are essentially identical to the
corresponding values of the exact CRB. These CRB
values are evaluated for the given dimensions of the
observed data (20 x 20 in this case) making no
approximations.

Figure 2: The asymptotic error variance of the
LS estimate of the amplitude, phase, and spatial
frequency as a function of data dimensions for
SNR = -10 dB (dashed line), compared with the
corresponding exact CRB (solid line).

In the next example we investigate the effect of the
size of the observed field on the performance of the LS
estimator and on the CRB. The harmonic component
of the field comprises a single exponential such that
$\omega = 0.2\pi$, $\nu = 0.8\pi$. The purely-indeterministic component
is the same as in the first example. To evaluate the
functional dependence of the LS estimator asymptotic
error variance, and of the corresponding CRB, on the
dimensions of the observed field we set N = M and
let both N and M assume values from 5 to 20. The
results of evaluating the asymptotic error variance of
the amplitude, phase, and spatial frequency estimates,
and the corresponding CRB, as a function of the field
dimensions are depicted in Figure 2 for a local SNR
value of -10 dB. The results again indicate that the
asymptotic error variance of the LS estimator of each
of the exponentials’ parameters is nearly identical to
the corresponding exact CRB, even for modest data
dimensions and relatively low SNR values.


We have investigated the problem of least squares
estimation of the parameters of complex-valued exponentials
observed in colored noise. Making no assumptions
about the probability distribution of the observed field,
it is shown that the least squares estimator of the
exponentials’ parameters is asymptotically unbiased, and a
simple expression for its asymptotic covariance matrix
is provided. It is further shown that, asymptotically,
least squares estimation of the parameters of each
exponential is decoupled from the estimation of the
parameters of the other exponentials. Moreover, the error
variance in estimating the amplitude parameter of each
exponential is decoupled and independent of all other
model parameters. It is a function only of the colored
noise spectral density at the exponential’s frequency.
Also, for each exponential the least squares estimation
of its two frequency parameters $\omega_p$ and $\nu_p$ is
asymptotically decoupled.

Since the experimental results indicate that even
for modest data dimensions the asymptotic covariance
matrix of the least squares estimator is very close to
the corresponding exact Cramer-Rao lower bound, we
conjecture that these results hold when the data
dimensions become larger. By definition, at the limit,
as data dimensions tend to infinity in both axes, the
exact Cramer-Rao bound converges to the asymptotic
Cramer-Rao bound. We therefore further conjecture
that the asymptotic efficiency of the least squares
estimator holds as data dimensions tend to infinity and
therefore the asymptotic CRB matrix for the problem
of estimating the parameters of 2-D exponentials in
colored noise is given by (21)-(22).
Abstract. We present methods for estimation of signal
parameters and apply these methods to biological signals.
The methods are based on cross-spectral phase which is
computed from the phase of the short time Fourier transform.
The methods are applied to acoustical biological signals,
including human speech and dolphin sonar clicks.
Specifically addressed are the problems of crisp narrow
band time frequency representations from very small data
sets, accurately estimating speech formants, blind recovery
of the group delay of the transmission channel and
equalization of time-frequency representations.

Key  Words:  Time-Frequency/Time-Scale  analysis,  speech, 
formant  recovery,  equalization 

(1.0)  Introduction 

Speech processing has enjoyed a surge in interest in recent
years due to an interest within the Government and private
industry in the development of a machine-based speech
processing capability. During this time, speech signal
processing research has suffered because the speech research
community has accepted MEL-warped cepstral features as
the standard signal processing front end and has chosen to
focus on language modeling and statistical processing.
While these efforts are important, it is the belief of the
author that there is still a lot we do not know about speech
and other biological signals, and there is a lot of useful
information which may be extracted from these signals by
appropriate signal processing techniques. This paper is one of a
series in which we have attempted to develop new analysis
techniques which are effective in parameterizing speech and
other non-stationary signals and extracting information
contained in the signal itself. In most of these efforts, we have
focused on cross-spectral methods based on the phase
derivatives of the short time Fourier transform (STFT).

In this paper, we use cross-spectral phase based methods
to accurately estimate speech formants, identify and
equalize the transmission channel and collapse the excitation
function. Since the data were available, the methods are also
applied to dolphin clicks and the sonar returns from those
clicks. The methods have not been applied to other biological
signals, as yet, but because the structures of many biological
signals are similar to speech, or at least are consistent
with the model used in the analysis presented here, the
methods should apply with equal success. All of the processes
presented here are blind, in the sense that the information
required to perform the tasks is extracted locally from the
signal itself.

(2.0)  The  Signal  Model 

In  modeling  the  speech  signal,  minimal  assumptions  are 
made;  however,  the  model  and  the  processes  based  on  this 
model  are  slightly  different  from  the  model  exploited  in 
standard  frame-based  speech  processing.  At  least  the  inter¬ 
pretation  of  the  signal  is  slightly  different. 

In normal frame-based processing of speech, the signal
is segmented into analysis frames of approximately 25
milliseconds duration. The power spectrum is computed from the
windowed signal frames, and the cepstrum is computed from
the power spectrum. In the cepstrum computation, the
spectrum is effectively smoothed by discarding all but the first
few cepstral coefficients (since truncation in time is
equivalent to convolution of the spectrum by a sinc function).
Implicit in this model is a signal which is the convolution of
a nearly periodic function and the impulse response of the
vocal tract. The resulting spectrum is the product of the
harmonic structure resulting from the excitation function and
the spectral response of the vocal tract. In order to recover
the frequency response of the vocal tract, the power
spectrum is effectively smoothed to remove the harmonic
structure of the excitation function. Since phase is discarded, only
the magnitude response is estimated. The accuracy,
precision, and resolution of the method are all limited by the
frequency of the excitation function, the bandwidth of the
formants, the length of the analysis window of the Fourier
transform, and the smoothing function.

We  choose  an  alternate  model  of  voiced  speech  in 
which  the  vocal  tract  is  excited  by  a  single  pulse.  In  this 
model,  the  system  is  excited  as  energy  from  the  pulse  enters 
the  system.  Following  the  excitation,  the  system  is  in  nearly 
steady  state  resonance  damped  by  the  loss  of  energy  in  the 
vocal  tract.  In  this  model,  speech  is  essentially  a  sonar  signal 
in  which  the  formants  or  resonant  frequencies  of  the  vocal 
tract  represent  the  configuration  of  the  vocal  tract,  and  the 
ear’s  function  is  similar,  on  a  small  scale,  to  that  of  a  bat 
sounding  its  environment  to  fly  around  objects  in  the  dark, 
or  to  a  dolphin  pulsing  an  object  to  identify  it  by  its  echoes. 

In order to process the signal under this model, an
analysis window which is shorter than the excitation period must
be used. Since the excitation frequency $F_0$ is normally
between 100 and 300 Hz, the analysis window must be on
the order of a few milliseconds. Without super resolution,
the resolution of the power spectrum is no better than a few
hundred hertz, and the time resolution of the excitation
function is no better than approximately half the length of
the analysis window.

For  dolphin  sonar,  the  model  is  nearly  identical.  The 
pulse  is  a  short  duration  single  click,  which  propagates 
through  the  water  to  the  target.  When  the  pulse  hits  the  tar¬ 
get,  the  target  is  excited,  followed  by  a  damped  resonance. 

(3.0)  Cross  spectral  methods 

What we describe now is a method based on the phase gradient
which provides a super resolution capability in both time
and frequency. Since differentiation in our phase gradient
calculation is based on products of the short time Fourier
transform (STFT) surface, we call the methods cross spectral
methods. In the STFT, the Fourier transforms of the product of
the signal f(t) and a sequence of time translations of a
(short) analysis window w(t) are computed. The STFT may
therefore be represented as

$$F_w(\omega, T) = \int f(t + T)\, w(-t)\, e^{-i\omega t}\, dt. \tag{1}$$

In equation (1), we have followed a convention which is not
quite standard. The order of the surface variables was chosen
to represent frequency as column vectors.

We  define  the  channelized  instantaneous  frequency 
(CIF)  and  local  group  delay  (LGD)  as 

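$$\mathrm{CIF}(\omega, T) = \partial_T \arg\{F_w(\omega, T)\} \tag{2}$$

$$\mathrm{LGD}(\omega, T) = -\partial_\omega \arg\{F_w(\omega, T)\} \tag{3}$$

Both the LGD and CIF can be computed as cross spectra

$$\mathrm{CIF}(\omega, T) \approx \frac{1}{\epsilon} \arg\{F_w(\omega, T + \tfrac{\epsilon}{2})\, F_w^*(\omega, T - \tfrac{\epsilon}{2})\} \tag{4}$$

$$\mathrm{LGD}(\omega, T) \approx -\frac{1}{\epsilon} \arg\{F_w(\omega + \tfrac{\epsilon}{2}, T)\, F_w^*(\omega - \tfrac{\epsilon}{2}, T)\} \tag{5}$$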
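
A discrete-time sketch of these cross-spectral estimates is given below. It is an illustration under stated assumptions rather than the author's implementation: the phase differences are taken over one sample in time and one FFT bin in frequency, the frame is indexed from its first sample, and the function names (`stft_frame`, `cif_hz`, `lgd_samples`) are hypothetical.

```python
import numpy as np

def stft_frame(x, start, window, nfft):
    """One STFT column: FFT of the windowed segment beginning at sample `start`."""
    return np.fft.fft(x[start:start + len(window)] * window, nfft)

def cif_hz(x, start, window, nfft, fs):
    """Channelized instantaneous frequency per bin, from a one-sample time lag."""
    f0 = stft_frame(x, start, window, nfft)
    f1 = stft_frame(x, start + 1, window, nfft)
    return fs / (2.0 * np.pi) * np.angle(f1 * np.conj(f0))

def lgd_samples(x, start, window, nfft):
    """Local group delay per bin (in samples), from a one-bin frequency lag."""
    f = stft_frame(x, start, window, nfft)
    cross = np.roll(f, -1) * np.conj(f)      # F[k+1] * conj(F[k])
    return -nfft / (2.0 * np.pi) * np.angle(cross)

fs = 8000.0
n = np.arange(256)
tone = np.exp(2j * np.pi * 1000.3 * n / fs)   # complex tone at 1000.3 Hz
click = np.zeros(256)
click[60] = 1.0                               # impulse at sample 60
win = np.hanning(64)

tone_cif = cif_hz(tone, 50, win, 64, fs)      # bins near 1 kHz report 1000.3 Hz
click_lgd = lgd_samples(click, 50, win, 64)   # every bin reports a delay of 10 samples
```

The tone's frequency is recovered far more finely than the 125 Hz bin spacing, and the impulse is located relative to the frame start; delays larger than half the FFT length would wrap.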

It is easily verified [12] that, for fixed $\omega_0$, the STFT
$F_w(\omega_0, T)$ is the original signal filtered by a filter whose
impulse response is

$$w_{\omega_0}(t) = w(t)\, e^{i\omega_0 t}, \tag{6}$$

i.e.

$$F_w(\omega_0, T) = f(T) * w_{\omega_0}(T), \tag{7}$$

where $*$ represents convolution. If the filter frequency
response

$$W_{\omega_0}(\omega) = \int w(t)\, e^{i(\omega_0 - \omega)t}\, dt \tag{8}$$

is essentially contained in the positive spectrum, i.e.

$$|W_{\omega_0}(\omega)| \ll \max_{\xi > 0} |W_{\omega_0}(\xi)| \quad \text{for } \omega < 0, \tag{9}$$

then $F_w(\omega_0, T)$ is the filtered ANALYTIC representation of
the signal, even if the input signal is real [14]. The STFT
effectively distributes the analytic signal in time and
frequency.
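
The filter interpretation of the STFT can be checked numerically. The sketch below evaluates the STFT at one fixed frequency directly from its definition, F_w(omega0, T) = sum over t of f[T+t] w[-t] exp(-i*omega0*t), and compares it with the convolution of f with w(t) exp(i*omega0*t). The discretization, the window support t = -h, ..., h, and the function names are assumptions made for illustration.

```python
import numpy as np

def stft_at_frequency(f, w, omega0, h):
    """Direct sum: F(omega0, T) = sum_t f[T+t] * w[-t] * exp(-1j*omega0*t), t = -h..h."""
    t = np.arange(-h, h + 1)
    kernel = w[::-1] * np.exp(-1j * omega0 * t)    # w[-t] on the grid t = -h..h
    return np.array([np.sum(f[T - h:T + h + 1] * kernel)
                     for T in range(h, len(f) - h)])

def filtered_signal(f, w, omega0, h):
    """Convolution of f with the modulated window w(t) * exp(1j*omega0*t)."""
    t = np.arange(-h, h + 1)
    impulse_response = w * np.exp(1j * omega0 * t)
    same = np.convolve(f, impulse_response, mode="same")
    return same[h:len(f) - h]                      # drop edge samples

rng = np.random.default_rng(0)
signal = rng.standard_normal(200)
window = rng.random(17)                            # deliberately asymmetric, h = 8
direct = stft_at_frequency(signal, window, 0.9, 8)
via_filter = filtered_signal(signal, window, 0.9, 8)
```

The two arrays agree to rounding error, so each row of the STFT surface is the output of one band-pass filter centered at its analysis frequency.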


(4.0)  Expected  Results 

We  will  now  argue  that  our  speech  model  should  result  in  a 
process  which  is  the  sum  of  “narrowband  processes”.  One 
process  is  a  pulse,  which  is  broadband  in  frequency  and  rela¬ 
tively  localized  in  time.  Following  the  pulse  is  a  resonance 
structure,  in  which  there  are  several  resonances  or  formants, 
which  are  normally  separated  in  frequency.  During  the  exci¬ 
tation,  the  system  is  driven  by  the  excitation  function,  and 
each  “filter”  of  the  Fourier  transform  should  resonate  at  its 
natural  frequency.  During  the  steady  state  resonance,  each 
filter  should  respond  to  the  frequency  of  the  resonance 
which  is  dominant  near  that  filter  frequency.  The  effects  of 
the  channel  group  delay  should  be  reflected  by  a  relative 
delay in the filters represented by the STFT. The problem in
measuring any of these delays and responses is that they
are beyond the resolution of the normal spectrogram. They
may, however, be estimated by the cross spectral methods.

We assume that at each glottal pulse, the filters respond
at their natural frequencies. This means that, at any point on
the STFT surface where the signal contribution is dominated
by the glottal pulse excitation, the surface may be
represented locally as

$$F_{exc}(\omega, T) = A(\omega, T)\, e^{i(\omega(T - T_0) - G(\omega) - G_E(\omega))} \tag{10}$$

where $A(\omega, T)$ is slowly varying in time and frequency, $T_0$
is the excitation time,

$$g(\omega) = \partial_\omega G(\omega) \tag{11}$$

is the channel group delay, and

$$g_E(\omega) = \partial_\omega G_E(\omega) \tag{12}$$

is the group delay of the vocal tract at excitation. In this
case, the LGD and CIF are

$$\mathrm{LGD}_{exc}(\omega, T) = T_0 - T + g(\omega) + g_E(\omega) \tag{13}$$

$$\mathrm{CIF}_{exc}(\omega, T) = \omega \tag{14}$$

where $T_0$ is the time of occurrence of the pulse. Note that
this means that we should expect the LGD to collapse the
pulse to a single curve

$$T(\omega) = T_0 - g(\omega) - g_E(\omega), \tag{15}$$

where the right hand side is not dependent on time.

Now consider a region on the STFT surface near a formant
in steady state resonance. In this case, there is no
contribution from the glottal pulse, and all filters near the
formant will be pulled to the formant resonant frequency. In
this case, the STFT surface may be locally modeled as

$$F_{res}(\omega, T) = A(\omega, T)\, e^{i(\omega_0(T - T_0) - G(\omega) - G_R(\omega))} \tag{16}$$

where the group delay of the vocal tract at resonance $g_R$ is
the derivative of $G_R(\omega)$. The LGD and CIF are

$$\mathrm{LGD}_{res}(\omega, T) = g(\omega) + g_R(\omega) \tag{17}$$

$$\mathrm{CIF}_{res}(\omega, T) = \omega_0. \tag{18}$$


(4.1)  Mixed  Partials 

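The mixed partial derivatives give us a convenient test
for excitation and steady state resonance. In the excitation
case, we may compute the expected mixed phase partial
derivatives as

$$\partial_\omega \partial_T \arg\{F_{exc}(\omega, T)\} = \partial_\omega\{\mathrm{CIF}_{exc}(\omega, T)\} = 1. \tag{19}$$

In the steady state resonance case, we may compute the
expected mixed phase partial derivatives as

$$\partial_\omega \partial_T \arg\{F_{res}(\omega, T)\} = \partial_\omega\{\mathrm{CIF}_{res}(\omega, T)\} = 0. \tag{20}$$

With these two relationships, we may build indicator
functions to test whether any point on the STFT surface is
the response to an excitation pulse or a resonance of the
vocal tract. The functions

$$I_E(\omega, T) = |1 - \partial_\omega \partial_T \arg\{F_w(\omega, T)\}| \tag{21}$$

$$I_R(\omega, T) = |\partial_\omega \partial_T \arg\{F_w(\omega, T)\}| \tag{22}$$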

have expected values zero at excitation and resonance,
respectively. If the condition is not met at the point $(\omega_0, T_0)$,
then the STFT response at that point is not driven by
excitation (resonance), and is therefore driven by another process,
such as resonance (excitation), or noise or interference. We
can therefore discard the points on the STFT surface which
are not indicated as the signal condition we are seeking, and
thereby improve the processing gain. An important
observation is that the indicator functions serve to effectively
partition the STFT surface into three surfaces. On one surface,
the excitation is dominant, and resonance is effectively
removed. On the second surface, resonance is dominant, and
excitation is effectively removed. And, on the third surface,
artifacts other than resonance and excitation are dominant.
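
The behavior of the two indicators can be illustrated on idealized signals. The sketch below estimates the mixed phase partial derivative from a double cross-spectral product (a one-sample lag in time, then a one-bin lag in frequency) and evaluates it for an isolated impulse, standing in for excitation, and for a steady complex tone, standing in for resonance. The discretization and all function names are assumptions for illustration, not the author's code.

```python
import numpy as np

def stft_frame(x, start, window, nfft):
    """One STFT column: FFT of the windowed segment beginning at sample `start`."""
    return np.fft.fft(x[start:start + len(window)] * window, nfft)

def mixed_partial(x, start, window, nfft):
    """Estimate d/d(omega) d/dT of the STFT phase (dimensionless) for each bin."""
    f0 = stft_frame(x, start, window, nfft)
    f1 = stft_frame(x, start + 1, window, nfft)
    cif_cross = f1 * np.conj(f0)                          # phase = CIF in rad/sample
    cross = np.roll(cif_cross, -1) * np.conj(cif_cross)   # phase = CIF[k+1] - CIF[k]
    return np.angle(cross) / (2.0 * np.pi / nfft)

def indicators(x, start, window, nfft):
    """Return (excitation indicator, resonance indicator); each is ~0 when matched."""
    mp = mixed_partial(x, start, window, nfft)
    return np.abs(1.0 - mp), np.abs(mp)

win = np.hanning(64)
click = np.zeros(256)
click[60] = 1.0                                    # idealized excitation pulse
tone = np.exp(2j * np.pi * 1000.0 * np.arange(256) / 8000.0)   # idealized resonance

exc_ind_click, res_ind_click = indicators(click, 50, win, 64)
exc_ind_tone, res_ind_tone = indicators(tone, 50, win, 64)
```

For the impulse the excitation indicator is near zero in every bin, while for the tone the resonance indicator is near zero in the bins that carry its energy; thresholding the two indicators gives the partition of the surface described above.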

To test the indicator functions, the two indicator functions
were computed and compared to the remapped STFT
surface. An example of the indicator surfaces is shown
in Figure 5. As can be seen, the surfaces do indeed indicate
the excitation and resonance correctly, and the indicators
tend to be mutually exclusive. That is, the excitation
indicator tends to reject resonance, and the resonance indicator
tends to reject the excitation.

As  an  additional  check,  the  phase  of  the  mixed  partial 
surface  was  plotted  in  a  neighborhood  of  both  the  excitation 
and  resonances.  The  displays  clearly  show  that  the  mixed 
partials  behave  as  predicted  by  the  model. 


(4.2)  Equalization 

Finally, we address the problem of blind recovery of the
channel group delay. The ability to blindly equalize a channel
containing a biological signal was first discovered while
analyzing dolphin clicks. Note that if the channel group
delay were zero, then the STFT surface in a neighborhood of
excitation and steady state resonance respectively would be

$$F_{exc}(\omega, T) = A(\omega, T)\, e^{i\omega(T - T_0)} \tag{23}$$

$$F_{res}(\omega, T) = A(\omega, T)\, e^{i\omega_0(T - T_0)}. \tag{24}$$

The respective local group delays would be

$$\mathrm{LGD}_{exc}(\omega, T) = T_0 - T \tag{25}$$

$$\mathrm{LGD}_{res}(\omega, T) = 0. \tag{26}$$

We start by estimating the group delay of the vocal tract.
To do this, we have ground truth in the form of TIMIT data
which was collected under studio recording conditions, in
which we may assume that the channel effects are
insignificant. Several portions of voiced speech from the TIMIT
database were processed by remapping the surface to correct
the LGD and CIF. In each case, the formants collapsed to
constant frequency "lines", and the excitation pulses
collapsed to broadband impulses in time, or constant time
"lines". This established that the group delay of the vocal
tract is zero.

What we therefore must do is calculate an estimated group delay function, which effectively forces the collapsed excitation pulses on the equalized TF surface to impulses in time. Recall that the clean TIMIT data satisfied this condition. For small group delays, where the channel group delay is less than the analysis window, we can effectively equalize by identifying, with the aid of the indicator function, a $T_0$ near the center of the excitation pulse. If we consider the surface

    $\hat{F}(\omega,T) = e^{-i\arg F_w(\omega,T_0)}\,F_w(\omega,T)$,  (27)



we see that conditions (25,26) are satisfied. That is, correcting the spectral phase by the observed spectral phase of a glottal pulse effectively removes the group delay, at least for group delays which are relatively well behaved. The process was used to “straighten out” NTIMIT (Figures 8 and 9) and dolphin backscatter data. The process has not been tested on severe channels, since no data were available.
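
The phase correction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the STFT surface is held as a list of rows (one row per frequency bin, one complex value per time frame), and the names `equalize_surface`, `F` and `t0` are hypothetical, not taken from the paper.

```python
import cmath

def equalize_surface(F, t0):
    """Multiply every frequency row by exp(-i * arg F(omega, T0)).

    F  : list of rows, F[w][t] is the complex STFT value (assumed layout)
    t0 : index of a time frame near the centre of an excitation pulse
    """
    out = []
    for row in F:  # one row per frequency bin
        correction = cmath.exp(-1j * cmath.phase(row[t0]))
        out.append([correction * value for value in row])
    return out
```

After the correction the reference frame `t0` has zero phase at every frequency, so a pulse located there has zero group delay by construction; the magnitudes of the surface are unchanged.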

For larger group delays, it is necessary to piece together several spectra to reconstruct the group delay. If the analysis window is less than the span of the channel group delay, the estimated group delay will have an ambiguity. A delay greater than the length of the analysis window will result in the group delay aliasing or being folded modulo the length of the analysis window. In addition, the LGD function is only valid where there is significant spectral energy near the formants. In the nulls between the formants, the energy is low, resulting in erroneous estimation. Both of these ambiguities can be resolved by combining estimates from several spectra.

(5.0)  Methods  and  Conclusions 

Following  the  discussion  in  the  paper,  samples  of  data  from 
the  TIMIT  and  NTIMIT  database  were  selected.  The  TIMIT 
data  was  recorded  with  a  close  talking  microphone,  and  the 
NTIMIT  data  is  the  TIMIT  data  subjected  to  the  NYNEX 
telephone  channel.  In  addition,  random  samples  of  dolphin 
back  scatter  data  were  selected  from  data  provided  by  the 
Department  of  the  Navy.  The  STFT,  LGD,  CIF,  and  mixed 
partial  surfaces  were  computed,  with  a  prolate  spheroidal 
window  of  the  same  approximate  length  as  the  expected 
excitation  pulse.  The  surfaces  were  remapped  as 

    $F_{\mathrm{remap}}\bigl(\mathrm{CIF}(\omega,T),\; T + \mathrm{LGD}(\omega,T)\bigr) = F_w(\omega,T)$  (28)
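
A minimal sketch of the remapping in (28) on a discrete grid follows. It assumes that CIF and LGD have already been converted to bin units (a frequency-bin index and a time-frame offset), and it sums magnitudes that land in the same cell; both choices, and the name `remap_surface`, are assumptions made for illustration only.

```python
def remap_surface(F_mag, cif_bins, lgd_bins):
    """Move each surface value from (w, t) to (CIF(w, t), t + LGD(w, t)).

    F_mag    : list of rows of magnitudes, F_mag[w][t]
    cif_bins : channelized instantaneous frequency, in frequency-bin units
    lgd_bins : local group delay, in time-frame units
    """
    n_freq, n_time = len(F_mag), len(F_mag[0])
    out = [[0.0] * n_time for _ in range(n_freq)]
    for w in range(n_freq):
        for t in range(n_time):
            w_new = int(round(cif_bins[w][t]))
            t_new = t + int(round(lgd_bins[w][t]))
            # Values remapped outside the grid are dropped.
            if 0 <= w_new < n_freq and 0 <= t_new < n_time:
                out[w_new][t_new] += F_mag[w][t]
    return out
```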

The  excitation  pulses  of  the  remapped  NTIMIT  surfaces 
collapsed  to  curves.  The  dolphin  excitation  pulses  were 
slightly  curved,  and  the  TIMIT  pulses  were  nearly  straight. 
Equalization  of  the  pulses  resulted  in  nearly  straight  lines 
after  remapping.  In  each  case,  the  formants  collapsed  to 
nearly  constant  frequency  lines.  Typical  examples  of  the 
processed  data  are  represented  in  the  figures. 

The  second  partial  derivatives  were  computed  for 
selected  samples  of  TIMIT  data,  and  the  data  depicted  in  the 
figures  verify  the  mixed  partial  relationships  described 
above. 


Figure 1a: Conventional spectrogram of dolphin click and sonar return.

ABSTRACT

We address the problem of detection and estimation of sinusoids embedded in white Gaussian noise. We follow a Bayesian approach and adopt robust default priors, namely Expected Posterior priors. In order to compute the associated Bayes factor required for model selection we resort to Monte Carlo Markov chain algorithms, and illustrate performance on an example.

1  Introduction 

Model selection is a fundamental data analysis task. It has many applications in various fields of science and engineering, including the canonical problem of detection and estimation of sinusoids embedded in noise. Over the past two decades, many of the classical model selection problems have been addressed using information criteria such as AIC [3], BIC [18] or Rissanen’s MDL [15]. The widespread use of these criteria is mainly due to their intrinsic simplicity. However, they rely on asymptotic expansions, valid when the number of data is large, and quantifying the effect of these approximations when small data sets are analyzed seems to be difficult. Bayesian statistics provides a simple and sound framework for the task of model selection, see [5] for a recent review. Unfortunately, within this framework, model selection appears more difficult from a practical point of view, mainly for two reasons. Firstly, Bayesian inference requires the choice of a prior distribution for the unknown parameters, which might at first sight appear to be a difficult exercise, especially in situations where no prior information is available. Many efforts have been devoted to the development of a methodology that provides a framework for the automatic determination of such uninformative priors, see [12] for a review. However, most of these priors are typically improper¹, which does not cause any problem when parameter estimation is concerned, but can lead to indeterminate answers when model selection is investigated, as illustrated in Section 3. Secondly, it is worth noting that the quantities required to perform Bayesian model selection do not usually admit any closed-form expression and that analytical approximations, such as BIC [18], or numerical evaluations are then required.

In this paper we propose to address the problem of robust and consistent detection of the number of sinusoids embedded in noise in a Bayesian framework using uninformative improper priors. A review of the literature on the subject can be found in [1]. Our approach relies on expected posterior (EP) priors that have been recently introduced in [13] and numerical techniques, Monte Carlo Markov chains (MCMC), that have revolutionized applied statistics over the past ten years. The problem considered is of great interest in many fields, as suggested by the vast literature dedicated to the problem (see for example [7], [8], [14] and references therein), but it should be pointed out that the methodology can be adapted to other scenarios.

¹ A distribution is improper when its sum is not finite. As a consequence such a distribution cannot be normalized.

The paper is organized as follows. In Section 2 the signal model is given. In Section 3, we specify robust and uninformative prior distributions for our problem. In Section 4 we develop MCMC algorithms to compute the quantities required to perform Bayesian model selection. The performance of our procedure is illustrated by computer simulations in Section 5.

2  Model  of  the  data 

Let $y = (y_1, y_2, \ldots, y_T)^{\mathsf T}$ be an observed vector of T real data samples. The elements of y may be represented by different models $\mathcal{M}_k$ corresponding either to samples of noise only or to the superimposition of k sinusoids corrupted by noise:

    $\mathcal{M}_0:\; y_t = n_{t,0}$

    $\mathcal{M}_k:\; y_t = \sum_{j=1}^{k}\bigl(a_{c_j,k}\cos[\omega_{j,k}t] + a_{s_j,k}\sin[\omega_{j,k}t]\bigr) + n_{t,k}$,

where $\omega_{j_1,k} \neq \omega_{j_2,k}$ for $j_1 \neq j_2$ and $a_{c_j,k}$, $a_{s_j,k}$, $\omega_{j,k}$ are respectively the amplitudes and the radial frequency of the $j$th sinusoid for the model with k sinusoids. The noise sequence $n_k = (n_{1,k}, \ldots, n_{T,k})^{\mathsf T}$ is assumed zero-mean white Gaussian with covariance matrix $\sigma_k^2 I_T$. In a vector-matrix form, we have


    $y = D(\omega_k)\,a_k + n_k$,

where $[a_k]_{2i-1} = a_{c_i,k}$, $[a_k]_{2i} = a_{s_i,k}$ and $[\omega_k]_i = \omega_{i,k}$ for $i = 1, \ldots, k$. The matrix $D(\omega_k)$ is defined as $[D(\omega_k)]_{t,2j-1} = \cos[\omega_{j,k}t]$ and $[D(\omega_k)]_{t,2j} = \sin[\omega_{j,k}t]$ for $t = 1, \ldots, T$, $j = 1, \ldots, k$. This allows us to write the likelihood of the observations

    $p(y \mid a_k, \sigma_k^2, \omega_k) = (2\pi\sigma_k^2)^{-T/2}\exp\Bigl(-\frac{\|y - D(\omega_k)\,a_k\|^2}{2\sigma_k^2}\Bigr)$.



We assume here that the number k of sinusoids and their parameters $\theta_k = (a_k, \sigma_k^2, \omega_k)^{\mathsf T}$ are unknown. Given the data set y, our objective is to estimate k and $\theta_k$. We will assume that the maximum possible number of sinusoids is $k_{\max} = \lfloor (T-1)/2 \rfloor$, see [1] for motivations.
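
To make the signal model of this section concrete, the following sketch builds the design matrix $D(\omega_k)$ and evaluates the Gaussian log-likelihood. It uses only the Python standard library; the function names and the list-based layout are assumptions chosen for illustration, not part of the paper.

```python
import math

def design_matrix(omegas, T):
    """Rows t = 1..T; columns alternate cos(w_j t), sin(w_j t) for each j."""
    return [[f(w * t) for w in omegas for f in (math.cos, math.sin)]
            for t in range(1, T + 1)]

def log_likelihood(y, amplitudes, sigma2, omegas):
    """log p(y | a_k, sigma_k^2, omega_k) for white Gaussian noise.

    amplitudes is ordered (a_c1, a_s1, a_c2, a_s2, ...), matching the columns
    of the design matrix. With omegas == [] this is the noise-only model M_0.
    """
    T = len(y)
    D = design_matrix(omegas, T)
    residual = [y[t] - sum(d * a for d, a in zip(D[t], amplitudes))
                for t in range(T)]
    squared_error = sum(r * r for r in residual)
    return -0.5 * T * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)
```

For noise-free data generated from the model itself, the residual vanishes and the log-likelihood reduces to its normalizing term, which gives a quick sanity check of the column ordering.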


3  Model  selection  and  EP  priors 

We follow a Bayesian approach where the unknowns k and $\theta_k$ are regarded as being a priori distributed according to appropriate prior distributions. These priors reflect our degree of belief in the relevant values of the parameters. In this section we first recall the key role played by Bayes factors for model selection and point out the problem associated with the use of improper priors. Then EP priors are introduced to solve our problem. This section ends with the detection objectives for the problem investigated, formulated in a Bayesian framework.

3.1  Bayesian  model  selection  and  EP  priors 


Assume as in our case that $k_{\max}$ models $\mathcal{M}_k$ are under consideration for a data set y. Model $\mathcal{M}_k$ corresponds to assuming a probability density $p(y \mid \theta_k)$ for the observations, the likelihood, which depends on a parameter $\theta_k$. The Bayesian approach requires the specification of prior densities $p_k(\theta_k) = p(\theta_k \mid k)$ and possibly the specification of model prior probabilities $p(k)$. Then the key quantities on which Bayes model selection relies are the Bayes factors

    $B_{ij} = \frac{p(i \mid y)/p(i)}{p(j \mid y)/p(j)}$,

which by introduction of the predictive densities

    $m_i(y) = \int_{\Theta_i} p(y \mid \theta_i)\,p_i(\theta_i)\,d\theta_i$,

can be reformulated as

    $B_{ij} = m_i(y)/m_j(y)$,

which shows that specification of the model prior probability is not necessary. The Bayes factor is often interpreted as the “odds provided by the data for $\mathcal{M}_i$ versus $\mathcal{M}_j$.” Motivations for the choice of Bayes factors for model selection can be found in [5], [11]. The cornerstone of this approach seems at first sight to be the choice of a prior, especially in situations when no prior knowledge is available. The need for automatic or default approaches for the choice of uninformative prior has been recognized for a long time [10]. In estimation problems, the use of vague or uninformative prior distributions, including sometimes improper prior distributions, is typically a satisfactory solution [12]. When performing model selection, however, one has to be much more careful as default priors are typically improper, and, thus, depend on arbitrary multiplicative constants, i.e. $p_i^N(\theta_i) = c_i f_i^N(\theta_i)$ for some function $f_i^N$ (we use the superscript N to indicate the use of an uninformative or default prior for the model parameters). Hence, the resultant Bayes factor

    $B_{ij}^N = \frac{c_i \int_{\Theta_i} p(y \mid \theta_i)\,f_i^N(\theta_i)\,d\theta_i}{c_j \int_{\Theta_j} p(y \mid \theta_j)\,f_j^N(\theta_j)\,d\theta_j}$,

is indeterminate, and cannot be used for model selection. Note that the use of “vague proper priors” usually gives wrong answers in Bayesian model selection, as has long been recognized since [10]. A number of proposals to overcome this problem have been made. Approaches using conventional priors have been studied in [10], [20]. In the case of nested models, proper hierarchical robust prior models have been successfully developed for various applications [1], [16]. Other approaches include the Intrinsic Bayes Factor (IBF) [4], the Fractional Bayes Factor (FBF) and the method suggested in [17], among others. Most of these latter methods deal with the problem by rescaling the Bayes factor by a correction factor in such a way that any undesirable constant cancels. EP priors have been recently proposed in [13] and belong to this family of approaches. This approach relies on the device of the “imaginary training sample”, a well known approach [9]. More precisely, imagine that some extra data y* are available, and that these data are used to update a possibly improper prior $p_i^N(\theta_i) = c_i f_i^N(\theta_i)$ using Bayes’ rule,


    $p^N(\theta_i \mid y^*) = \frac{p(y^* \mid \theta_i)\,c_i f_i^N(\theta_i)}{\int_{\Theta_i} p(y^* \mid \theta_i)\,c_i f_i^N(\theta_i)\,d\theta_i}$.

This posterior distribution, when it is defined, does not depend on $c_i$ anymore and can be used as a new prior distribution. However, as y* is not actually observed these posteriors are not available. The key idea of EP priors is to consider a suitable predictive measure m* on the imaginary training sample space $\mathcal{Y}^*$ and integrate out these artificial data, leading to

    $p_i^*(\theta_i) = \int_{\mathcal{Y}^*} p^N(\theta_i \mid y^*)\,m^*(dy^*)$.

The measure m* can intuitively be viewed as arising from beliefs of how a real training set would behave. Several choices for m* are possible, but we note that y* and m* should be such that $p^N(\theta_i \mid y^*)$ exists. In many cases, for example, it is required that the size of y* is more than a given minimal size so that the posterior $p^N(\theta_i \mid y^*)$ exists. A training set with such size is said to be of minimal size. We now detail two possibilities retained in this paper.

An attractive choice for y* and m* consists of selecting a base model $\mathcal{M}$, and defining $m^*(y^*) = \int_{\Theta} p(y^* \mid \theta)\,p^N(\theta)\,d\theta$. Intuitively the base model should be at least as simple as the other models, so that little constraint on y* is imposed. In the case of nested models, the EP priors resulting from this choice correspond to the intrinsic priors for the Arithmetic IBF [13]. Alternatively an empirical version of m* can also be considered, where training samples are obtained by resampling from the observations y. Once proper y* and m* have been selected, the Bayes factor of $\mathcal{M}_i$ against $\mathcal{M}_j$ resulting from the EP priors can be expressed as

    $B_{ij}(y) = \frac{m_{p_i^*}(y)}{m_{p_j^*}(y)}$, with $m_{p_i^*}(y) = \int_{\Theta_i} p(y \mid \theta_i)\,p_i^*(\theta_i)\,d\theta_i$.


Therefore, the resulting Bayes factor does not depend on arbitrary multiplicative constants, leading to consistent Bayesian model selection. We list here some of the other interesting properties of EP priors:

• The resulting Bayesian inference allows for multiple comparisons. For instance $B_{ij}$ will be equal to $B_{ik}$ times $B_{kj}$, a property not shared by all default model selection methods.

• In many cases, it is possible to find m* such that, for a sample of minimal size, there is predictive matching for the comparisons of model $\mathcal{M}_i$ against $\mathcal{M}_j$, i.e., the Bayes factor $B_{ij}$ is equal to 1.

• In certain situations, the approach is essentially equivalent to previous successful approaches. For instance in the case of nested models, when $\mathcal{M}_i$ is nested in every other model, choosing m* to be the marginal of y* under $\mathcal{M}_i$ is asymptotically equivalent to the arithmetic IBF [4].

Other nice properties of EP priors can be found in [13]. We now derive an EP prior for the problem of detection of sinusoids in noise.
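
The role of the arbitrary constant can be seen on a toy problem that is much simpler than the sinusoid model and is used here only as an illustration (it does not appear in the paper). Assume unit-variance Gaussian data, a base model with zero mean, and an alternative with unknown mean μ and flat improper prior p(μ) = c. With one imaginary observation y* drawn under the base model, the posterior of μ is N(y*, 1) and averaging over y* ~ N(0, 1) gives the EP prior N(0, 2).

```python
import math

def bf_flat_prior(y, c):
    """Bayes factor (unknown mean vs zero mean) under the improper prior p(mu) = c.

    The arbitrary constant c multiplies the result, so the value is indeterminate.
    """
    n = len(y)
    ybar = sum(y) / n
    return c * math.sqrt(2 * math.pi / n) * math.exp(n * ybar ** 2 / 2)

def bf_ep_prior(y):
    """Same comparison under the EP prior N(0, 2) of this toy example.

    No arbitrary constant is left: the prior is a proper density.
    """
    n = len(y)
    ybar = sum(y) / n
    v = 2.0  # EP prior variance: posterior variance 1 plus base-model variance 1
    return math.exp(v * (n * ybar) ** 2 / (2 * (1 + n * v))) / math.sqrt(1 + n * v)
```

Calling `bf_flat_prior(y, 1.0)` and `bf_flat_prior(y, 10.0)` on the same data gives answers that differ by a factor of ten, whereas `bf_ep_prior(y)` has no free constant.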
3.2  EP prior for robust spectral analysis

For $\theta_k$, we assume the following uninformative prior distribution for all the dimensions

    $p^N(a_k, \sigma_k^2, \omega_k \mid k) \propto \frac{1}{\sigma_k^2}\,\mathbb{I}_{\Omega}(k, \omega_k)$,

which is clearly improper. The set $\Omega$ is defined as $\bigcup_{k=0}^{k_{\max}} \{k\} \times \Omega_k$ where $\Omega_k = \{\omega_k \in (0,\pi)^k ;\ i \neq j \text{ implies } [\omega_k]_i \neq [\omega_k]_j\}$ for k > 0 and $\Omega_0 = \emptyset$. We do not want to favor any subset of the artificial data set y* and thus introduce the set $\mathcal{Y}_k^*$ of all subvectors of length $m_k$ ($m_k$ will be determined later on) made from y*. In order to define all the quantities related to each element of $\mathcal{Y}_k^*$ we introduce an arbitrary labelling on all the combinations of length $m_k$ in the set $\{1, \ldots, m_{k_{\max}}\}$. The vector $y_{k,l}^*$ is then the vector of length $m_k$ made from y* for which the retained indices correspond to combination number l. From now on l is assumed to be random, distributed according to a uniform distribution. We now first compute the expression of $p(a_k, \sigma_k^2, \omega_k \mid k, y_{k,l}^*)$ obtained from Bayes’ rule. After some algebra one obtains for k > 0


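    $p(\sigma_k^2 \mid k, \omega_k, y_{k,l}^*) = \frac{\bigl(y_{k,l}^{*\mathsf T} P_{k,l}^* y_{k,l}^*/2\bigr)^{(m_k-2k)/2}}{\Gamma((m_k-2k)/2)\,(\sigma_k^2)^{(m_k-2k)/2+1}} \exp\Bigl(-\frac{y_{k,l}^{*\mathsf T} P_{k,l}^* y_{k,l}^*}{2\sigma_k^2}\Bigr)$

    $p(a_k \mid k, \sigma_k^2, \omega_k, y_{k,l}^*) = |2\pi\sigma_k^2 M_{k,l}^*|^{-1/2} \exp\Bigl(-\frac{(a_k - m_{k,l}^*)^{\mathsf T} M_{k,l}^{*-1} (a_k - m_{k,l}^*)}{2\sigma_k^2}\Bigr)$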

p(wfc|yfc,t)  =  i/^- 

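when $y_{k,l}^* \notin \mathrm{span}\{D(\omega_k)\}$ and $m_k - 2k > 0$, and where

    $M_{k,l}^{*-1} = D_{k,l}^{\mathsf T}(\omega_k)\,D_{k,l}(\omega_k)$,  $m_{k,l}^* = M_{k,l}^*\,D_{k,l}^{\mathsf T}(\omega_k)\,y_{k,l}^*$,
    $P_{k,l}^* = I_{m_k} - D_{k,l}(\omega_k)\,M_{k,l}^*\,D_{k,l}^{\mathsf T}(\omega_k)$,

with $D_{k,l}(\omega_k)$ the $m_k \times 2k$ matrix extracted from $[D(\omega_k)]_{1:m_{k_{\max}},\,1:2k}$ corresponding to the lth combination of indices. When k = 0 then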


p(ffo|0,yo.i)  -  ' 

Two  possibilities  for  the  definition  of  m*  can  be  proposed: 

•  Choose  Mo  as  base  model,  and  build  the  measure 

.  /  »\  r  /  *l  2'i  da l  r(mfcma*/2) 

m  (y  )  =  J«+p(y  • 

One observes here that m*(y*) is spread out, reflecting the fact that few constraints are imposed by the base model. We can now give the definition of the EP prior

P*  (a*,«Tfc,wfc|  k)  = 

and we observe that $m_k = 2k$ is the minimum size required on $y_{k,l}^*$ for $p(a_k, \sigma_k^2, \omega_k \mid k, y_{k,l}^*)$ to exist. Note that this prior is not proper, but does not introduce any arbitrary constant in the Bayes factors.

•  Build  m*  (y‘)  from  the  observations,  i.e. 

m*  fr*) =  wtz  !<y«>  (y*)  ■ 

We  will  here  only  explore  the  first  possibility. 


^ySJ.y5.Amo/ac,1,r-y5>S..l 
2  I  -p  ^ 


3.3  Model  order  prior 

The model order prior distribution does not affect the Bayes factors, and thus the model selection rule, but can have an influence on the mixing properties of the algorithm developed later. We chose here a truncated Poisson distribution, i.e. $p(k) \propto \frac{\Lambda^k}{k!}\,\mathbb{I}_{\{0,\ldots,k_{\max}\}}(k)$, but any other choice such as a uniform distribution would have been possible.
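
As a small illustration of the truncated Poisson model order prior, the sketch below normalizes the Poisson weights over {0, ..., k_max}. The function name and argument names are hypothetical.

```python
import math

def truncated_poisson(k_max, lam):
    """Return [p(0), ..., p(k_max)] with p(k) proportional to lam**k / k!."""
    weights = [lam ** k / math.factorial(k) for k in range(k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

With the settings reported in the simulations (k_max = 8 and mean parameter 4), `truncated_poisson(8, 4.0)` assigns equal prior mass to k = 3 and k = 4, the two modes of a Poisson distribution with integer mean.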

3.4  Integration  of  the  nuisance  parameters 

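The proposed Bayesian model allows for the integration of the so-called nuisance parameters, $a_k$ and $\sigma_k^2$, and subsequently to obtain an expression for $p(k, \omega_k, l, y^* \mid y)$ up to a normalizing constant. According to Bayes’ theorem

    $p(k, a_k, \sigma_k^2, \omega_k, l, y^* \mid y) \propto p(y \mid k, a_k, \sigma_k^2, \omega_k)\,p(a_k, \sigma_k^2, \omega_k \mid k, y_{k,l}^*)\,m^*(y^*)\,p(k)\,p(l)$,

with

    $M_{k,l}^{-1} = D^{\mathsf T}(\omega_k)\,D(\omega_k) + M_{k,l}^{*-1}$,
    $m_{k,l} = M_{k,l}\bigl[D^{\mathsf T}(\omega_k)\,y + M_{k,l}^{*-1}\,m_{k,l}^*\bigr]$.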
Consequently, 


p(Mfc,0-f,urfc,I,y*|y)  « 

exp  3^(yTy-mI,iMfc,imM+yMyiS,i)] 

.Id*  *  \(mk-2k)/2 

yVp*,.g*dj  p(*)p(Om(y*)^n(fc,wO- 

The integration of $a_k$ (similar to a normal distribution) and then of $\sigma_k^2$ (similar to an inverse gamma distribution) yields for k ≥ 1


p(*JwfclI,y*|y)oc|M;,,r 

/v*Tp*  v»  \{mh-2k)/2 

x  f  j 


r((rT7fc  — 2fe)/2) 


p(k)p(l)m(y“)  ^In(k,Uk) 


( yTy~ml,iMk  imfc.i+yJ!iyi,i  \ 

H  2  ) 


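and a similar expression for k = 0 [2]. The overall parameter space $\Theta$ can be written as a countable union of subspaces $\Theta = \bigcup_{k=0}^{k_{\max}} \{k\} \times \Theta_k$ where $\Theta_0 = \mathbb{R}^+ \times \mathcal{Y}^*$ for k = 0, $\Theta_k = (\mathbb{R}^2)^k \times \mathbb{R}^+ \times \Omega_k \times C_k \times \mathcal{Y}^*$ for $k \in \{1, \ldots, k_{\max}\}$ and where $C_k = \{1, \ldots, C_{m_{k_{\max}}}^{m_k}\}$.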


3.5  Estimation  Objectives  and  Bayesian  computation 


The objective is to compute the Bayes factors, and more precisely the quantities $m_{p_i^*}(y)$. Then model selection can be performed from these quantities, see Section 5 for example.


4  Bayesian  computation 

In order to evaluate the Bayes factors, we are interested in computing the quantities $p(k \mid y)$ for which no closed-form expression exists. One has to resort to numerical methods. MCMC techniques are very powerful methods that allow for these quantities to be computed in an efficient manner. Roughly speaking, MCMC consists of running an ergodic Markov chain whose invariant distribution is the distribution of interest, here the posterior distribution $p(k, a_k, \sigma_k^2, \omega_k, l, y^* \mid y)$. Under weak conditions the sample path of the generated Markov chain can be used to compute quantities related to the posterior distribution. In our case, for example, we are interested in the marginal posterior distribution $p(k = j \mid y)$, which can be evaluated from the sample path of the Markov chain using the formula $\hat{p}(k = j \mid y) = \frac{1}{N - i_0 + 1}\sum_{i=i_0}^{N} \mathbb{I}_{\{j\}}(k^{(i)})$, after convergence towards the invariant distribution. The algorithm we develop here is an adaptation of the algorithm presented in [1], which takes into account the specific features introduced by the use of EP priors. For a complete introduction to MCMC one should however refer for example to [16] and references therein.
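
The sample-path estimate of the marginal posterior of k amounts to counting how often each model order is visited after the burn-in period. A minimal sketch, assuming the visited model orders are stored in a list `k_samples` (a hypothetical name):

```python
from collections import Counter

def posterior_model_probs(k_samples, burn_in, k_max):
    """Estimate p(k = j | y) for j = 0..k_max from the chain's visited orders.

    k_samples : model order k at each iteration of the chain
    burn_in   : number of initial iterations to discard
    """
    kept = k_samples[burn_in:]
    counts = Counter(kept)
    return [counts.get(j, 0) / len(kept) for j in range(k_max + 1)]
```

Dividing each estimate by the prior p(k) then gives the renormalized quantities from which the Bayes factors are formed.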

4.1  The  main  algorithm 

In order to build the Markov chain for our problem we introduce the following updates of the parameters: (a) birth of a new sinusoid, (b) death of an existing sinusoid, (c) update of the frequencies for all the sinusoids one-at-a-time, when k ≠ 0, (d) update of the training samples y*. The birth and death moves perform dimension changes respectively from k to k + 1 and k to k - 1. These moves are defined by heuristic considerations, the only condition to be fulfilled being to maintain the correct invariant distribution. A particular choice will only have influence on the convergence rate of the algorithm. Other moves may be proposed, but we have found that the ones suggested here lead to satisfactory results. The resulting transition kernel of the simulated Markov chain is then a mixture of the different transition kernels associated with the moves described above. This means that at each iteration one of the candidate moves: birth, death or update is randomly chosen. The probabilities for choosing these moves are $b_k$, $d_k$ and $u_k$ respectively, such that $b_k + d_k + u_k = 1$ for all $0 \le k \le k_{\max}$, and the update moves are performed at each iteration. The move is performed if the algorithm accepts it. For k = 0 the death move is impossible, so that $d_0 = 0$. For $k = k_{\max}$ the birth move is impossible and thus $b_{k_{\max}} = 0$. Except in the cases described above, we take the following probabilities:

    $b_k = c \min\Bigl\{1, \frac{p(k+1)}{p(k)}\Bigr\}$,  $d_{k+1} = c \min\Bigl\{1, \frac{p(k)}{p(k+1)}\Bigr\}$,

where $p(k)$ is the prior probability of model $\mathcal{M}_k$ and c is a parameter which tunes the proportion of dimension/update moves. The algorithm can be summarized as follows:


Reversible jump algorithm

1.  Initialization: set $(k^{(0)}, \theta^{(0)}) \in \Theta$.

2.  Iteration i:

    Choose one of the following moves:

    •  With probability $b_{k^{(i)}}$ a “birth” move (see Subsection 4.5).

    •  With probability $d_{k^{(i)}}$ a “death” move (see Subsection 4.5).

    •  With probability $u_{k^{(i)}}$ update the frequencies $\omega_k$ (see Subsection 4.2).

    Update l, y* (see Subsection 4.4).

We describe more precisely these different moves below. In what follows, in order to simplify notation, we drop the superscript (i) from all variables at iteration i.
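
The birth, death and update probabilities of Subsection 4.1 can be tabulated directly from the model order prior. The sketch below is an illustration only; the function name is hypothetical, and the default c = 0.5 is an assumption chosen so that the birth and death probabilities never sum to more than one.

```python
def move_probabilities(prior, c=0.5):
    """Return lists (b, d, u) of birth, death and update probabilities.

    prior : [p(0), ..., p(k_max)], strictly positive model order prior
    c     : tuning constant, assumed <= 0.5 so that b_k + d_k <= 1
    """
    k_max = len(prior) - 1
    b = [c * min(1.0, prior[k + 1] / prior[k]) if k < k_max else 0.0
         for k in range(k_max + 1)]
    d = [c * min(1.0, prior[k - 1] / prior[k]) if k > 0 else 0.0
         for k in range(k_max + 1)]
    u = [1.0 - b[k] - d[k] for k in range(k_max + 1)]
    return b, d, u
```

With this choice the products $b_k\,p(k)$ and $d_{k+1}\,p(k+1)$ are equal for every k, which is the usual motivation for taking the minimum of one and the prior ratio.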


4.2  Updating  the  frequencies 

We  use  the  same  technique  as  described  in  [1]  with  the  target  dis¬ 
tribution  here  proportional  to  (2)  in  order  to  take  into  account  the 
EP  prior. 

4.3  Updating  the  nuisance  parameters 

In  this  subsection  we  show  how  it  is  possible  to  sample  the  nui¬ 
sance  parameters.  We  point  out  that  if  one  is  not  interested  in 
estimating  these  nuisance  parameters  then  this  simulation  step  is 
not  required.  We  obtain  by  straightforward  calculations: 

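    $\sigma_k^2 \mid (y, k, \omega_k, y_{k,l}^*) \sim \mathcal{IG}\Bigl(\frac{T + m_k - 2k}{2},\ \frac{y^{\mathsf T}y - m_{k,l}^{\mathsf T} M_{k,l}^{-1} m_{k,l} + y_{k,l}^{*\mathsf T} y_{k,l}^*}{2}\Bigr)$

    $a_k \mid (y, k, \sigma_k^2, \omega_k, y_{k,l}^*) \sim \mathcal{N}\bigl(m_{k,l},\ \sigma_k^2 M_{k,l}\bigr)$.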

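4.4  Update the y*

Update l, y*

•  Draw

    $\sigma_k^2 \mid (y, k, \omega_k) \sim \mathcal{IG}\Bigl(\frac{T - 2k}{2},\ \frac{y^{\mathsf T}y - y^{\mathsf T}D(\omega_k)[D^{\mathsf T}(\omega_k)D(\omega_k)]^{-1}D^{\mathsf T}(\omega_k)\,y}{2}\Bigr)$

    $a_k \mid (y, k, \sigma_k^2, \omega_k) \sim \mathcal{N}\bigl([D^{\mathsf T}(\omega_k)D(\omega_k)]^{-1}D^{\mathsf T}(\omega_k)\,y,\ \sigma_k^2[D^{\mathsf T}(\omega_k)D(\omega_k)]^{-1}\bigr)$.

•  Draw $l \sim \mathcal{U}_{C_k}$.

•  Draw $y_{k,l}^* \sim \mathcal{N}\bigl(D_{k,l}(\omega_k)\,a_k,\ \sigma_k^2 I_{m_k}\bigr)$.

•  Random walk: $(y_{k,l}^*)^c \sim \mathcal{U}\bigl((y_{k,l}^*)^c, \Delta\bigr)$.

•  Accept update with probability $\min\{1, r_{y^*}\}$, see [2],

where Δ is user defined.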

4.5  Reversible  jumps:  birth/death  moves 

Suppose that the current state of the Markov chain is in $\{k\} \times \Theta_k$, then

Birth move

•  Propose a new frequency at random on (0, π): $\omega \sim \mathcal{U}_{(0,\pi)}$.

•  Propose new values for $y_{k+1,l}^*$ with the strategy of Subsection 4.4.

•  Evaluate $\alpha_{\mathrm{birth},k}$, see [2].

•  Accept $(k + 1, \omega_{k+1})$ with probability $\alpha_{\mathrm{birth},k}$, else remain at $(k, \omega_k)$.

Assume that the current state of the Markov chain is in $\{k + 1\} \times \Theta_{k+1}$, then

Death move

•  Choose a sinusoid at random among the k + 1 existing sinusoids: $j \sim \mathcal{U}_{\{1,\ldots,k+1\}}$.

•  Propose new values for $y_{k,l}^*$ with the strategy of Subsection 4.4.

•  Evaluate $\alpha_{\mathrm{death},k}$, see [2].

•  Accept $(k, \omega_k)$ with probability $\alpha_{\mathrm{death},k}$, else remain at $(k + 1, \omega_{k+1})$.




5  Simulations 

A set of synthetic data was generated. The data set corresponds to a sinusoid series with four frequencies and equal amplitudes, contaminated with Gaussian noise. The SNR was 0 dB. The MCMC algorithm described above was run for 50000 iterations, of which the first 5000 were discarded as burn-in. The maximum number of frequencies was set at $k_{\max} = 8$, and the mean parameter for the truncated Poisson was set at Λ = 4. The renormalized quantities $p(k \mid y)/p(k)$ that allow us to compute the Bayes factors are shown in Tab. 1. In Fig. 1 we present for each iteration the estimate of $p(k \mid y)/p(k)$ for k = 0, ..., 8. A comparison with the results obtained in [1] with slightly informative priors is currently under investigation.


    k              0       1       2        3        4        ≥5
    p(k|y)/p(k)    3.91%   6.54%   13.92%   29.34%   35.97%   10.32%

Table 1: Renormalized predictives p(k|y)/p(k).
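
Since the Bayes factor of model i against model j is the ratio of the renormalized quantities, the entries of Table 1 can be compared directly. The sketch below copies the table values; the label ">=5" for the last column is a reading of the printed header, and the function name is hypothetical.

```python
# Renormalized predictives p(k|y)/p(k) from Table 1, in percent.
renormalized = {"0": 3.91, "1": 6.54, "2": 13.92,
                "3": 29.34, "4": 35.97, ">=5": 10.32}

def bayes_factor(i, j):
    """Bayes factor of model i against model j from the renormalized values."""
    return renormalized[i] / renormalized[j]

# Model order with the largest renormalized predictive.
best = max(renormalized, key=renormalized.get)
```

Here `best` is the four-sinusoid model, which matches the number of frequencies used to generate the data, and `bayes_factor("4", "3")` quantifies how strongly it is preferred over the three-sinusoid model.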


Figure 1: For the iterations i = 1, ..., 50000, the current estimates of p(k|y)/p(k).


ABSTRACT

The gliding tone problem is the response of a resonant circuit when the driving force is a gliding tone, that is a chirp. The problem was first considered by Barber and Ursell, and independently by Hok, both papers appearing in 1948. Barber and Ursell, and Hok, and subsequent investigators considered approximate solutions and attempted to qualitatively understand the behavior of the response. An exact solution has never been obtained. We have found the exact Wigner distribution of the solution. This allows one to study the nature of the solution without any approximations. We have obtained the exact solution by way of a new method to study dynamical systems.

1.  INTRODUCTION 

In 1948 Barber and Ursell [1] and independently Hok [4] considered the problem of the response of a harmonic oscillator to a “gliding tone”.¹ Specifically the issue is the behavior of the solution to a resonant circuit [1, 4, 5]

    $\frac{d^2x(t)}{dt^2} + 2\mu\frac{dx(t)}{dt} + \omega_0^2\,x(t) = f(t)$  (1)

with

    $f(t) = e^{j\beta t^2/2}$  (2)

Subsequent to Barber and Ursell, and Hok, many investigators have considered this problem in a variety of contexts and have tried to qualitatively understand the solution and also obtain approximate solutions. An exact solution to this problem has not been achieved. We, also, have not been able to obtain an exact explicit solution; but we have been able to obtain the exact solution to the Wigner distribution of x(t).

Galleani’s  permanent  address:  Dipartimento  di  Elettronica, 
Politecnico  di  Torino,  C.so  Duca  degli  Abruzzi  24,  10129  Torino, 
Italy. 

Work supported by the Office of Naval Research, the NASA JOVE, and the NSA HBCU/MI programs.

¹ The phrase “gliding tone” was used by Barber and Ursell.


We  have  been  able  to  obtain  the  exact  solution  by 
using  a  new  method  that  we  have  developed  to  study 
dynamical  systems  [3]. 

In  the  next  section  we  give  the  explicit  solution  to 
the  gliding  tone  problem  and  subsequently  we  give  a 
few  numerical  examples.  In  the  appendix  we  explain 
how  we  have  obtained  the  exact  solution. 


2.  THE  EXACT  WIGNER  DISTRIBUTION 


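We define the Wigner distribution by [2]

    $W(t,\omega) = \frac{1}{2\pi}\int x^*(t - \tfrac{1}{2}\tau)\,x(t + \tfrac{1}{2}\tau)\,e^{-j\tau\omega}\,d\tau$  (3)

and the step function, u(t), by

    $u(t) = \begin{cases} 1 & t > 0 \\ 0 & \text{otherwise} \end{cases}$  (4)
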

We now give the exact solution of the Wigner distribution of x(t) which satisfies Eq. (20). The proof is outlined in the appendix. Explicitly,


W(t,u) 


2  u(t) 
\0\  z2  -  zi 


1  /e-2*ir  _  e-2f2r 

Zi  -  Z\  V  Z2~Z\ 

1  /e-2*2T  _  e-2z2r 

Zl-  Z2\  Zi~  Z2 


with 


e-2zir  _  e-2z2r 

Z2  -  Zi 

e~2ziT  _  e-2z2T 

z2  -  Zl 


    $\tau = t - \omega/\beta$  (6)

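    $z_1 = -j\omega + \mu - \sqrt{\mu^2 - \omega_0^2}$  (7)

    $z_1^* = j\omega + \mu - \sqrt{\mu^2 - \omega_0^2}$  (8)

    $z_2 = -j\omega + \mu + \sqrt{\mu^2 - \omega_0^2}$  (9)

    $z_2^* = j\omega + \mu + \sqrt{\mu^2 - \omega_0^2}$  (10)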

0-7803-5988-7/00/$  10.00  ©  2000  IEEE 
3.  UNDERDAMPED,  OVERDAMPED,  AND  CRITICALLY  DAMPED  CASES

We now explicitly specialize to the underdamped, overdamped, and critically damped cases. As is standard we define the critical frequency, $\omega_c$, for these three cases


2ujct  cosh(2wcf)  -  sinh(2a\4) 


Critically  Damped: 

(17) 

w— *0  3  |p | 


    $\omega_c = \sqrt{\omega_0^2 - \mu^2}$    $\mu < \omega_0$    Underdamped

    $\omega_c = \sqrt{\mu^2 - \omega_0^2}$    $\mu > \omega_0$    Overdamped

    $\omega_c = 0$    $\mu = \omega_0$    Critically damped

The  explicit  Wigner  distributions  are 
Underdamped: 

WM  =  2pK“(r)e“2'"  x 


sin(2(w  -  u>c)t)  sin(2(o;  +  ujc)t) 
Uj{uJ—UJc)  uj(uj+ujc) 


(11) 


Overdamped: 


W(t,uj)  =  pu(r)e  >T  x 


sin(2wr)  cosh(2wcr)  cos(2u;t)  sinh(2tucr) 

u{w2  +  uj l)  ujc{u2  +  uj2)  J  ^ 

Critically  Damped: 

,  N  1  .  ,  sin(2wT)  -  2wtcos(2wt) 

W(t,u)  =  W>u{T)e-^  -A - i— - i 

\P\  w  (13) 


In  the  above  solutions  there  are  singularities  at  some 
values  of  uj.  We  give  the  limits  at  those  singular  values 
for  the  three  cases: 

Underdamped: 


lim  W(t,uj)  = 

IjJ-*±Uc 


1 

2\/3\u>c 


u(r)e_2Mr 


4 ujct  -  sin(4a>cr) 


2w2 


(14) 


lim iW(t,aj)  =  w  u(t)e-2^ 

w-0  |p| 


sin(2wct)  —  2  u>ct  cos(2  wct) 


(15) 


Overdamped  : 

lim  W(t,u)  =  T 4r«(t)e-2'rt  x 

|p| 
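The closed forms of this section can be checked numerically against the general solution. The following is a minimal sketch, not the authors' code: it assumes the readings of Eqs. (5)-(13) given in this section, the function names `wigner_general` and `wigner_closed_form` are hypothetical, and both routines are only meant for points away from the singular frequencies ($\omega = 0$, $\omega = \pm\omega_c$).

```python
import cmath
import math


def wigner_general(t, w, mu, w0, beta):
    """Evaluate Eq. (5) with complex arithmetic (not valid at critical damping)."""
    tau = t - w / beta                          # Eq. (6)
    if tau <= 0:                                # step function u(tau)
        return 0.0
    s = cmath.sqrt(mu * mu - w0 * w0)           # purely imaginary when underdamped
    z1, z1s = -1j * w + mu - s, 1j * w + mu - s     # Eqs. (7), (8)
    z2, z2s = -1j * w + mu + s, 1j * w + mu + s     # Eqs. (9), (10)

    def e(z):
        return cmath.exp(-2 * z * tau)

    shared = (e(z1s) - e(z2s)) / (z2s - z1s)    # term common to both brackets
    first = ((e(z1) - e(z2s)) / (z2s - z1) - shared) / (z1s - z1)
    second = ((e(z2) - e(z2s)) / (z2s - z2) - shared) / (z1s - z2)
    return (2.0 / abs(beta) * (first - second) / (z2 - z1)).real


def wigner_closed_form(t, w, mu, w0, beta):
    """Evaluate Eq. (11), (12) or (13), depending on the damping regime."""
    tau = t - w / beta
    if tau <= 0:
        return 0.0
    damp = math.exp(-2 * mu * tau) / abs(beta)
    if mu < w0:                                 # underdamped, Eq. (11)
        wc = math.sqrt(w0 * w0 - mu * mu)
        bracket = (math.sin(2 * (w - wc) * tau) / (w * (w - wc))
                   - math.sin(2 * (w + wc) * tau) / (w * (w + wc)))
        return damp * bracket / (2 * wc)
    if mu > w0:                                 # overdamped, Eq. (12)
        wc = math.sqrt(mu * mu - w0 * w0)
        denom = w * w + wc * wc
        return damp * (math.sin(2 * w * tau) * math.cosh(2 * wc * tau) / (w * denom)
                       - math.cos(2 * w * tau) * math.sinh(2 * wc * tau) / (wc * denom))
    # critically damped, Eq. (13)
    return damp * (math.sin(2 * w * tau) - 2 * w * tau * math.cos(2 * w * tau)) / w ** 3
```

For example, with $\beta = 1$ and $\omega_0 = 18$, `wigner_closed_form(5.0, 10.0, 1.0, 18.0, 1.0)` returns zero because $\tau = t - \omega/\beta$ is negative there, which is the causal behavior enforced by the step function.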


4.  EXAMPLES 

We now give some examples that indicate, in broad terms, the behavior of the solution. In a subsequent publication the nature of the solutions will be studied in detail. For all the examples we take $\beta = 1$ and $\omega_0 = 18$. We then vary the “damping” coefficient $\mu$ to study the three cases typical of the harmonic oscillator.

4.1.  Underdamped  Case 

We compute the Wigner distribution in Eq. (11), choosing $\mu = 1$. The results are plotted in Fig. 1. Several important observations can be made. First, the dashed line represents the instantaneous frequency of the forcing chirp, that is, $\omega_i(t) = \beta t$. The chirp is concentrated only along this line, because its representation in the Wigner distribution domain is $\delta(\omega - \beta t)$. The gray scale image is the actual Wigner distribution of $x(t)$. For every fixed $\omega$ one can see that the Wigner distribution always starts after the chirp. Hence the Wigner distribution reproduces the causal behavior of the harmonic oscillator; that is, the system gives no output until there is an input.

This feature can also be easily understood from (11), where the step function $u(t - \omega/\beta)$ guarantees this causal behavior. The response of the system is mainly concentrated around the critical frequency $\omega_c$, while it is weaker at all the other frequencies. This happens because the classic transfer function of the harmonic oscillator has a peak at $\omega = \omega_c$, and the energy of the chirp, which is almost constant over the entire frequency axis, is amplified at the critical frequency. The transfer function goes to zero for $\omega \to \infty$, and that is why the Wigner distribution goes to zero for $\omega \gg \omega_c$. Also, observing the limit (14) at $\omega = \omega_c$, one can see that the Wigner distribution has an exponential damping factor, where the damping coefficient is $2\mu$, which is twice the damping of the free oscillation of the system. Near $\omega = 0$ the Wigner distribution presents the characteristic cross terms. In this particular case, they are generated by the interference of the energy concentrations at $\omega = \pm\omega_c$. (In the plots only $\omega > 0$ is shown, since the Wigner distribution is symmetrical about $\omega = 0$.)

Figure 1: Underdamped case: $\mu = 1$. The Wigner distribution in (11) is plotted taking $\omega_0 = 18$ and $\beta = 1$. The gray scale image is the actual Wigner distribution, while the dashed line is the instantaneous frequency $\omega_i(t) = \beta t$ of the input chirp $f(t)$. Note the energy concentration around the resonant frequency $\omega_c = \sqrt{\omega_0^2 - \mu^2} \approx \omega_0$. The oscillating terms around $\omega = 0$ are the cross terms generated by the symmetric energy concentrations at $\omega = \pm\omega_c$ (only $\omega > 0$ is shown).

Figure 2: Critically damped case: $\mu = 18$. We plot the analytic Wigner distribution in (13), with $\omega_0$ and $\beta$ as in Fig. 1. The energy concentrations clearly visible in the underdamped case have disappeared, in accordance with the anharmonic behavior of the oscillator in the critical damping case. Also the cross terms have disappeared.

4.2. Critically and Overdamped Case

In Figs. 2 and 3 we plot the Wigner distribution for $\mu = 18$ and $\mu = 30$, respectively, for the critically damped and overdamped cases. The two images are very similar, and some of the remarks made for the underdamped case are still valid here. The response is again causal, but no energy concentration is present at the critical frequency values. This is in accordance with the anharmonic behavior of the oscillator for these damping values, which is well known [4]. Also the cross terms near $\omega = 0$ disappear, due to the lack of the energy concentrations. Again the amplitude of the Wigner distribution varies in relation to the modulus of the transfer function of the system.


5. CONCLUSION

We believe that we have effectively achieved the exact solution to the gliding tone problem. Our method presents a new perspective for studying dynamical problems. That is, instead of directly seeking the solution for the dynamical variable, we seek directly the Wigner distribution of the variable. Remarkably, sometimes that is easier than finding the solution itself. The gliding tone problem is such a case.


6. APPENDIX

Using the notation

$$D = \frac{d}{dt} \tag{18}$$

we rewrite Eq. (1) as

$$[D^2 + 2\mu D + \omega_0^2]\,x(t) = f(t) \tag{19}$$

and then we factorize the differential operator acting on $x(t)$:

$$[D - p_1][D - p_2]\,x(t) = f(t)$$

where

$$p_{1,2} = -\mu \pm \sqrt{\mu^2 - \omega_0^2}$$

The Wigner distribution equation associated with the problem (19) is [3]

$$[A^2 + 2\mu A + \omega_0^2][B^2 + 2\mu B + \omega_0^2]\,W_{x,x}(t,\omega) = W_{f,f}(t,\omega) \tag{20}$$

where

$$W_{f,f}(t,\omega) = \delta(\omega - \beta t) = \frac{1}{|\beta|}\,\delta(t - \omega/\beta)$$

The equation for the Wigner distribution can be factorized as

$$[A - p_1][A - p_2][B - p_1][B - p_2]\,W_{x,x}(t,\omega) = W_{f,f}(t,\omega) \tag{21}$$

Figure 3: Overdamped case with $\mu = 30$. We plot the Wigner distribution in (12), with $\omega_0$ and $\beta$ as in Fig. 1. Similar considerations to Fig. 2 hold, and in particular the energy is mainly concentrated around $\omega = 0$.

The solution is obtained by first transforming the equation into the equivalent system

$$[A - p_1]\,W_1 = W_{f,f}$$
$$[A - p_2]\,W_2 = W_1$$
$$[B - p_1]\,W_3 = W_2$$
$$[B - p_2]\,W_{x,x} = W_3$$

Substituting for $A$ and $B$ and collecting the terms we have

$$\frac{dW_1}{dt} + 2z_1 W_1 = 2W_{f,f}$$
$$\frac{dW_2}{dt} + 2z_2 W_2 = 2W_1$$
$$\frac{dW_3}{dt} + 2z_1^* W_3 = 2W_2$$
$$\frac{dW_{x,x}}{dt} + 2z_2^* W_{x,x} = 2W_3$$

where the coefficients $z_1, z_2, z_1^*, z_2^*$ are defined by Eqs. (7)-(10). The equations are considered as ordinary differential equations, and solved setting the constant of integration to zero. In a forthcoming publication we will prove that this approach is equivalent to the Wigner distribution of the impulse response method [6].

As an example we give the derivation of $W_1$ from the first equation:

$$W_1 = \frac{2}{|\beta|}\,e^{-2z_1 t}\int_{-\infty}^{t} e^{2z_1 t'}\,\delta(t' - \omega/\beta)\,dt' \tag{22}$$

$$\phantom{W_1} = \frac{2}{|\beta|}\,u(\tau)\,e^{-2z_1\tau} \tag{23}$$

where $\tau$ is defined as in (6).

The solution to the other equations is obtained with the same technique. Here $W_2$ and $W_3$ are shown, while $W_{x,x}$ is reported in (5):

$$W_2 = \frac{2}{|\beta|}\,u(\tau)\,\frac{e^{-2z_1\tau} - e^{-2z_2\tau}}{z_2 - z_1} \tag{24}$$

$$W_3 = \frac{2}{|\beta|}\,\frac{u(\tau)}{z_2 - z_1}\Big[\frac{1}{z_1^* - z_1}\left(e^{-2z_1\tau} - e^{-2z_1^*\tau}\right) \tag{25}$$

$$\phantom{W_3 =} - \frac{1}{z_1^* - z_2}\left(e^{-2z_2\tau} - e^{-2z_1^*\tau}\right)\Big] \tag{26}$$

When solving those equations, the evaluation of the following integral is always encountered:

$$\int_{-\infty}^{t} u(t' - \omega/\beta)\,e^{2(z_2 - z_1)t'}\,dt' \tag{27}$$

$$= u(t - \omega/\beta)\int_{\omega/\beta}^{t} e^{2(z_2 - z_1)t'}\,dt' \tag{28}$$

$$= u(t - \omega/\beta)\,\frac{e^{2(z_2 - z_1)t} - e^{2(z_2 - z_1)\omega/\beta}}{2(z_2 - z_1)} \tag{29}$$
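As a worked illustration of how the integral above enters, the second equation of the system can be solved step by step. This is a sketch under stated assumptions: it takes $W_1 = \frac{2}{|\beta|}u(\tau)e^{-2z_1\tau}$ as the solution of the first equation, uses the integrating-factor solution with the constant of integration set to zero as described in the appendix, and writes $\tau = t - \omega/\beta$ as in (6).

```latex
\begin{align*}
W_2 &= 2\,e^{-2z_2 t}\int_{-\infty}^{t} e^{2z_2 t'}\,W_1(t')\,dt' \\
    &= \frac{4}{|\beta|}\,e^{-2z_2 t}\,e^{2z_1\omega/\beta}
       \int_{-\infty}^{t} u(t' - \omega/\beta)\,e^{2(z_2 - z_1)t'}\,dt' \\
    &= \frac{4}{|\beta|}\,u(\tau)\,e^{-2z_2 t}\,e^{2z_1\omega/\beta}\,
       \frac{e^{2(z_2 - z_1)t} - e^{2(z_2 - z_1)\omega/\beta}}{2(z_2 - z_1)} \\
    &= \frac{2}{|\beta|}\,u(\tau)\,\frac{e^{-2z_1\tau} - e^{-2z_2\tau}}{z_2 - z_1}
\end{align*}
```

The last line follows because $e^{-2z_2 t}e^{2z_1\omega/\beta}e^{2(z_2 - z_1)t} = e^{-2z_1\tau}$ and $e^{-2z_2 t}e^{2z_1\omega/\beta}e^{2(z_2 - z_1)\omega/\beta} = e^{-2z_2\tau}$. The same pattern, applied twice more, leads to $W_3$ and then to $W_{x,x}$.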
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ABSTRACT 

In processes with many lines, such as those encountered in climate, space physics, and communications systems, it is often found that the underlying base spectrum¹ also has a complicated shape. In this paper I discuss some methods to estimate the base and detail spectra by using the probability distributions of such spectra.
The  methods  include  a  robust  estimate  of  the  central 
part  of  a  mixture  of  central  and  noncentral  chi-square 
distributions  and,  second,  a  direct  estimate  of  the  mix¬ 
ing  and  noncentrality  parameters  made  by  minimiz¬ 
ing  the  Kolmogorov-Smirnov  D  statistic  between  the 
empirical  and  theoretical  cumulative  distribution  func¬ 
tions. 

1.  INTRODUCTION 

The  problem  considered  in  this  paper  is  that  of  ana¬ 
lyzing  the  incoherent  spectrum  of  data  when  there  are 
many,  possibly  thousands,  of  lines  within  the  Nyquist 
band.  Typically  these  are  not  strict  sinusoids,  but  have 
unknown  narrow-band  modulations.  The  problems  of 
interest  are  separating  the  “base”  and  “detail”  spectra, 
estimating  the  number  of  lines,  and  the  power  in  these 
lines.  The  base  spectrum  is  unknown  and  assumed  to 
have  a  moderately  complicated,  but  smooth,  shape. 

In the problems considered here, the basic data is assumed to consist of spectrum estimates over a range of frequencies that, apart from an overall “red” shape, have a mixture of central and non-central $\chi^2_\nu$ distributions. The degrees-of-freedom, $\nu$, is known and the scale factors for the two components of the mixture are assumed to be the same. The mixing fraction, scale, and noncentrality parameters are to be estimated. Two estimation procedures are described here. The first is a simple robust estimate made by taking a scaled quantile of the estimates in a given frequency range. The second procedure uses a goodness-of-fit test and

¹I have termed the two parts of the spectrum base and detail as opposed to, for example, fit and residual. This is to emphasize that the two components are both of interest, the base varying more slowly in frequency than the detail. In addition, “residual” is often dismissed as “noise” and usually implies subtraction, not division.


chooses  the  parameters  that  minimize  the  misfit  be¬ 
tween  the  observed  and  theoretical  cumulative  distri¬ 
bution  functions.  The  Kolmogorov-Smirnov  D  statistic 
is  used  as  a  measure  of  misfit  in  these  examples. 

An example of such a problem is that of solar g-modes in the interplanetary magnetic field (IMF). A few years ago in Thomson et al. (1995), we proposed that the observed fluctuations of energetic particles in the solar wind were a result of solar g- (gravity) and p- (pressure) modes, thus contradicting the prevailing opinion that the fluctuations were from turbulence. In theory, p-modes have extremely high Q’s, $\sim 10^{11}$, Bahcall and Kumar (1993), but whether the frequencies are stable, or modulated by the solar cycle, is unknown. In either case the high Q’s imply that the modes lose very little energy either to dissipation or radiation so that detection is not simple. Because magnetic fields are fundamental in space physics, establishing the presence and characteristics of modes in the IMF is crucial for proper understanding. The data used here are one-hour averages of the normal component of the interplanetary magnetic field measured by the Ulysses spacecraft, Balogh et al. (1992). These were measured while Ulysses was near the ecliptic plane from day 298 of 1990, just after launch, until day 33 of 1992, just before the spacecraft’s Jupiter encounter. The 11,157 hourly measurements span a radial distance of 1 to 5.3 Astronomical Units. The data were multiplied by heliocentric distance to remove the radial dependency.

2.  DATA  ASSUMPTIONS 

I assume that the data contains narrow band quasi-deterministic components, typically sinusoidal components that have slow amplitude or phase modulations.² Denote such a modulation shape by $M_m(t)$, standardized to have unit energy,

$$\sum_{t=0}^{N-1} |M_m(t)|^2 = 1 \tag{1}$$

and assume that the process is of the form

$$x(t) = \zeta(t) + \sum_{m,n} \mu_{m,n}\,M_m(t)\,e^{i2\pi f_{m,n} t} \tag{2}$$

where $\zeta(t)$ is a nondeterministic process with a relatively slowly-varying spectrum. I assume that $\zeta(t)$ is independent of the line components. There are an unknown number of modulation signals and a large number of signal frequencies $f_{m,n}$.

²In addition to the unknown solar processes involved in transferring small mechanical motions in the core of the sun through the convection zone, photosphere, and corona into the interplanetary magnetic field, the spacecraft’s motion along its orbit causes additional confounding effects. Because the modal amplitudes are spherical harmonics, the observed amplitude will depend on both the spacecraft’s heliographic latitude and radius and its changing velocity will give a Doppler shift.

[Figure 1 plot: prewhitened spectrum versus frequency in microhertz.]

Figure 1: A portion of the estimated detail spectrum, $S(f)$, of the Ulysses normal magnetic field. The spectrum has been levelled, by the estimate shown in Figure 4, to have a unit base power level. There are 20 peaks above the 99% level, cumulatively 7.9% of the estimates compared to 4.9% below the 5% level.


3.  MULTITAPER  ESTIMATES 

In the usual formulation of multitaper spectrum estimation, Thomson (1982), one is given a sample of $N$ equally-spaced data $x(t)$. One chooses a bandwidth $W$ from exploratory data analysis and, in this case, from the theoretical spacings of g-modes, Guenther et al. (1992). One then computes the $K = 2NW$ Slepian sequences, $v_t^{(k)}(N, W)$, or discrete prolate spheroidal sequences, Slepian (1978), and the windowed Fourier transforms or eigencoefficients,

$$y_k(f) = \sum_{t=0}^{N-1} e^{-i2\pi f t}\,v_t^{(k)}(N, W)\,x(t). \tag{3}$$

I used a time-bandwidth product $NW = 6$ and $K = 10$ windows. Because the higher-order coefficients, those for $k \lesssim K - 1$, are more susceptible to broad band bias than those for small $k$, one forms an estimate, $\hat{y}_k(f)$, of the ideal eigencoefficients that would be obtained if the frequency band $(f - W, f + W)$ could be observed in isolation. (See Thomson §V of 1982 or Thomson §3.3 of 2000 for details.) The canonical multitaper spectrum estimate is then

$$\hat{S}(f) = \frac{1}{K}\sum_{k=0}^{K-1} |\hat{y}_k(f)|^2. \tag{4}$$

The eigencoefficients of the unknown modulation signals $M_m(t)$,

$$M_{m,k}(f) = \sum_{t=0}^{N-1} v_t^{(k)}\,M_m(t)\,e^{-i2\pi f t}, \tag{5}$$

become, in the frequency domain, a convolution of the Fourier transforms of $M_m(t)$ with the Slepian functions so the apparent bandwidth of a line will increase beyond the $2W$ of the Slepian functions by the bandwidth of the modulating signal. Thus the assumption that the $M_m$’s are narrow-band can be checked by testing that the observed linewidths are close to $\pm W$. A histogram of the peak widths was made, Figure 2. Peaks exceeding the 95% significance level of the $\chi^2_{20}$ distribution in the detail spectrum, Figure 1, were used, with the widths measured at the 90% point. This histogram has a sharp peak located close to the width expected with pure sinusoids, so one can conclude that almost all the energy of a line at frequency $f$ is contained in the frequency band $(f - W, f + W)$.

Figure 2: Histogram of peak widths from the full 0 to 140 $\mu$Hz frequency range. The dotted vertical lines are the base width, $2W \approx 0.29\,\mu$Hz, of the spectral window and, $0.23\,\mu$Hz, its approximate width where the measurements are made.

Now consider a multitaper spectrum estimated at one of these frequencies, say $f_0 = f_{m,n}$. The eigencoefficients are

$$y_k(f_0) = \zeta_k(f_0) + \mu_{m,n}\,M_{m,k} \tag{6}$$

and the simple spectrum estimate $S_x(f_0)$ will have a non-central chi-square distribution with $2K$ degrees-of-freedom. If one denotes the spectrum of the base process by $S_\zeta(f)$ and defines the detail spectrum by

$$S(f) = \frac{S_x(f)}{S_\zeta(f)}$$

then, between lines, $2K\,S(f)$ will have a central chi-square distribution with $\nu = 2K$ degrees-of-freedom. Similarly, at line frequencies, the distribution will be non-central $\chi^2_{2K}$. Using (1) and the narrow-band properties of the modulation, the non-centrality parameter is

>  _  2-ftT |pm,n|2 

(7) 

Thus, depending on whether a frequency $f$ happens to fall on a line or between lines, the estimated spectrum $S_x(f)$ will have either a central or noncentral $\chi^2_\nu$ distribution and, overall, a mixture.

There are, consequently, two distinct problems: first, estimating the baseline spectrum, $S_\zeta(f_0)$; and second, estimating the non-centrality parameter, $\lambda$.

It  should  be  emphasized  that  the  base  spectrum 
must  be  removed  before  distribution  tests  are  attempted 
because,  if  not,  the  usual  red  spectral  shape  simply 
causes  the  distribution  tests  to  be  unreliable.  Because 
noncentral  chi-square  distributions  may  be  approximated 
with  mixtures  of  central  chi-squares,  Johnson  et  al. 
(1995,  Ch.  29),  the  mixture  resulting  from  an  incor¬ 
rectly  estimated  base  spectrum  may  thus  be  mistaken 
for  a  noncentral  distribution.  Further,  if  the  base  spec¬ 
trum  is  allowed  to  vary  too  rapidly,  moderate  peaks  can 
be  suppressed  so  the  noncentral  component  will  not  be 
detected.  Consequently,  both  false  detection  and  rejec¬ 
tion  failures  are  possible. 


It can be asked why maximum-likelihood estimates are not used here, and there are several answers. First, MLE’s for noncentral chi-square distributions are both more complicated and can have poorer performance than moment estimates, Alam and Saxena (1982). Second, MLE’s of chi-square mixtures do not appear to have been studied. Third, the computation burden of the MLE’s is at least as severe as that of the direct goodness-of-fit tests. Fourth, the robust estimate works when there are several noncentral distributions in the mixture, whereas, with an MLE, the number of different distributions would have to be known or estimated.

4. A SIMPLE ROBUST ESTIMATE

Suppose we have a set of $J$ samples of a mixture distribution, $s_j$, $j = 1, \ldots, J$. The distribution of the individual $s_j$’s is, with probability $(1 - \epsilon)$, central $\chi^2_\nu$ or, with probability $\epsilon$, non-central $\chi^2_\nu(\lambda)$. All the $s_j$’s have a common unknown scale, $\sigma$, so the central component is the same in either case. $\nu$ is known so the probability density function for $\sigma = 1$ is

$$p_\epsilon(s|\nu, \lambda, \epsilon) = (1 - \epsilon)\,p_c(s|\nu) + \epsilon\,p_{nc}(s|\nu, \lambda) \tag{8}$$

where $p_c$ and $p_{nc}$ denote the standard $\chi^2_\nu$ central and non-central densities respectively, Lancaster (1969). The corresponding cumulative distributions, $P$, and quantiles, $Q$, are similarly denoted,

$$\int_0^{Q_c(p)} p_c(s|\nu)\,ds = P_c(Q_c(p)|\nu) = p.$$

The expected value of the mixture is

$$\mathbf{E}\{s_j\} = \sigma\{(1 - \epsilon)\nu + \epsilon(\nu + \lambda)\} = \sigma\{\nu + \epsilon\lambda\} \tag{9}$$

and the problem is to estimate $\sigma$, $\epsilon$, and $\lambda$.

Robust estimates of $\chi^2$ distributions do not appear to be a well-studied problem. For example, neither “robust” nor “non-central” is indexed in Bowman and Shenton (1988). Consider a simple estimate based on order statistics: Denote the sorted observations by $s_{(j)}$ with

$$s_{(1)} \le s_{(2)} \le \cdots \le s_{(J)}$$

and consider an estimate based on the $p$th quantile, $Q_c(p, \nu)$, of the central distribution. From the empirical cumulative probability distribution of the $j$th sample point, $p_j = (2j - 1)/(2J)$, define a scale factor $\theta_j = 1/Q_c(p_j, \nu)$ so the estimate is

$$\hat{\sigma}(j) = \theta_j\,s_{(j)}. \tag{10}$$

Figure 3: The mean-squared-error for a $\chi^2_{20}$ mixture with various values of $\epsilon$ and $\lambda$ vs the probability level. The minimum MSE for cases of interest typically occurs for $p$ in the 5 to 10 percent range.

For given $\epsilon$ and $\lambda$ and taking $Q_\epsilon(j)$ as the $p$th quantile of the mixture distribution, the variance, using methods from Kendall and Stuart (1963), is

$$\mathrm{Var}\{\hat{\sigma}(j)\} \approx \frac{p_j(1 - p_j)}{J\,p_\epsilon^2(Q_\epsilon(j))\,Q_c^2(j)}$$

and, similarly, the bias is $\theta_j Q_\epsilon(j) - \nu$. Evaluating the mean-square-error, Figure 3, one finds that, for typical values of $\epsilon$ and $\lambda$, the optimum quantile is rather low, usually between the 5 and 10% points. By most standards this is surprising, but, on considering that much less work has been done on robust estimates in nonsymmetric distributions than on the symmetric case, less so than at first glance. Earlier work, Thomson (1977, Part II, §V), showed similar robust estimates for standard $\chi^2$ estimates that also trimmed approximately the top third of the population, but with the primary motivation of eliminating spectra estimated from contaminated data, not for estimating lines. They were, however, see e.g. Figures 7, 8, and 11 of Thomson (1977), also effective for finding lines that simple section averages missed.
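The order-statistic estimate (10) can be tried on simulated data. The sketch below is illustrative only and rests on several assumptions that are not in the paper: $\nu$ is even (so the central chi-square CDF has a closed form), the quantile is found by bisection, the mixture is simulated with Python's `random` module, and all function names and parameter values are hypothetical.

```python
import math
import random


def chi2_cdf_even(x, nu):
    """CDF of a central chi-square with an even number nu of degrees of freedom."""
    if x <= 0:
        return 0.0
    h, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, nu // 2):
        term *= h / i
        total += term                      # sum_{i < nu/2} h^i / i!
    return 1.0 - math.exp(-h) * total


def chi2_quantile_even(p, nu):
    """Quantile Q_c(p, nu) by bisection on the closed-form CDF."""
    lo, hi = 0.0, 10.0 * nu + 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_mixture(J, nu, eps, lam, sigma, rng):
    """Draw J samples: central chi2_nu w.p. 1-eps, noncentral chi2_nu(lam) w.p. eps."""
    out = []
    for _ in range(J):
        shift = math.sqrt(lam) if rng.random() < eps else 0.0
        s = (rng.gauss(0.0, 1.0) + shift) ** 2
        s += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu - 1))
        out.append(sigma * s)
    return out


def robust_scale(samples, nu, p):
    """Estimate the scale from the order statistic nearest probability level p, Eq. (10)."""
    s = sorted(samples)
    J = len(s)
    j = max(1, min(J, round(p * J + 0.5)))     # p_j = (2j - 1) / (2J) closest to p
    p_j = (2 * j - 1) / (2 * J)
    return s[j - 1] / chi2_quantile_even(p_j, nu)


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_mixture(4000, 20, 0.3, 16.0, 2.0, rng)
    print("robust scale:", robust_scale(data, 20, 0.075))
    print("mean-based scale:", sum(data) / len(data) / 20)
```

With a noncentral component present, the mean-based scale is inflated by roughly the factor $(\nu + \epsilon\lambda)/\nu$ from (9), while the low-quantile estimate stays much closer to the true scale, which is the motivation given above for working near the 5 to 10% points.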


Figure 4: The base spectrum, $S_\zeta(f)$, estimated by taking an initial AR-15 spectrum times a smoothed running quantile estimate. The details are statistically significant.


5. ESTIMATING THE NOISE SPECTRUM

The base spectrum $S_\zeta(f)$ was estimated by, first, dividing the raw spectrum estimate, $S_x(f)$, by an autoregressive fit, $S_{ar}(f)$, to obtain an approximately white intermediate spectrum $S_i(f) = S_x(f)/S_{ar}(f)$. Next, the robust estimate (10) was slid along $S_i(f)$ to get an estimate of the central component, $\hat{S}_c(f)$. Finally, one takes $S_{ar}(f) \times \hat{S}_c(f)$ as a robust estimate of $S_\zeta(f)$. Figure 1 is a plot of the detail spectrum, $S(f) = S_x(f)/\hat{S}_\zeta(f)$. One implication of the lowest mean-square-error occurring near the 5% point of the distribution is that moderately large samples are required to apply this method³. Exploratory data analysis with various estimation spans, $J$, or, in bandwidth, $(J - 1)\Delta f$, showed that, for this data, a span $J$ corresponding to about $2.3\,\mu$Hz was a reasonable size. This was determined by requiring that, on average, the variation in $S(f)$, as measured by the ratio of the 90% to 10% points, does not drop below that expected for a central $\chi^2_\nu$. The range between the 10% and 90% points does not depend strongly on sample size, so testing for overfitting the base spectrum was relatively easy. Figure 4 shows an estimate, $\hat{S}_\zeta(f)$. The original spectrum estimate, $S_x(f)$, was divided by this base estimate to get the detail spectrum, part of which is shown in Figure 1. The ripples in the base spectrum are large enough to rule out the simple power-law spectrum predicted by turbulence theory.

³Estimates at several probability levels, using (10), were made and compared, and a smoothed average of those at the 5 and 10% levels was used.


6. FITS TO DISTRIBUTION

Fitting a mixture density is a nonlinear procedure. Because probability density functions must be positive, even partial linearization for $\epsilon$ can give unacceptable results. The procedure adopted was to use Brent’s 1973 algorithm, FMIN, to minimize the misfit between the empirical cdf, $(j - \tfrac{1}{2})/J$, and the theoretical cdf, $P_\epsilon(s_j|\nu, \epsilon, \lambda)$. For trial values of $\epsilon$ and $\lambda$ the scale, $\sigma$, was estimated by equating the theoretical mean (9) to the observed average⁴. A goodness-of-fit test, the Kolmogorov-Smirnov $D$ statistic, was used as a measure of misfit. Figure 5 shows this procedure applied to the detail spectrum of Figure 1. The fit is extremely good and, in this example, the estimated non-centrality parameter is $\lambda \approx 0.8\nu$, $\epsilon \approx 0.36$, and about 30 percent of the total power is in the non-central component.

Figure 5: Empirical and the best fitting mixture cumulative distributions for the detail spectrum shown in Figure 1. The maximum discrepancy is $D = 0.0191$, near the center of the distribution. The lower panel shows a histogram of the data, the central and noncentral probability densities, and their sum, all scaled by sample size. The estimated mean of the central distribution is at a level of 0.8, showing that the base estimate was about 20% too high.
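The fitting procedure of this section can be sketched on simulated data. This is an illustration under stated assumptions rather than the procedure actually used: a coarse grid search stands in for Brent's FMIN, $\nu$ is taken to be even so the chi-square CDFs have closed forms, the noncentral CDF is computed from its Poisson-weighted series truncated at a fixed number of terms, and all names and parameter values are hypothetical.

```python
import math
import random


def mixture_cdf(x, nu, eps, lam, terms=60):
    """CDF of (1-eps)*chi2_nu + eps*chi2_nu(lam) for even nu, unit scale."""
    if x <= 0:
        return 0.0
    h, k = x / 2.0, nu // 2
    n_max = k + terms
    tail = [0.0] * (n_max + 1)          # tail[m] = P(chi2_{2m} > x)
    term, acc = math.exp(-h), 0.0
    for m in range(1, n_max + 1):
        acc += term
        tail[m] = acc
        term *= h / m
    central = 1.0 - tail[k]
    w, nonc = math.exp(-lam / 2.0), 0.0  # Poisson(lam/2) weights
    for i in range(terms):
        nonc += w * (1.0 - tail[k + i])
        w *= (lam / 2.0) / (i + 1)
    return (1.0 - eps) * central + eps * nonc


def ks_misfit(sorted_s, nu, eps, lam):
    """Max distance between the empirical cdf (j - 1/2)/J and the mixture cdf."""
    J = len(sorted_s)
    sigma = (sum(sorted_s) / J) / (nu + eps * lam)   # scale from Eq. (9)
    return max(abs((j - 0.5) / J - mixture_cdf(s / sigma, nu, eps, lam))
               for j, s in enumerate(sorted_s, start=1))


def fit_mixture(samples, nu, eps_grid, lam_grid):
    """Return (D, eps, lam) minimizing the misfit over a parameter grid."""
    s = sorted(samples)
    return min((ks_misfit(s, nu, e, l), e, l) for e in eps_grid for l in lam_grid)


def sample_mixture(J, nu, eps, lam, sigma, rng):
    """Simulate the scaled central / noncentral chi-square mixture."""
    out = []
    for _ in range(J):
        shift = math.sqrt(lam) if rng.random() < eps else 0.0
        s = (rng.gauss(0.0, 1.0) + shift) ** 2
        s += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu - 1))
        out.append(sigma * s)
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    data = sample_mixture(400, 20, 0.35, 24.0, 1.5, rng)
    eps_grid = [0.1 * i for i in range(7)]
    lam_grid = [8.0 + 4.0 * i for i in range(7)]
    print(fit_mixture(data, 20, eps_grid, lam_grid))
```

A grid is used here only because it is easy to verify; a one-dimensional minimizer applied alternately to $\epsilon$ and $\lambda$, as the text describes, would refine the same objective.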
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ABSTRACT 

Multiple sensor arrays distributed over a planar region provide the means for highly accurate localization of the $(x, y)$ position of a source. In some applications, such as microphone arrays receiving aeroacoustic signals from ground
vehicles,  random  fluctuations  in  the  air  lead  to  frequency- 
selective  coherence  of  the  signals  that  arrive  at  widely  sep¬ 
arated  arrays.  We  present  a  performance  analysis  for  local¬ 
ization  of  a  wideband  source  using  multiple  sensor  arrays. 
The  wavefronts  are  modeled  with  perfect  spatial  coherence 
over  individual  arrays  and  with  frequency-selective  coher¬ 
ence  between  distinct  arrays.  The  sensor  signals  are  mod¬ 
eled  as  wideband  Gaussian  random  processes,  and  we  study 
the  Cramer-Rao  bound  (CRB)  on  source  localization  accu¬ 
racy  for  varying  levels  of  signal  coherence  and  for  process¬ 
ing  schemes  with  different  levels  of  complexity.  We  show 
that  significant  improvements  in  source  localization  accu¬ 
racy  are  possible  when  partial  signal  coherence  from  array 
to  array  is  exploited.  Further,  we  show  that  a  distributed 
processing  scheme  involving  bearing  estimation  at  the  in¬ 
dividual  arrays  and  time-delay  estimation  between  pairs  of 
sensors  performs  nearly  as  well  as  the  optimum  scheme  that 
jointly  processes  the  signals  from  all  sensors.  Results  based 
on  measured  aeroacoustic  data  are  included  to  illustrate 
frequency-selective  signal  coherence  at  distributed  arrays. 

1.  INTRODUCTION 

We are concerned with estimating the location $(x_s, y_s)$ of a wideband source using multiple sensor arrays that are distributed over an area. We consider schemes that distribute the processing between the individual arrays and a fusion center in order to limit the communication bandwidth between arrays and fusion center. Triangulation is a standard approach for source localization with multiple sensor arrays. Each array estimates a bearing and transmits the bearing to the fusion center, which combines the bearings to estimate the source location $(x_s, y_s)$. Triangulation is characterized by low communication bandwidth and low complexity, but it ignores coherence that may be present in the wavefronts that are received at distributed arrays. In this paper, we investigate new approaches for source localization with multiple arrays that exploit partial coherence of the wavefronts at distributed arrays. We show that the Cramer-Rao lower bound (CRB) on estimating the source location is significantly reduced when coherence from array to array is exploited. We also evaluate the performance of suboptimum source localization methods that employ distributed processing to reduce the communication bandwidth between the arrays and the fusion center. Results are presented from processing measured aeroacoustic data to illustrate signal coherence at distributed arrays.
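The triangulation baseline mentioned above can be sketched as a small least-squares problem. The paper does not specify how the fusion center combines the bearings, so the following is only one common choice, stated as an assumption: each bearing defines a line through its array's reference point, and the estimate minimizes the sum of squared perpendicular distances to those lines. The function name `triangulate` and the example coordinates are hypothetical.

```python
import math


def triangulate(arrays, bearings):
    """Least-squares intersection of bearing lines.

    arrays   -- list of (x_h, y_h) reference positions
    bearings -- list of angles phi_h in radians, measured from the x axis
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (xh, yh), phi in zip(arrays, bearings):
        nx, ny = math.sin(phi), -math.cos(phi)   # unit normal to the bearing line
        rhs = nx * xh + ny * yh                  # line equation: n . p = n . a
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * rhs
        b2 += ny * rhs
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearings are parallel; the source position is not determined")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


if __name__ == "__main__":
    arrays = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
    source = (40.0, 70.0)
    bearings = [math.atan2(source[1] - y, source[0] - x) for x, y in arrays]
    print(triangulate(arrays, bearings))
```

Only one angle per array crosses the link to the fusion center in this scheme, which is why the text describes triangulation as having low communication bandwidth; it uses no information about coherence between arrays.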

Previous work on source localization with aeroacoustic arrays has focused on angle of arrival estimation with a single array [1]. The problem of imperfect spatial coherence in the context of narrowband angle-of-arrival estimation with a single array has been studied in [2]-[5]. Paulraj and Kailath [2] presented a MUSIC algorithm that incorporates the nonideal spatial coherence, assuming that the coherence variation is known. Gershman et al. [3] provided a procedure to jointly estimate the spatial coherence loss and the angles of arrival. Song and Ritcey [4] provide maximum-likelihood (ML) methods for estimating the parameters of a coherence model and the angles of arrival, and Wilson [5] incorporates physical models for the spatial coherence. The problem of decentralized array processing has been studied in [6]-[8]. Wax and Kailath [6] present subspace algorithms for narrowband signals and distributed arrays, assuming perfect spatial coherence across each array but neglecting the spatial coherence between arrays. Weinstein [7] presents performance analysis for pairwise processing of the wideband sensor signals from a single array and shows negligible loss in localization accuracy when the SNR is high. Stoica, Nehorai, and Soderstrom [8] consider ML angle of arrival estimation with a large, perfectly coherent array that is partitioned into subarrays.

2.  DATA  MODEL 

A model is formulated in this section for the signals received by the sensors in distributed arrays. Consider a single source that is located at coordinates $(x_s, y_s)$ in the $(x, y)$ plane. Then $H$ arrays are distributed in the same plane, as illustrated in Figure 1. The signals measured at the distributed sensor arrays are modeled as jointly Gaussian wideband random processes. The model is very general, and it accounts for propagation effects between the source and the distributed arrays, including frequency-selective spatial coherence and different signal power levels received at each array. The spatial coherence of the wavefronts is modeled as being perfect over each individual array but variable between distinct arrays. This idealization allows us to study the effect of varying coherence between arrays on source localization accuracy. Physical modeling of frequency selective coherence is discussed in [9]. The power spectral density of the source is arbitrary, allowing a range of cases to be modeled including narrowband sources and sums of harmonics, as well as wideband sources with continuous power spectra.

Each array $h \in \{1, \ldots, H\}$ contains $N_h$ sensors and has a
reference sensor located at coordinates $(x_h, y_h)$. The location of
sensor $n \in \{1, \ldots, N_h\}$ is
$(x_h + \Delta x_{hn},\ y_h + \Delta y_{hn})$, where
$(\Delta x_{hn}, \Delta y_{hn})$ is the location relative to the
reference sensor. If $c$ is the speed of propagation, then the
propagation time from the source to the reference sensor on array $h$
is

$$\tau_h = \frac{r_h}{c} = \frac{1}{c}\left[(x_s - x_h)^2 + (y_s - y_h)^2\right]^{1/2}. \quad (1)$$

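We will assume that the wavefronts are well approximated by plane
waves over the aperture of individual arrays. The propagation time
from the source to sensor $n$ on array $h$ will be expressed by
$\tau_h + \tau_{hn}$, where

$$\tau_{hn} \approx -\frac{1}{c}\left[\frac{x_s - x_h}{r_h}\,\Delta x_{hn} + \frac{y_s - y_h}{r_h}\,\Delta y_{hn}\right]
= -\frac{1}{c}\left[(\cos\phi_h)\,\Delta x_{hn} + (\sin\phi_h)\,\Delta y_{hn}\right], \quad (2)$$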


where $\tau_{hn}$ is the propagation time from the reference sensor on
array $h$ to sensor $n$ on array $h$, and $\phi_h$ is the bearing of
the source with respect to array $h$. Note that while the far-field
approximation (2) is reasonable over individual array apertures, the
wavefront curvature that is inherent in (1) must be retained in order
to accurately model the (possibly) wide separation between arrays.

The time signal received at sensor $n$ on array $h$ due to the source
will be represented as $s_h(t - \tau_h - \tau_{hn})$, where the vector
of signals $s(t) = [s_1(t), \ldots, s_H(t)]^T$ received at the $H$
arrays is modeled as real-valued, continuous-time, zero-mean,
wide-sense stationary, Gaussian random processes with
$-\infty < t < \infty$. These processes are fully specified by the
$H \times H$ cross-correlation function matrix

$$R_s(\tau) = E\{s(t + \tau)\, s(t)^T\}, \quad (3)$$

where $E$ denotes expectation, superscript $T$ denotes transpose, and
we will later use the notation superscript $*$ and superscript $H$ to
denote complex conjugate and conjugate transpose, respectively. The
$(g, h)$ element in (3) is the cross-correlation function

$$r_{s,gh}(\tau) = E\{s_g(t + \tau)\, s_h(t)\} \quad (4)$$

between the signals received at arrays $g$ and $h$. The correlation
functions (3) and (4) are equivalently characterized by their Fourier
transforms, which are the cross-spectral density functions and matrix

$$G_{s,gh}(\omega) = \mathcal{F}\{r_{s,gh}(\tau)\} = \int_{-\infty}^{\infty} r_{s,gh}(\tau)\, \exp(-j\omega\tau)\, d\tau,
\qquad G_s(\omega) = \mathcal{F}\{R_s(\tau)\}. \quad (5)$$

The diagonal elements $G_{s,hh}(\omega)$ of (5) are the power spectral
density (PSD) functions of the signals $s_h(t)$, and hence they
describe the distribution of average signal power with frequency. The
model allows the average signal power to vary from one array to
another. Indeed, the PSD may even vary from one array to another to
reflect propagation differences, source aspect angle differences, and
other effects that lead to coherence degradation in the signals at
distributed arrays.

Let us elaborate the definition and the meaning of coherence between
the signals $s_g(t)$ and $s_h(t)$ received at distinct arrays $g$ and
$h$. In general, the cross-spectral density function (5) can be
expressed in the form

$$G_{s,gh}(\omega) = \gamma_{s,gh}(\omega)\left[G_{s,gg}(\omega)\, G_{s,hh}(\omega)\right]^{1/2}, \quad (6)$$

where $\gamma_{s,gh}(\omega)$ is the spectral coherence function,
which has the property $0 \le |\gamma_{s,gh}(\omega)| \le 1$. The
coherence function $\gamma_{s,gh}(\omega)$ is generally
complex-valued, but we will model it as real-valued. This is a
reasonable assumption for acoustic propagation environments in which
the loss of coherence is due to random changes in the apparent source
location, as long as the change in apparent source location is the
same at both arrays $g$ and $h$ [5, 9].

We model the signal received at sensor $n$ on array $h$ as a sum of
the delayed source signal and noise,

$$z_{hn}(t) = s_h(t - \tau_h - \tau_{hn}) + w_{hn}(t), \quad (7)$$


where the noise signals $w_{hn}(t)$ are modeled as real-valued,
continuous-time, zero-mean, wide-sense stationary, Gaussian random
processes that are uncorrelated at distinct sensors. The noise
correlation properties are

$$E\{w_{gm}(t + \tau)\, w_{hn}(t)\} = r_w(\tau)\, \delta_{gh}\, \delta_{mn}, \quad (8)$$

where $r_w(\tau)$ is the noise autocorrelation function, and the noise
power spectral density is $G_w(\omega) = \mathcal{F}\{r_w(\tau)\}$. We
then collect the observations at each array $h$ into $N_h \times 1$
vectors $z_h(t) = [z_{h1}(t), \ldots, z_{h,N_h}(t)]^T$ for
$h = 1, \ldots, H$, and we further collect the observations from the
$H$ arrays into a $(N_1 + \cdots + N_H) \times 1$ vector

$$Z(t) = \left[z_1(t)^T \ \cdots \ z_H(t)^T\right]^T. \quad (9)$$

The elements of $Z(t)$ in (9) are zero-mean, wide-sense stationary,
Gaussian random processes. We can express the cross-spectral density
matrix of $Z(t)$ in a convenient form with the following definitions.
The array manifold for array $h$ at frequency $\omega$ is

$$a_h(\omega) =
\begin{bmatrix} \exp(-j\omega\tau_{h1}) \\ \vdots \\ \exp(-j\omega\tau_{h,N_h}) \end{bmatrix}
=
\begin{bmatrix}
\exp\left[j\frac{\omega}{c}\left((\cos\phi_h)\Delta x_{h1} + (\sin\phi_h)\Delta y_{h1}\right)\right] \\
\vdots \\
\exp\left[j\frac{\omega}{c}\left((\cos\phi_h)\Delta x_{h,N_h} + (\sin\phi_h)\Delta y_{h,N_h}\right)\right]
\end{bmatrix}, \quad (10)$$


using $\tau_{hn}$ from (2) and assuming that the sensors have
omnidirectional response to sources in the plane of interest. Let us
define the relative time delay of the signal at arrays $g$ and $h$ as
$D_{gh} = \tau_g - \tau_h$, where $\tau_h$ is defined in (1). Then the
cross-spectral density matrix of $Z(t)$ in (9) has the form shown in
(11) in Figure 2. Recall that the source cross-spectral density
functions $G_{s,gh}(\omega)$ in (11) can be expressed in terms of the
spectral coherence $\gamma_{s,gh}(\omega)$ using (6).

Note that (11) depends on the source location parameters $(x_s, y_s)$
through $a_h(\omega)$ and $D_{gh}$. However, (11) points out that the
observations are also characterized by the bearings
$\phi_1, \ldots, \phi_H$ to the source from the individual arrays and
the relative time delays $D_{gh}$ between pairs of arrays. Therefore,
one way to estimate the source location $(x_s, y_s)$ is to first
estimate the bearings $\phi_1, \ldots, \phi_H$ and the pairwise time
delays $D_{gh}$.

Figure 1: Geometry of source location and H distributed sensor arrays. A communication link is available between each
array and the fusion center.
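The geometric quantities of this section can be sketched in a few lines of Python. This is only an illustration of (1), (2), and the pairwise delay $D_{gh} = \tau_g - \tau_h$; the function names and the propagation speed of 343 m/s are assumptions made for the example and do not come from the paper.

```python
import math

C_SOUND = 343.0  # assumed speed of propagation in m/s (not specified in the text)


def reference_delay_and_bearing(source, array_ref, c=C_SOUND):
    """Return (tau_h, phi_h): the propagation time of (1) and the source bearing."""
    dx = source[0] - array_ref[0]
    dy = source[1] - array_ref[1]
    r_h = math.hypot(dx, dy)  # distance from the source to the reference sensor
    return r_h / c, math.atan2(dy, dx)


def sensor_delays(phi_h, offsets, c=C_SOUND):
    """Plane-wave delays tau_hn of (2) for sensor offsets (dx_hn, dy_hn)."""
    return [-(math.cos(phi_h) * dx + math.sin(phi_h) * dy) / c
            for dx, dy in offsets]


def pairwise_delay(source, ref_g, ref_h, c=C_SOUND):
    """Relative time delay D_gh = tau_g - tau_h between two arrays."""
    tau_g, _ = reference_delay_and_bearing(source, ref_g, c)
    tau_h, _ = reference_delay_and_bearing(source, ref_h, c)
    return tau_g - tau_h
```

A sensor displaced toward the source receives the wavefront earlier than the reference sensor, which is why the delays returned by `sensor_delays` are negative for such offsets.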

3.  CRBS  ON  LOCALIZATION  ACCURACY 

The problem of interest is to estimate the source location parameter
vector $\Theta = [x_s, y_s]^T$ using $T$ samples of the sensor signals
$Z(0), Z(T_s), \ldots, Z((T - 1) \cdot T_s)$, where $T_s$ is the
sampling period. Let us denote the sampling rate by $f_s = 1/T_s$ and
$\omega_s = 2\pi f_s$. We will assume that the continuous-time random
processes $Z(t)$ are band-limited, and that the sampling rate $f_s$ is
greater than twice the bandwidth of the processes. Then Friedlander
[10] has shown that the Fisher information matrix (FIM) $J$ for the
parameters $\Theta$ based on the samples
$Z(0), Z(T_s), \ldots, Z((T - 1) \cdot T_s)$ has elements $J_{ij}$
shown in (12) in Figure 2. The CRB matrix $C = J^{-1}$ then has the
property that the covariance matrix of any unbiased estimator
$\hat{\Theta}$ satisfies $\mathrm{Cov}(\hat{\Theta}) - C \ge 0$, where
$\ge 0$ means that $\mathrm{Cov}(\hat{\Theta}) - C$ is positive
semidefinite. The CRB provides a lower bound on the performance of any
unbiased estimator. Equation (12) provides a convenient way to compute
the FIM for the distributed sensor array model. It provides a powerful
tool for evaluating the impact that various parameters have on source
localization accuracy. Parameters of interest include the spectral
coherence between distributed arrays, the signal bandwidth and power
spectrum, the array placement geometry, and the SNR. The FIM in (12)
is not easily evaluated analytically, but it is readily evaluated
numerically for cases of interest. The FIM expression (12) can be
specialized for two important cases. With $H = 2$ arrays containing
$N_1 = N_2 = 1$ sensor each, we obtain a generalization of the classic
time delay estimation problem [11] with partial signal coherence at
the sensors. For an arbitrary number of arrays $H$ and sensor counts
$N_1, \ldots, N_H$, we can specialize (12) for sources with a
narrowband power spectrum.

The CRB based on (12) provides a performance bound on source location
estimation methods that jointly process all the data from all the
sensors. Such processing provides the best attainable results, but it
also requires significant communication bandwidth to transmit data
from the individual arrays to the fusion center. We have developed
performance bounds for schemes that perform bearing estimation at the
individual arrays in order to reduce the required communication
bandwidth to the fusion center. These CRBs facilitate a study of the
tradeoff between source location accuracy and communication bandwidth
between the arrays and the fusion center. Two methods are considered
[12]:

1. Ordinary triangulation, where each array estimates the source
bearing and transmits the bearing estimate to the fusion center. This
approach does not exploit wavefront coherence between the distributed
arrays, but it minimizes the communication bandwidth between the
arrays and the fusion center.

2. Each array estimates the source bearing and transmits the bearing
estimate to the fusion center. In addition, the raw data from one
sensor in each array is transmitted to the fusion center. The fusion
center then estimates the propagation time delay between pairs of
distributed arrays, and triangulates these time delay estimates with
the bearing estimates to localize the source.

Method 2 performs nearly as well as optimum joint processing if the
SNR is high enough.
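To make Method 1 concrete, the following Python sketch triangulates a source from per-array bearing estimates. It uses an ordinary least-squares intersection of the bearing lines, which is an assumption chosen for illustration; the text does not specify which triangulation estimator is analyzed, and the function name is hypothetical.

```python
import math


def triangulate(array_positions, bearings):
    """Least-squares intersection of bearing lines.

    Each array at p_h with bearing phi_h constrains the source s to the
    line n_h . (s - p_h) = 0, where n_h is the unit normal to the bearing
    direction. The 2x2 normal equations are solved in closed form.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), phi in zip(array_positions, bearings):
        nx, ny = -math.sin(phi), math.cos(phi)  # unit normal to the bearing line
        d = nx * px + ny * py
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * d
        b2 += ny * d
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("bearing lines are (nearly) parallel")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

With error-free bearings the lines meet at a single point and the function returns the true source location; with noisy bearings it returns the point closest to all lines in the squared-distance sense.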

4.  EXAMPLES 

We present an example that illustrates the potential improvement in
source localization accuracy when coherence between the distributed
arrays is exploited. Consider a scenario with $H = 3$ arrays, where
the individual arrays are identical and contain
$N_1 = N_2 = N_3 = 7$ sensors. Each array is circular and has a 4-ft
radius, with six sensors equally spaced around the perimeter and one
sensor in the center. Narrowband processing in a 1-Hz band centered at
50 Hz is assumed, with an SNR of 10 dB at each sensor, i.e.,
$G_{s,hh}(\omega)/G_w(\omega) = 10$ for $h = 1, \ldots, H$ and
$2\pi(49.5) < \omega < 2\pi(50.5)$ rad/sec. The signal coherence
$\gamma_{s,gh}(\omega) = \gamma_s(\omega)$ is varied between 0 and 1.
We assume that $T = 4000$ time samples are obtained at each sensor
with sampling rate $f_s = 2000$ samples/sec. The source localization
performance is evaluated by computing the radius of the ellipse in
$(x, y)$ coordinates that satisfies the expression

$$[\, x \ \ y \,]\; J \;[\, x \ \ y \,]^T = 1,$$

where $J$ is the FIM. If the errors in $(x, y)$ localization are
jointly Gaussian distributed, then the ellipse represents the contour
at one standard deviation in root-mean-square (RMS) error. The error
ellipse for any unbiased estimator of source location cannot be
smaller than this ellipse derived from the FIM.
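As a small worked illustration, the semi-axes of the ellipse $[x\ y]\,J\,[x\ y]^T = 1$ follow from the eigenvalues of the 2 x 2 FIM: each semi-axis has length $1/\sqrt{\lambda}$ for an eigenvalue $\lambda$ of $J$. The Python sketch below computes both semi-axes in closed form. The function name is hypothetical, and the text does not state which scalar "radius" is plotted in Figure 3a, so the sketch simply returns both values.

```python
import math


def ellipse_semi_axes(J):
    """Semi-axes (major, minor) of the ellipse x^T J x = 1 for a 2x2 FIM J."""
    (a, b), (b2, d) = J
    if abs(b - b2) > 1e-12 * max(1.0, abs(b)):
        raise ValueError("FIM must be symmetric")
    mean = 0.5 * (a + d)
    dev = math.hypot(0.5 * (a - d), b)
    lam_min, lam_max = mean - dev, mean + dev  # eigenvalues of J
    if lam_min <= 0.0:
        raise ValueError("FIM must be positive definite")
    # The smallest eigenvalue of J gives the longest axis of the ellipse.
    return 1.0 / math.sqrt(lam_min), 1.0 / math.sqrt(lam_max)
```

Equivalently, the squared semi-axes are the eigenvalues of the CRB matrix $C = J^{-1}$.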


The $H = 3$ arrays are located at coordinates $(x_1, y_1) = (0, 0)$,
$(x_2, y_2) = (400, 400)$, $(x_3, y_3) = (100, 0)$, and one source is
located at $(x_s, y_s) = (200, 300)$, where the units are meters.
Figure 3a shows the ellipse radius for various values of the signal
coherence $\gamma_s(\omega)$. Note that a significant improvement in
localization accuracy is potentially possible with the small value of
coherence $\gamma_s(\omega) = 0.1$, and the CRB gets smaller as the
coherence increases. Note also that the localization scheme 2
described above (bearing plus time-delay estimation) may perform as
well as the optimum, joint processing scheme.

The CRB results in Figure 3a indicate that even small amounts of
signal coherence between widely distributed arrays provide the
potential for significant improvement in source localization accuracy.
We point out that the CRB results for time-delay estimation in this
case are optimistic due to the narrowband model for the observations.
With narrowband signals at 50 Hz, the time delays are resolvable only
within the interval of one period of $(50\ \mathrm{Hz})^{-1} = 0.02$
sec. The CRB assumes that the ambiguities on the order of 0.02 seconds
are resolved by an unbiased estimator. This ambiguity in time-delay
estimation can be reduced by exploiting the wideband nature of the
signals.

Next we present results from measured aeroacoustic data to illustrate
typical values of signal coherence at distributed arrays. The
experimental setup is illustrated in Figure 3b, which shows the path
of a moving ground vehicle and the locations of four microphone arrays
(labeled 1, 3, 4, 5). Each array is circular with $N = 7$ sensors,
4-ft radius, and six sensors equally spaced around the perimeter with
one sensor in the center. We focus on the 10 second segment indicated
by the O's in Figure 3b. Figure 3c shows the power spectral density
(PSD) of the data measured at arrays 1 and 3 during the 10 second
segment. Note the dominant harmonic at 40 Hz. Figure 3d shows the
estimated coherence between arrays 1 and 3 during the 10 second
segment. The coherence is approximately 0.85 at 40 Hz, which
demonstrates the presence of significant coherence at widely-separated
microphones. Exploiting this coherence has the potential
for improved source localization accuracy, as illustrated in
the CRBs of Figure 3a. The Doppler effect due to source
motion was compensated prior to the coherence estimate
shown in Figure 3d.

$$G_Z(\omega) =
\begin{bmatrix}
a_1(\omega) a_1(\omega)^H G_{s,11}(\omega) & \cdots & a_1(\omega) a_H(\omega)^H \exp(-j\omega D_{1H})\, G_{s,1H}(\omega) \\
\vdots & \ddots & \vdots \\
a_H(\omega) a_1(\omega)^H \exp(+j\omega D_{1H})\, G_{s,1H}(\omega)^* & \cdots & a_H(\omega) a_H(\omega)^H G_{s,HH}(\omega)
\end{bmatrix}
+ G_w(\omega)\, I \quad (11)$$

$$J_{ij} = \frac{T}{2\omega_s} \int_0^{\omega_s} \mathrm{tr}\left\{
\frac{\partial G_Z(\omega)}{\partial \theta_i}\, G_Z(\omega)^{-1}\,
\frac{\partial G_Z(\omega)}{\partial \theta_j}\, G_Z(\omega)^{-1}
\right\} d\omega, \quad i, j = 1, 2 \quad (12)$$

Figure 2: Cross-spectral density matrix $G_Z(\omega)$ of $Z(t)$ in (9), and FIM $J$ for parameters $\Theta = [x_s, y_s]^T$.

Figure 3: (a) CRBs on RMS source localization error for a scenario with H = 3 arrays and one source. (b) Path of ground
vehicle and array locations for measured data. (c) Mean power spectral density (PSD) at arrays 1 and 3 estimated from
measured data over the 10 second segment O in (b). Top panel is $G_{s,11}(f)$, bottom panel is $G_{s,33}(f)$. (d) Mean spectral
coherence $\gamma_{s,13}(f)$ estimated over the 10 second segment.

ABSTRACT

In a number of array signal processing applications, such as
underwater source localization, the propagation medium is not
homogeneous, which causes a distortion of the wavefront received by
the array. In this paper, we consider the direction-of-arrival (DOA)
estimation problem for such distorted wavefronts. In previous
approaches, the so-called multiplicative noise scenario is considered,
based on the assumption that the distortion is random and can be
parameterized by a small number of parameters. To gain robustness
against mismodelling, we assume a scenario in which the wavefront
amplitude is distorted in a completely arbitrary way. We derive the
maximum likelihood (ML) estimator of the DOA and show that it can be
obtained by means of a simple 1-D search. The Cramér-Rao bound (CRB)
for the problem at hand is derived. Numerical simulations illustrate
the good performance of the estimator and show that its accuracy is
comparable with that of estimators which require knowledge of the form
of the amplitude distortions.

1.  INTRODUCTION  AND  PROBLEM 
FORMULATION 

In a number of direction-of-arrival (DOA) estimation applications,
such as underwater source localization by means of large hydrophone
arrays, the heterogeneity of the propagation medium causes a
distortion of the wavefront received by the array [1, 2]. More
exactly, if we let $y_k(t)$ denote the $t$-th observed sample of the
output of the $k$-th sensor in the array and assume that the
(distorted) wavefront impinging on the array is narrowband and its DOA
is equal to $\theta$, then we can write, for $k = 1, \ldots, m$ and
$t = 1, \ldots, N$,

$$y_k(t) = e^{i\phi(t)}\, x_k(t)\, a_k(\theta) + e_k(t) \quad (1)$$

where $\phi(t) \in [-\pi, \pi]$ is the unknown time-varying phase of
the received (baseband) wavefront at the first sensor,
$x_k(t) \in \mathbb{R}$ describes a time-and-space-varying amplitude
distortion, $e_k(t)$ is a noise term, and
$a_k(\theta) = e^{i\tau_k(\theta)}$ with $\tau_k(\theta)$ being the
DOA-dependent time needed by the wavefront to travel from the first to
the $k$-th sensor (hence $\tau_1(\theta) = 0$). We assume that the
noise $\{e_k(t)\}$ is both spatially (in $k$) and temporally (in $t$)
white and that it has a circular Gaussian distribution with zero mean
and unknown variance $\sigma^2$. The problem we consider in the sequel
is the maximum likelihood (ML) estimation of $\theta$ (which is the
parameter of interest) as well as $\{\phi(t)\}_{t=1}^{N}$ and
$\{x_k(t)\}_{k=1,t=1}^{m,N}$ (which are the nuisance parameters) from
the observed array data $\{y_k(t)\}_{k=1,t=1}^{m,N}$.

Most previous approaches to DOA estimation of distorted wavefronts
have considered the so-called multiplicative noise scenario in which
$\{x_k(t)\}_{k=1}^{m}$ is assumed to be a spatially stationary,
temporally white Gaussian random vector. While this assumption makes
it possible to parameterize the distribution of $\{x_k(t)\}$ by $m$
parameters only (the spatial covariance matrix of
$\{x_k(t)\}_{k=1}^{m}$ is Toeplitz in such a case), it is evidently
rather restrictive. In fact, to reduce the complexity of the DOA
estimation problem even further, it is often assumed that the
covariance matrix can be characterized by two parameters only. Even
so, the (stochastic) ML estimation of those two parameters along with
$\theta$ and $\sigma^2$ remains a complicated task requiring a
computationally burdensome 4-D search [3]. Here, we take a different
route, as suggested by the different set of assumptions that we have
already made on (1). To make our DOA estimation approach robust to
mismodelling of the wavefront amplitude distortion, we have modeled
$\{x_k(t)\}$ as arbitrary deterministic variables. This is a sensible
thing to do when no a priori knowledge on $\{x_k(t)\}$ is available,
or when the observations contain a few samples only.

However, the (deterministic) ML estimation problem corresponding to
the above assumption appears to be extremely complicated: besides the
parameter of interest $\theta$, the likelihood function depends on
$(m + 1)N + 1$ nuisance parameters. Despite this apparent complexity,
we show in the paper that all nuisance parameters can be eliminated
from the likelihood function, hence leaving a 1-D search problem for
the exact (deterministic) ML DOA estimation. Additionally, the
so-obtained DOA estimate is quite accurate in spite of the large
number of nuisance parameters present. In fact, we show that its
variance is close to the Cramér-Rao bound (CRB), for which we derive a
closed-form expression. Moreover, its accuracy compares favorably with
that of the approximate (stochastic) ML DOA estimator of [3] obtained
under the assumption (exploited by [3] but not by our DOA estimator)
that the form of the covariance matrix of $x(t)$ is known. Finally,
its robustness is evaluated. It is shown to perform reasonably well
even in the presence of phase fluctuations of the wavefronts, i.e.
when $x_k(t)$ is complex-valued, even though its formulation ignores
phase fluctuations and hence it is not intended for handling such a
situation.

2.  ML  DOA  ESTIMATION 

Under the assumptions made, the negative log-likelihood function can
be written as [4, 5]

$$L = mN \log \pi + mN \log \sigma^2 + \frac{1}{\sigma^2} \sum_{t=1}^{N} \sum_{k=1}^{m} \left| y_k(t) - e^{i\phi(t)} x_k(t)\, a_k(\theta) \right|^2$$

It is well-known [4, 5] that $L$ can be minimized explicitly with
respect to $\sigma^2$, leaving a concentrated negative log-likelihood
function that depends only on $\theta$, $\{x_k(t)\}$ and
$\{\phi(t)\}$:

$$z = \sum_{t=1}^{N} \sum_{k=1}^{m} \left| y_k(t) - e^{i\phi(t)} x_k(t)\, a_k(\theta) \right|^2 \quad (2)$$


The minimization of the function above with respect to $\{\phi(t)\}$
and $\{x_k(t)\}$ reduces to the minimization of the inner sum in (2)
for each $t$. Therefore, let us consider the following generic
function (where the dependence on $t$ and $\theta$ is temporarily
omitted for notational convenience):

$$f = \sum_{k=1}^{m} \left| y_k - e^{i\phi} x_k a_k \right|^2 \quad (3)$$


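A straightforward calculation shows that

$$f = \sum_{k=1}^{m} \left\{ |y_k|^2 + \left[ x_k - \mathrm{Re}\left(e^{-i\phi} a_k^* y_k\right) \right]^2 - \left[ \mathrm{Re}\left(e^{-i\phi} a_k^* y_k\right) \right]^2 \right\} \quad (4)$$

where the superscript $*$ denotes the complex conjugate for scalars
and the conjugate transpose for matrices and vectors. The minimization
of (4) with respect to $x_k(t)$ yields:

$$\hat{x}_k(t) = \mathrm{Re}\left[ e^{-i\hat{\phi}(t)}\, a_k^*(\hat{\theta})\, y_k(t) \right] \quad (5)$$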

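where the ML estimators (MLEs) $\hat{\theta}$ and
$\{\hat{\phi}(t)\}$ are yet to be determined. Insertion of (5) into
(4) shows that the MLE of $\phi(t)$ is obtained by maximizing (for
each $t$) the function

$$g = 2 \sum_{k=1}^{m} \left[ \mathrm{Re}\left(e^{-i\phi} a_k^* y_k\right) \right]^2
= \sum_{k=1}^{m} \left[ |y_k|^2 + \mathrm{Re}\left(e^{-i2\phi} (a_k^*)^2 y_k^2\right) \right]
= \mathrm{const.} + \left| \sum_{k=1}^{m} (a_k^*)^2 y_k^2 \right| \cos\left( \arg\left( \sum_{k=1}^{m} (a_k^*)^2 y_k^2 \right) - 2\phi \right) \quad (6)$$

where, to derive the second equality, we used the fact that for any
complex number $\alpha$,

$$[\mathrm{Re}(\alpha)]^2 = \tfrac{1}{4} (\alpha + \alpha^*)^2 = \tfrac{1}{2} \left[ |\alpha|^2 + \mathrm{Re}(\alpha^2) \right]$$

It follows that

$$\hat{\phi}(t) = \frac{1}{2} \arg \sum_{k=1}^{m} \left(a_k^*(\hat{\theta})\right)^2 y_k^2(t) \quad (7)$$

with the MLE of $\theta$ given by

$$\hat{\theta} = \arg\max_{\theta} \sum_{t=1}^{N} \left| \sum_{k=1}^{m} \left(a_k^*(\theta)\right)^2 y_k^2(t) \right| \quad (8)$$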


Hence, the deterministic ML DOA estimator simply entails a 1-D search.
For uniform linear arrays (ULA), we have
$a_k(\theta) = \exp\{i(k - 1)\omega\}$ where
$\omega = 2\pi\Delta \sin\theta$ is the so-called spatial frequency
and $\Delta$ is the inter-element spacing in wavelengths. In such a
case, the inner sum in (8) can be evaluated (as a function of
$\theta$) by using an FFT algorithm with zero padding applied to the
squared data samples $\{y_k^2(t)\}_{k=1}^{m}$ (for each $t$). The
ability to do so is particularly valuable, from a computational
standpoint, for large arrays ($m \gg 1$).

Remark 1 Note that if we re-define the data samples as
$z_k(t) = y_k^2(t)$, and the steering vector elements as
$\tilde{a}_k(\theta) = a_k^2(\theta)$, then
$\hat{\theta} = \arg\max_{\theta} \sum_{t=1}^{N} |\tilde{a}^*(\theta) z(t)|$.
The function in (8) can thus be interpreted as an $L_1$-beamformer,
except for the squaring operation applied in (8) to the observed data
and the elements of the steering vector prior to beamforming.
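The 1-D search in (8) and the phase estimate in (7) can be sketched in plain Python for a ULA. This is a minimal illustration rather than the authors' implementation: it uses a brute-force grid instead of the FFT, and the function names, the grid, and the data generator are assumptions. One further assumption deserves a note: because the steering elements are squared, a half-wavelength array behaves in this criterion like an array with one-wavelength spacing, so the example restricts the search to $|\theta| < 30°$ to keep the maximum unique.

```python
import cmath
import math
import random


def steering(theta_deg, m, spacing=0.5):
    """ULA elements a_k = exp{i (k-1) omega}, omega = 2*pi*spacing*sin(theta)."""
    omega = 2.0 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * k * omega) for k in range(m)]


def ml_criterion(theta_deg, snapshots, spacing=0.5):
    """Criterion of (8): sum over t of |sum_k conj(a_k)^2 * y_k(t)^2|."""
    m = len(snapshots[0])
    w = [a.conjugate() ** 2 for a in steering(theta_deg, m, spacing)]
    return sum(abs(sum(wk * yk * yk for wk, yk in zip(w, y)))
               for y in snapshots)


def ml_doa(snapshots, grid_deg, spacing=0.5):
    """Grid search for the DOA that maximizes the criterion of (8)."""
    return max(grid_deg, key=lambda th: ml_criterion(th, snapshots, spacing))


def phase_estimate(theta_deg, y, spacing=0.5):
    """Phase estimate of (7) for one snapshot; it is determined modulo pi."""
    w = [a.conjugate() ** 2 for a in steering(theta_deg, len(y), spacing)]
    return 0.5 * cmath.phase(sum(wk * yk * yk for wk, yk in zip(w, y)))


def simulate_snapshots(theta_deg, m, n_snap, sigma=0.0, seed=0, spacing=0.5):
    """Draw data from model (1) with i.i.d. real amplitudes (an assumption)."""
    rng = random.Random(seed)
    a = steering(theta_deg, m, spacing)
    data = []
    for _ in range(n_snap):
        phase = cmath.exp(1j * rng.uniform(-math.pi, math.pi))
        data.append([
            phase * rng.gauss(0.0, 1.0) * ak
            + sigma * complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) / math.sqrt(2.0)
            for ak in a
        ])
    return data
```

For example, `ml_doa(simulate_snapshots(10.0, 8, 5), [i * 0.1 for i in range(-299, 300)])` searches a 0.1° grid on noise-free data generated at 10°.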


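3. CRAMER-RAO BOUNDS

In this section, we derive the Cramér-Rao bounds for the problem at
hand. For the sake of clarity, let us introduce the following
notation:

$$a(\theta) = \left[1, e^{i\tau_2(\theta)}, \ldots, e^{i\tau_m(\theta)}\right]^T$$
$$\phi = [\phi(1), \ldots, \phi(N)]^T$$
$$x(t) = [x_1(t), \ldots, x_m(t)]^T$$
$$x = \left[x^T(1), \ldots, x^T(N)\right]^T$$

It is well-known [4, 5] that under the assumptions made the CRB for
the noise variance is decoupled from the CRB for the other parameters.
In the following, we concentrate on the CRB for
$\eta = [x^T \ \phi^T \ \theta]^T$. The Fisher Information Matrix
(FIM) is given by [4, 5]

$$F(k, l) = \frac{2}{\sigma^2}\, \mathrm{Re}\left\{ \sum_{t=1}^{N} \frac{\partial \mu^*(t)}{\partial \eta_k} \frac{\partial \mu(t)}{\partial \eta_l} \right\} \quad (9)$$

with

$$\mu(t) = E\{y(t)\} = e^{i\phi(t)}\, \Phi_a(\theta)\, x(t) \quad (10)$$

and where $\Phi_a(\theta) = \mathrm{diag}(a(\theta))$. In order to
derive a closed-form expression for the FIM, first note that

$$\frac{\partial \mu(t)}{\partial x_k(s)} = e^{i\phi(t)}\, \Phi_a(\theta)\, e_k\, \delta(t, s) \quad (11)$$

$$\frac{\partial \mu(t)}{\partial \phi(s)} = i\, e^{i\phi(t)}\, \Phi_a(\theta)\, x(t)\, \delta(t, s) \quad (12)$$

$$\frac{\partial \mu(t)}{\partial \theta} = e^{i\phi(t)}\, \Phi_d(\theta)\, x(t) \quad (13)$$

where $\Phi_d(\theta) = \mathrm{diag}(d(\theta))$ and
$d(\theta) = \partial a(\theta)/\partial \theta$. $\delta(t, s)$ is
the Kronecker delta, and $e_k$ is the $m$-dimensional vector with all
elements equal to zero, except the $k$-th element which equals one.
Using the previous results, it can be shown (see [6] for details) that
the FIM has the following block form

$$F = \frac{2}{\sigma^2}
\begin{bmatrix}
I_{mN} & 0 & 0 \\
0 & F_\phi & F_{\phi\theta} \\
0 & F_{\phi\theta}^T & F_\theta
\end{bmatrix} \quad (14)$$

where ($t, r = 1, \ldots, N$)

$$F_\phi(t, r) = \|x(t)\|^2\, \delta(t, r) \quad (15)$$

$$F_\theta = \sum_{t=1}^{N} x^T(t)\, T^2\, x(t) \quad (16)$$

$$F_{\phi\theta}(r) = x^T(r)\, T\, x(r) \quad (17)$$

and $T = \mathrm{diag}(\tau_1'(\theta), \tau_2'(\theta), \ldots, \tau_m'(\theta))$.
The block corresponding to $[\phi^T \ \theta]^T$ is

$$\frac{2}{\sigma^2}
\begin{bmatrix}
F_\phi & F_{\phi\theta} \\
F_{\phi\theta}^T & F_\theta
\end{bmatrix}$$

The CRB for $\theta$ is obtained as the lower-right corner element of
$F^{-1}$. By a formula for the inverse of partitioned matrices [4, 5],
it can be readily shown that the CRB for $\theta$ can be written as
follows

$$\mathrm{CRB}(\theta) = \frac{\sigma^2}{2} \left[ F_\theta - F_{\phi\theta}^T F_\phi^{-1} F_{\phi\theta} \right]^{-1}
= \frac{\sigma^2}{2} \left[ \sum_{t=1}^{N} \left( x^T(t)\, T^2\, x(t) - \frac{\left[x^T(t)\, T\, x(t)\right]^2}{x^T(t)\, x(t)} \right) \right]^{-1} \quad (18)$$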
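The closed-form bound (18) is straightforward to evaluate numerically. The Python sketch below does so for a given set of amplitude snapshots; the function names are hypothetical, and the ULA helper assumes $\tau_k(\theta) = (k - 1)\,2\pi\Delta\sin\theta$, which follows from the ULA steering vector of Section 2 but is stated here as an assumption of the example.

```python
import math


def crb_theta(amplitudes, tau_prime, sigma2):
    """Evaluate (18). amplitudes: list of snapshots x(t); tau_prime: diagonal of T.

    The result is in squared radians when theta is measured in radians.
    """
    total = 0.0
    for x in amplitudes:
        xtx = sum(v * v for v in x)
        xt_t_x = sum(tp * v * v for tp, v in zip(tau_prime, x))
        xt_t2_x = sum(tp * tp * v * v for tp, v in zip(tau_prime, x))
        if xtx > 0.0:  # an all-zero snapshot carries no information
            total += xt_t2_x - xt_t_x * xt_t_x / xtx
    if total <= 0.0:
        raise ValueError("theta is not identifiable from these amplitudes")
    return 0.5 * sigma2 / total


def ula_tau_prime(theta_deg, m, spacing=0.5):
    """Derivatives d tau_k / d theta for tau_k = (k-1) * 2*pi*spacing*sin(theta)."""
    c = 2.0 * math.pi * spacing * math.cos(math.radians(theta_deg))
    return [k * c for k in range(m)]
```

As a check, for $m = 2$, $N = 1$, $x = [1, 1]^T$, $\theta = 0$, half-wavelength spacing, and $\sigma^2 = 1$, the bracket in (18) equals $\pi^2 - \pi^2/2$, so the bound is $1/\pi^2$.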


4.  NUMERICAL  EXAMPLES  AND 
CONCLUSIONS 


In this section, we illustrate the performance of our deterministic
MLE and compare it with the COMET estimator [3], which is a
large-sample realization of the stochastic ML estimator. A comparison
with the $L_1$-beamformer

$$\hat{\theta}_1 = \arg\max_{\theta} \sum_{t=1}^{N} \left| \sum_{k=1}^{m} a_k^*(\theta)\, y_k(t) \right|$$

and the conventional $L_2$-beamformer (CBF)

$$\hat{\theta}_2 = \arg\max_{\theta} \sum_{t=1}^{N} \left| \sum_{k=1}^{m} a_k^*(\theta)\, y_k(t) \right|^2$$

is also presented. We consider a ULA of $m = 16$ sensors spaced a half
wavelength apart. The DOA of the source is set to $\theta = 10°$.
$x(t)$ is a real-valued, zero-mean, temporally white Gaussian random
process with covariance matrix $R = E\{x(t)\, x^T(t)\}$. To make COMET
applicable, we assume that the elements of $R$ are given by
$R(k, l) = \rho^{|k - l|}$ with $\rho = 0.9$, which corresponds to a
$10 \log_{10} \rho^2 = -0.915$ dB coherence loss at one wavelength
separation [2]. The signal to noise ratio (SNR) is defined as
$-10 \log_{10}(\sigma^2)$. 300 Monte-Carlo simulations were run to
estimate the root mean-square error (RMSE) of the estimates, with all
values given in degrees (°). For each of the 300 simulations, a
different realization of $\{x(t)\}_{t=1}^{N}$ was used and the
corresponding (deterministic) CRB was computed from (18). The
so-obtained values were then averaged over the 300 trials to yield
what we refer to as the average deterministic CRB. Finally, we also
display the stochastic CRB derived under the assumption that $x(t)$ is
a white Gaussian random process with a covariance matrix $R$
parameterized by $\rho$.
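The simulation setup can be sketched as follows. The text does not say how the amplitudes with covariance $R(k, l) = \rho^{|k-l|}$ were generated; the first-order autoregressive recursion across sensors used below is one standard way to obtain that covariance and is an assumption of this example, as are the function names.

```python
import math
import random


def coherence_loss_db(rho):
    """Coherence loss 10*log10(rho^2) at one wavelength (two half-wavelength steps)."""
    return 10.0 * math.log10(rho * rho)


def noise_variance(snr_db):
    """Invert SNR = -10*log10(sigma^2) to get sigma^2."""
    return 10.0 ** (-snr_db / 10.0)


def draw_amplitudes(m, rho, rng):
    """One snapshot x(t) with unit variances and E{x_k x_l} = rho^|k-l|.

    Uses x_k = rho * x_{k-1} + sqrt(1 - rho^2) * n_k with unit-variance
    Gaussian innovations n_k (an assumed construction).
    """
    x = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho * rho)
    for _ in range(m - 1):
        x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x
```

Calling `draw_amplitudes(16, 0.9, random.Random(0))` yields one amplitude vector for the $m = 16$ array, and `noise_variance(0.0)` gives the noise variance used at SNR = 0 dB.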

Fig. 1 displays the RMSE of the estimates versus the number of
snapshots for SNR = 0 dB. It can be observed that the deterministic
MLE has a performance close to the deterministic CRB. In all cases,
but especially for small sample sizes, the deterministic ML estimator
outperforms the COMET estimator, in spite of the fact that the former
is computationally simpler and does not require as many assumptions as
COMET does. The deterministic MLE performs better than the CBF,
especially for small sample sizes, and always outperforms the
$L_1$-beamformer significantly.


Figure 1: CRB's and RMSE of the estimators versus the number of
snapshots N. Amplitude distortions. SNR = 0 dB.

Next, the influence of the number of sensors on the estimation
performance is examined in Fig. 2, where $m$ is varied from 8 to 40
while the number of snapshots is fixed at $N = 60$ and SNR = 0 dB.
Fig. 2 reveals that the empirical RMSE of the deterministic ML
estimator remains nearly constant while that of the other estimators,
particularly COMET, increases when $m$ increases. Finally, we study
the influence of $\rho$ on the performance of the estimators. The
coherence loss at a wavelength separation is varied from -3 to -0.25
dB while $N = 60$ and SNR = 0 dB. The results are plotted in Fig. 3.
It can be noticed that the RMSE of the deterministic MLE remains
nearly constant for coherence losses milder than -1.5 dB, and tends to
increase for more severe losses. We have observed that this threshold
depends on $m$ and decreases when $m$ decreases, which seems logical.
On the other hand, all other estimators have a performance that
degrades continuously with the coherence loss. Finally, we note that
the CBF is as accurate as the deterministic ML for small coherence
loss values.


Figure 2: CRB's and RMSE of the estimators versus the number of
sensors. Amplitude distortions. N = 60 and SNR = 0 dB.

In  a  second  series  of  simulations,  we  test  the  robust¬ 
ness  of  our  estimator.  We  consider  the  more  compli¬ 
cated  situation  where  the  distortions  affect  not  only  the 
wavefront  amplitude  but  the  phase  as  well.  To  this  end, 
phase fluctuations are introduced in the model. More
exactly, x(t) is modeled as $x_k(t) = \tilde{x}_k(t) e^{i\psi_k(t)}$, where
$\tilde{x}(t)$ is a real-valued process with the same covariance
matrix R as before and $\psi_k(t)$ is a zero-mean sensor-to-sensor
independent-increment process, i.e. $\psi_k(t) = \psi_{k-1}(t) + \Delta\psi_k(t)$,
where the $\Delta\psi_k$ are independent random
variables uniformly distributed on $[-\pi\delta, \pi\delta]$. The variance
of the phase distortions is $E\{\psi_k^2\} = (k-1)\delta^2\pi^2/3$
and hence increases with the sensor index. Note that
the previous set of simulations considered the case of
$\delta = 0$, and also observe that the deterministic MLE is
not intended for handling the case of $\delta \neq 0$. In contrast,
the COMET estimator can cope with this problem
since it only relies on the form of the covariance matrix
of the observations, which is unchanged. In other
words,  the  deterministic  MLE  assumes  no  phase  distor¬ 
tions  and  hence  uses  a  wrong  model  whereas  COMET 
utilizes  a  correct  model.  This  example  is  chosen  to  test 
Figure 3: CRB's and RMSE of the estimators versus
the coherence loss. Amplitude distortions. N = 60 and
SNR = 0 dB.

Figure 4: RMSE of the estimators versus $\delta$. Amplitude
and phase distortions. N = 60 and SNR = 0 dB.


the  robustness  of  our  estimator  and  check  whether  it 
can  be  applied  in  more  difficult  scenarios.  Evidently, 
the  CRB  formula  (18)  is  no  longer  relevant.  Moreover, 
the  stochastic  CRB  which  has  been  derived  under  the 
Gaussian assumption can no longer be used. Fig. 4
displays the RMSE of the estimates when $\delta$ is varied.
We consider a scenario with a relatively small number
of snapshots N = 60 and a low SNR = 0 dB. It can
be noted that the performance of the deterministic ML
estimator as well as that of the other estimators
continuously degrades as $\delta$ increases. For $\delta$ above a
threshold, COMET and the CBF have a smaller RMSE, the
former having the smallest RMSE. However, the
deterministic MLE proposed herein turns out to be quite
robust and accurate for a wide range of phase
fluctuations, which is an additional interesting feature.
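
The phase model used in this experiment can be checked numerically. The sketch below is only an illustration, not the authors' simulation code: it assumes the first sensor has zero phase (which is what a variance proportional to k-1 suggests), and the function name `simulate_phase` and the values of m and delta are hypothetical.

```python
import numpy as np

def simulate_phase(m, delta, n_snapshots, rng):
    """Sensor-to-sensor random-walk phase: psi_1 = 0, psi_k = psi_{k-1} + dpsi_k."""
    # Increments are i.i.d. uniform on [-pi*delta, pi*delta].
    steps = rng.uniform(-np.pi * delta, np.pi * delta, size=(n_snapshots, m - 1))
    first = np.zeros((n_snapshots, 1))          # assumption: reference sensor has zero phase
    return np.concatenate([first, np.cumsum(steps, axis=1)], axis=1)

rng = np.random.default_rng(0)
m, delta = 8, 0.1                               # hypothetical array size and phase spread
psi = simulate_phase(m, delta, 200_000, rng)

# Empirical variance per sensor versus the closed form (k-1) * delta^2 * pi^2 / 3.
empirical = psi.var(axis=0)
predicted = np.arange(m) * (np.pi * delta) ** 2 / 3
```

The factor 1/3 comes from the variance of a uniform variable on an interval of width 2*pi*delta, which is (2*pi*delta)^2 / 12.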

The  deterministic  MLE  proposed  in  this  paper  does 
not  make  any  modeling  assumption  on  the  amplitude 
distortion,  which  is  a  significant  advantage.  On  the 
other hand, it assumes that there is no phase distortion
(so that x(t) is real-valued). We showed that it is
quite  robust  to  the  violation  of  this  latter  assumption, 
yet  its  performance  does  degrade  as  the  phase  distor¬ 
tion  increases.  The  trade-off  between  the  stochastic 
MLE  (for  which  COMET  is  an  approximate  (large- 
sample)  implementation)  and  the  deterministic  MLE 
of  this  paper  is  hence  quite  clear:  depending  on  the 
a  priori  information  available  on  the  wavefront  distor¬ 
tion,  one  approach  may  be  preferred  to  another,  with 
the  deterministic  MLE  having  more  chances  to  be  cho¬ 
sen  in  scenarios  with  little  or  no  a  priori  information. 
ABSTRACT

The  use  of  sensor  arrays  in  signal  processing  applications  has 
received  considerable  attention.  Various  array  perturbations  caused 
by  phase,  calibration,  or  modeling  errors  often  cause  the  sensor  ob¬ 
servations  to  become  only  partially  correlated,  limiting  the  appli¬ 
cation  of  traditional  matched-field  beamformers.  Quadratic  array 
processing  is  optimal  for  many  randomly  perturbed  array  prob¬ 
lems;  however,  direct  implementation  poses  a  significant  compu¬ 
tational  burden.  We  propose  a  highly  efficient,  asymptotically  op¬ 
timal  method  of  implementing  quadratic  array  processors  suitable 
for  detection  problems  in  randomly  perturbed  arrays.  Specifically, 
we  show  that  under  certain  conditions  the  optimal  array  processor 
can  be  approximately  realized  efficiently  and  robustly  employing 
only  discrete  Fourier  transforms  to  deal  with  spatial  processing 
while  entailing  only  a  small  loss  in  performance. 

1.  INTRODUCTION 

The  detection  of  signals  in  noise  is  a  classical  hypothesis  testing 
problem.  The  use  of  a  sensor  array  can  considerably  enhance  sig¬ 
nal  detection  by  providing  a  large  gain  in  the  SNR  and  allowing  for 
target  or  signal  source  localization.  Due  to  the  need  for  fast  pro¬ 
cessing  of  target  data  in  radar/sonar  applications,  the  need  exists 
for  very  efficient  array  processing  structures.  Unfortunately,  per¬ 
turbations  in  the  array  or  imperfect  spatial  coherence  of  the  signal 
wavefronts  due  to  complicated  propagation  may  lead  to  complex 
receiver structures for optimal performance. The assumption of a
known array response is rarely satisfied in practice. Due to changes
in  the  weather,  the  surrounding  environment,  and  antenna  loca¬ 
tion,  the  response  of  the  array  may  be  significantly  different  than 
when  it  was  last  calibrated  [1].  If  the  perturbations  in  the  array  are 
deterministic  and  known,  then  it  is  easy  to  compensate  for  them 
and  traditional  matched-field  beamforming  detector  structures  (in 
which  the  observations  are  aligned,  summed  together,  and  cor¬ 
related  against  the  signal  of  interest)  can  still  be  used  [2,  3,  4]. 
However,  array  perturbations  are  most  often  unknown  and  must  be 
modeled  as  random,  leading  to  optimal  detection  structures  which 
are quadratic in the observations; such detectors require complicated
matrix combining in space. While it is relatively straightforward
to derive the form of the optimal quadratic processor, its
implementation  is  computationally  too  expensive  for  many  real¬ 
time  applications.  For  this  reason,  most  literature  regarding  ran¬ 
dom  perturbations  in  arrays  has  focused  only  on  the  effects  of  such 
perturbations  on  the  performance  of  traditional  processors  which 
ignore  the  perturbations;  the  effect  of  phase  errors  is  studied  in  [5], 
the  effect  of  spatial  calibration  errors  on  detection  performance  is 
studied  in  [6],  the  effect  of  model  errors  on  the  performance  of  the 
MUSIC algorithm is studied in [1, 7], and loss of signal coherence
as  it  propagates  across  the  array  is  studied  in  [2].  The  results  of 
these  analyses  demonstrate  that  random  perturbations  can  cause 
significant  degradation  in  performance  of  traditional  processors. 

In  this  paper  we  propose  a  highly  efficient  technique  of  imple¬ 
menting  an  asymptotically  optimal  detector  for  dealing  with  array 
perturbations.  We  develop  a  novel  method  of  employing  the  pop¬ 
ular  Fourier  transform  algorithm  to  deal  with  the  quadratic  nature 
of  the  detection  problem;  specifically,  we  show  that  under  certain 
conditions  the  discrete  Fourier  transform  (DFT)  is  asymptotically 
optimal  for  spatial  processing.  Furthermore,  we  show  that  conven¬ 
tional  frequency  domain  techniques  for  angle  of  arrival  searches 
can  easily  be  incorporated  into  our  framework.  In  the  end,  we 
show  that  our  proposed  processor  provides  a  significant  improve¬ 
ment  in  performance  over  existing,  traditional  processors  while 
achieving  a  comparable  cost  of  implementation. 

2.  MATHEMATICAL  PRELIMINARIES 

Let us assume that we are dealing with an M-sensor uniform linear
array with spacing d, and that a single N-sample signal $\mathbf{s} =
[s(1), \ldots, s(N)]^T$ comes in to the array at angle $\theta$, where $\theta$ is
usually unknown. We will denote the N-sample observation at
the ith sensor with the column vector $\mathbf{x}_i = [x_i(1), \ldots, x_i(N)]^T$,
$i = 1, \ldots, M$. Each sensor observation will have a component
due to the signal $\mathbf{s}$ and an additive noise component which we
denote $\mathbf{n}_i$. Let $\mathbf{X} = [\mathbf{x}_1^T, \ldots, \mathbf{x}_M^T]^T$ denote the sensor
observations concatenated into an MN x 1 column vector, and let $\mathbf{N}$ be
similarly defined. We assume the noise is white, Gaussian, uncorrelated
between sensors, and independent of the signal source; we
write its covariance matrix as $E[\mathbf{N}\mathbf{N}^H] = \sigma^2\mathbf{I}$. As in previous
work dealing with array perturbations, we use narrowband array
assumptions [4, 6]. A problem is classified as narrowband if the
signal  bandwidth  is  small  compared  to  the  inverse  of  the  transit 
time  of  the  wavefront  across  the  array.  This  allows  us  to  approx¬ 
imate  the  time  delay  that  the  signal  encounters  as  it  propagates 
between  sensors  as  phase  shifts;  of  course  this  would  hold  exactly 
if  the  signal  source  were  sinusoidal  over  the  block  of  samples  col¬ 
lected.  We  may  write  the  observation  in  Kronecker  form  as 

$$\mathbf{X} = \mathbf{a}(\theta) \otimes \mathbf{s} + \mathbf{N} \qquad (1)$$

where $\mathbf{a}(\theta) = [a_1(\theta), a_2(\theta), \ldots, a_M(\theta)]^T$ is the length-M complex
array response vector. The vector $\mathbf{a}(\theta)$ contains all information
as to how the signal component is related between sensors.
Often $\mathbf{a}(\theta)$ is taken to be a deterministic quantity; in a uniform
linear array in which the signal has carrier frequency $f_0$ and the
speed of propagation is c, we have [4]

$$\mathbf{a}(\theta) = \left[1, \; e^{-j2\pi f_0 d\sin(\theta)/c}, \; \ldots, \; e^{-j2\pi (M-1) f_0 d\sin(\theta)/c}\right]^T \qquad (2)$$

which represents a pure phase delay between each element. We
will see later that due to the similarity with the DFT basis functions,
such an array response vector allows for computationally
efficient FFT algorithms to search over the unknown angle of arrival
$\theta$. However, as discussed in the introduction, there are often
random perturbations in the array. A very popular method of modeling
these random perturbations is to assume the array response
$\mathbf{a}(\theta)$ is a random quantity. Here it is common to assume that $\mathbf{a}(\theta)$
has a mean (nominal, calibrated) value $\mathbf{a}_m(\theta)$, such as in (2), plus
a zero-mean random component $\tilde{\mathbf{a}}(\theta)$:

$$\mathbf{a}(\theta) = \mathbf{a}_m(\theta) + \tilde{\mathbf{a}}(\theta). \qquad (3)$$

The random term $\tilde{\mathbf{a}}(\theta)$ will have the form $\mathbf{A}(\theta)[\tilde{a}_1, \ldots, \tilde{a}_M]^T$
where $\mathbf{A}(\theta)$ is a diagonal matrix containing the elements of $\mathbf{a}_m(\theta)$
on the diagonal and the $\tilde{a}_i$'s are complex random quantities that
represent gain and phase errors. Typically the vector $[\tilde{a}_1, \ldots, \tilde{a}_M]^T$
is assumed to have a Gaussian distribution with known covariance
$\mathbf{R}_{\tilde a}$ [1, 6, 7, 8]. A diagonal $\mathbf{R}_{\tilde a}$ would imply the array errors are
uncorrelated; off-diagonal terms would indicate sensor-to-sensor
correlations that result if some sensors, such as adjacent elements,
tend to perturb uniformly. The covariance of $\tilde{\mathbf{a}}(\theta)$ is given by
$\mathbf{R} = \mathbf{A}(\theta)\mathbf{R}_{\tilde a}\mathbf{A}^H(\theta)$.

For clarity of presentation, we will pose our detection problem
in the radar signal detection setting. We assume a known, deterministic
signal $\mathbf{s}$ is transmitted. If a target is present, the reflected
signal is assumed to be $b\mathbf{s}$, where $b$ is a deterministic but unknown
complex phase factor. For simplicity we will assume that $|b| = 1$
so that the SNR of the reflected signal is known. When the signal
is present, the observation is given by

$$\mathbf{X} = (\mathbf{a}_m(\theta) + \tilde{\mathbf{a}}(\theta)) \otimes b\mathbf{s} + \mathbf{N}, \qquad (4)$$

where $\tilde{\mathbf{a}}(\theta) \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$. We write the net signal component as two
terms $\mathbf{S} = \mathbf{S}_m(b) + \tilde{\mathbf{S}}$ where $\mathbf{S}_m(b) = \mathbf{a}_m(\theta) \otimes b\mathbf{s}$ is the signal
mean and $\tilde{\mathbf{S}} = \tilde{\mathbf{a}}(\theta) \otimes b\mathbf{s}$ is a zero-mean Gaussian component with
covariance $\mathbf{R}_{\tilde S}$. We assume that the noise is Gaussian and white in
both time and space with variance $\sigma^2$.
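
As a rough illustration of the observation model in (2)-(4), the following NumPy sketch draws one perturbed observation. It is not taken from the paper: the function names, the waveform, the array size, and the choice of circular complex Gaussian errors are assumptions made only for this example, and the exponential covariance is borrowed from the simulation section.

```python
import numpy as np

def steering_vector(M, theta, f0, d, c):
    """Nominal ULA response a_m(theta), following the pure phase delays in (2)."""
    k = np.arange(M)
    return np.exp(-1j * 2 * np.pi * k * f0 * d * np.sin(theta) / c)

def simulate_observation(s, a_m, R_a, sigma2, b, rng):
    """Draw one MN x 1 observation X = (a_m + a_tilde) kron (b s) + N, as in (4)."""
    M, N = a_m.size, s.size
    # Assumption: circular complex Gaussian gain/phase errors with covariance R_a.
    w = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    errors = np.linalg.cholesky(R_a) @ w
    a_tilde = a_m * errors                      # A(theta) @ errors, with A = diag(a_m)
    noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(M * N)
                                   + 1j * rng.standard_normal(M * N))
    return np.kron(a_m + a_tilde, b * s) + noise

rng = np.random.default_rng(0)
M, N = 4, 8                                     # hypothetical small sizes
c, f0 = 3e8, 435e6
d = 0.5 * c / f0                                # half-wavelength spacing
a_m = steering_vector(M, np.deg2rad(20.0), f0, d, c)
s = np.exp(1j * np.pi * np.arange(N) ** 2 / N)  # hypothetical unit-modulus waveform
idx = np.arange(M)
R_a = 0.5 * 0.7 ** np.abs(idx[:, None] - idx[None, :])
X = simulate_observation(s, a_m, R_a, sigma2=1.0, b=np.exp(1j * 0.3), rng=rng)
```

With `np.kron(a, s)` the stacked vector holds the N samples of sensor 1 first, then those of sensor 2, and so on, which matches the concatenated definition of X.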

3.  OPTIMAL  QUADRATIC  ARRAY  PROCESSOR 

In this section we derive the form of the optimal processor and
discuss its implementation. We will find that the optimal structure
is quadratic in the observations and that implementation requires
decorrelating the observations in space. In statistical hypothesis
testing, for an observation $\mathbf{X}$, a real-valued test statistic $L(\mathbf{X})$
is compared to a threshold to decide in favor of $H_0$ (only noise is
present) or $H_1$ (the signal $\mathbf{S}$ is present). The optimal test statistic
based on the likelihood ratio is given by [9]:


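$$L(\mathbf{X}) = \frac{1}{2}\log\frac{|\sigma^2\mathbf{I}|}{|\mathbf{R}_{\tilde S} + \sigma^2\mathbf{I}|} + \frac{1}{2}\mathbf{X}^H\left[\sigma^{-2}\mathbf{I} - (\mathbf{R}_{\tilde S} + \sigma^2\mathbf{I})^{-1}\right]\mathbf{X}$$

$$\qquad + \mathrm{Re}\left\{\mathbf{X}^H(\mathbf{R}_{\tilde S} + \sigma^2\mathbf{I})^{-1}\mathbf{S}_m(b)\right\} - \frac{1}{2}\mathbf{S}_m^H(b)(\mathbf{R}_{\tilde S} + \sigma^2\mathbf{I})^{-1}\mathbf{S}_m(b).$$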


It is common to use a generalized likelihood ratio test (GLRT) to
deal with the unknown parameter $b$, which involves maximizing
$L(\mathbf{X})$ with respect to $b$ and using that value of $b$ which attains
the maximum. The second and third terms in the expression for
the likelihood ratio are quadratic and linear in the observations,
respectively, while the first and fourth terms are just constants. It can
be shown that after maximization over $b$, the optimal test statistic,
retaining only those terms which depend on the observation, may
be written as


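$$L(\mathbf{X}) = \frac{1}{2\sigma^2}\mathbf{X}^H\mathbf{R}_{\tilde S}(\mathbf{R}_{\tilde S} + \sigma^2\mathbf{I})^{-1}\mathbf{X} + \left|\mathbf{S}_m^H(\mathbf{R}_{\tilde S} + \sigma^2\mathbf{I})^{-1}\mathbf{X}\right|$$

$$\qquad = Q(\mathbf{X}) + T(\mathbf{X}),$$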


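where $Q(\mathbf{X})$ denotes the quadratic term, $T(\mathbf{X})$ denotes the linear
term, and $\mathbf{S}_m = \mathbf{a}_m(\theta) \otimes \mathbf{s}$. We first discuss implementation
of the quadratic term, which will involve an eigendecomposition.
Because $\tilde{\mathbf{S}} = \tilde{\mathbf{a}}(\theta) \otimes b\mathbf{s}$ and $|b| = 1$, its covariance $\mathbf{R}_{\tilde S}$
may be expressed as $\mathbf{R}_{\tilde S} = \mathbf{A}(\theta)\mathbf{R}_{\tilde a}\mathbf{A}^H(\theta) \otimes \mathbf{s}\mathbf{s}^H$. We may
express the covariance matrix in eigenform as $\mathbf{R}_{\tilde S} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^H$.
The eigenvector matrix $\mathbf{U}$ may be expressed with a Kronecker
product as $\mathbf{U} = \frac{\mathbf{s}}{\|\mathbf{s}\|} \otimes \mathbf{A}(\theta)\mathbf{U}_{\tilde a}$, where we assume $\mathbf{U}_{\tilde a}$, the
eigenvector matrix for $\mathbf{R}_{\tilde a}$, is of full dimension M x M. Let
$\mathbf{Z} = \mathbf{U}^H\mathbf{X} = \mathbf{U}_{\tilde a}^H\mathbf{A}^H(\theta)\left(\frac{\mathbf{s}^H}{\|\mathbf{s}\|} \otimes \mathbf{X}\right) = [z_1, \ldots, z_M]^T$ represent
the spatially decorrelated, aligned matched filter samples
(decorrelate via $\mathbf{U}_{\tilde a}^H$ and align via $\mathbf{A}^H(\theta)$). It can be shown that the
quadratic term may be implemented as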


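$$Q(\mathbf{X}) = \frac{1}{2\sigma^2}\sum_{k=1}^{M}\frac{\|\mathbf{s}\|^2\lambda_k}{\|\mathbf{s}\|^2\lambda_k + \sigma^2}\,|z_k|^2 \qquad (5)$$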


where the $\lambda_k$'s are the eigenvalues of $\mathbf{R}_{\tilde a}$. This reveals that the
quadratic term in the detector requires matched-filtering each sensor
observation against the signal source in time, aligning the samples,
spatially processing via the matrix $\mathbf{U}_{\tilde a}^H$ to obtain the decorrelated
samples $z_k$, $k = 1, \ldots, M$, and then combining terms in the
proper fashion. Hence decoupled spatial and temporal processing
is optimal.
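
The equivalence between the full MN x MN quadratic form and the decoupled implementation in (5) can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it stacks the observation sensor by sensor (so the space factor comes first in `np.kron`), uses unit-modulus nominal responses, and all sizes and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, sigma2 = 6, 5, 0.5                 # hypothetical sizes and noise variance
theta, spacing = 0.3, 0.5                # angle in radians; spacing stands for f0*d/c

s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
a_m = np.exp(-1j * 2 * np.pi * np.arange(M) * spacing * np.sin(theta))
A = np.diag(a_m)
idx = np.arange(M)
R_a = 0.5 * 0.7 ** np.abs(idx[:, None] - idx[None, :])   # assumed Toeplitz error covariance

X = rng.standard_normal(M * N) + 1j * rng.standard_normal(M * N)

# Direct evaluation: (1 / (2 sigma^2)) X^H R_S (R_S + sigma^2 I)^{-1} X.
R_S = np.kron(A @ R_a @ A.conj().T, np.outer(s, s.conj()))
Q_direct = (X.conj() @ R_S @ np.linalg.solve(R_S + sigma2 * np.eye(M * N), X)).real
Q_direct /= 2 * sigma2

# Decoupled evaluation: matched filter in time, align, decorrelate in space, combine.
lam, U_a = np.linalg.eigh(R_a)
s_norm2 = np.vdot(s, s).real
Y = X.reshape(M, N) @ s.conj()                           # y_i = s^H x_i for each sensor
Z = U_a.conj().T @ (a_m.conj() * Y) / np.sqrt(s_norm2)   # U_a^H A^H Y / ||s||
Q_eig = np.sum(s_norm2 * lam / (s_norm2 * lam + sigma2) * np.abs(Z) ** 2) / (2 * sigma2)
```

The direct route solves an MN x MN system, whereas the decoupled route needs only N-sample matched filters and an M x M eigendecomposition, which is the computational point made in the text.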

We would like to point out that the alignment via $\mathbf{A}^H(\theta)$ may
be carried out efficiently via Fourier transforms as in traditional
narrowband array processing with no perturbations. If we let $\mathbf{Y} =
\mathbf{s}^H \otimes \mathbf{X} = [y_1, \ldots, y_M]^T$ denote the matched filter samples, then
$\mathbf{A}^H(\theta)\mathbf{Y} = [y_1, y_2 e^{-j2\pi\phi}, \ldots, y_M e^{-j2\pi(M-1)\phi}]^T$, where we have let
$\phi = -\frac{f_0 d}{c}\sin(\theta)$ represent the mapping from angle to spatial frequency.
Therefore, when processing with $\mathbf{U}_{\tilde a}^H = \{u_{lk}\}$, the lth element
of the M x 1 column vector resulting from the matrix product
$\mathbf{Z} = \mathbf{U}_{\tilde a}^H\mathbf{A}^H(\theta)\mathbf{Y}$ is given by

$$z_l = \sum_{k=1}^{M} u_{lk}\, y_k\, e^{-j2\pi(k-1)\phi}. \qquad (6)$$

Letting $y_l(k) = M u_{lk} y_k$, we may write the aligned, decorrelated
matched filter samples as

$$\mathbf{Z} = [Y_1(\phi), \ldots, Y_M(\phi)]^T \qquad (7)$$

where $Y_l(\phi)$ denotes the DTFT of the weighted observation snapshot
$\{y_l(k)\}_{k=1}^{M}$. Hence each decorrelated aligned matched filter
sample can be obtained as a function of the angle of arrival through
a DTFT, allowing the use of computationally efficient FFT-based
algorithms when we must search over an unknown $\theta$ as in a GLRT.

We must be careful in how we choose to implement the linear
term $T(\mathbf{X})$ to ensure that FFT-based algorithms can still be used
to search over the unknown angle of arrival $\theta$. Using techniques
similar to those used to write the quadratic term, the linear term
may be written as

$$T(\mathbf{X}) = |\mathbf{a}_m^H(\theta)\mathbf{p}|, \qquad (8)$$

where the M x 1 vector $\mathbf{p}$ may be obtained as

$$\mathbf{p} = \mathbf{U}_{\tilde a}\boldsymbol{\Gamma}(\mathbf{s}^H \otimes \mathbf{U}_{\tilde a}^H)\mathbf{X}. \qquad (9)$$

Here $\boldsymbol{\Gamma} = \mathrm{diag}\left(\frac{1}{\|\mathbf{s}\|^2\lambda_1 + \sigma^2}, \ldots, \frac{1}{\|\mathbf{s}\|^2\lambda_M + \sigma^2}\right)$. Recalling the form
of $\mathbf{a}_m(\theta)$ for a uniform linear array in (2), and again letting $\phi =
-\frac{f_0 d}{c}\sin(\theta)$, we see that $T(\mathbf{X})$ may be expressed as $T(\mathbf{X}) =
M|P(\phi)|$, where $P(\phi)$ is the DTFT of the M x 1 vector $\mathbf{p}$.

4. SPATIAL DECORRELATION VIA THE DFT

The previous section revealed that both the linear and quadratic
terms require processing via the M x M matrix $\mathbf{U}_{\tilde a}^H$. Unfortunately,
spatially decorrelating the matched filter samples via $\mathbf{U}_{\tilde a}^H$
is often computationally too intensive, especially for large arrays.
In this section we propose a highly efficient method of spatially
processing the observations. Note that if the distances between
sensors are equal, as in a uniform linear array, then often $\mathbf{R}_{\tilde a}$ is
taken to be Toeplitz [10]. Based on this property, spatial decorrelation
can be achieved asymptotically using the discrete Fourier
transform$^1$ (DFT) [11]; that is, we may substitute the M x M
DFT matrix $\mathbf{F}_M$ for $\mathbf{U}_{\tilde a}^H$. The basic idea is that as we increase the
number of sensors, the matrix $\mathbf{R}_{\tilde a}$ grows in dimension and asymptotically
becomes a circulant matrix, which is diagonalized by the
DFT matrix $\mathbf{F}_M$. Therefore, our approach will be to asymptotically
approximate the optimal quadratic detector derived in the
previous section by substituting the DFT matrix $\mathbf{F}_M$ in place of
the true spatial decorrelating matrix $\mathbf{U}_{\tilde a}^H$. This results in a detector
employing only DFTs for spatial processing and simple
matched filters in time. Since the cost of implementing a DFT is
relatively small when FFT techniques are used, the computational
expense of such a processor is comparable to that of existing traditional
matched-field beamformers. Although the DFT approach is
asymptotically optimal, we are also interested in how the decorrelating
power of the DFT depends on the number of sensors M and
the covariance matrix $\mathbf{R}_{\tilde a}$. A good measure of the residual correlation
still remaining is provided by the norm of the matrix containing
the off-diagonal covariance elements of the transformed coefficients
[11]. Defining $\mathbf{D}_M$ as a diagonal matrix containing the
same diagonal elements as the matrix $\mathbf{F}_M\mathbf{R}_{\tilde a}\mathbf{F}_M^{-1}$, a measure of
residual correlation is given by the weak (Hilbert-Schmidt) norm
of the difference matrix $\mathbf{D}_M - \mathbf{F}_M\mathbf{R}_{\tilde a}\mathbf{F}_M^{-1}$, defined by

$$V_M = |\mathbf{D}_M - \mathbf{F}_M\mathbf{R}_{\tilde a}\mathbf{F}_M^{-1}|^2 = \frac{1}{M}\sum_{i=1}^{M}\sum_{j=1}^{M}\left|[\mathbf{D}_M - \mathbf{F}_M\mathbf{R}_{\tilde a}\mathbf{F}_M^{-1}]_{i,j}\right|^2. \qquad (10)$$

$^1$ Note that the use of the DFT here is only for spatial decorrelation and
is not related to angle-of-arrival searches.

As for a DFT-based detector implementation, we can no longer
use the eigenvalues $\lambda_k$ of the matrix $\mathbf{R}_{\tilde a}$ as we did in (5) and
(8). We now have an approximate diagonal representation of $\mathbf{R}_{\tilde a}$
given by $\mathbf{R}_{\tilde a} \approx \sum_{k=1}^{M}\mu_k\mathbf{f}_k\mathbf{f}_k^H$ where $\mathbf{f}_k$, $k = 1, \ldots, M$ are the
columns of $\mathbf{F}_M^H$. Note that $\mu_k$ may be obtained simply as the DFT
of the first row of $\mathbf{R}_{\tilde a}$. Therefore the DFT approximation to the
quadratic term in (5) is given by

$$Q_{DFT}(\mathbf{Z}) = \frac{1}{2\sigma^2}\sum_{k=1}^{M}\frac{\mu_k}{\|\mathbf{s}\|^2\mu_k + \sigma^2}\,|z_k|^2 \qquad (11)$$

where $\mathbf{Z} = [z_1, \ldots, z_M]^T = \mathbf{F}_M\mathbf{A}^H(\theta)(\mathbf{s}^H \otimes \mathbf{X})$ are the
approximately spatially decorrelated, aligned matched filter samples.
The DFT approximation to the linear part of the detector in (8) is
obtained by making the same type of substitutions. Specifically,

$$T_{DFT}(\mathbf{X}) = |\mathbf{a}_m^H(\theta)\mathbf{p}|, \qquad (12)$$

where $\mathbf{p} = \mathbf{F}_M^H\tilde{\boldsymbol{\Gamma}}\mathbf{F}_M(\mathbf{s}^H \otimes \mathbf{X})$,
$\tilde{\boldsymbol{\Gamma}} = \mathrm{diag}\left(\frac{1}{\|\mathbf{s}\|^2\mu_1 + \sigma^2}, \ldots, \frac{1}{\|\mathbf{s}\|^2\mu_M + \sigma^2}\right)$, and again the $\mu_k$'s are
the DFT coefficients of the first row of $\mathbf{R}_{\tilde a}$.
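
The residual-correlation measure in (10) is easy to evaluate numerically. The sketch below is a minimal illustration rather than the authors' code: it assumes a unitary DFT matrix (so that the inverse equals the conjugate transpose), uses the exponential Toeplitz covariance from the simulation section, and the helper names `residual_correlation` and `exp_toeplitz` are hypothetical.

```python
import numpy as np

def residual_correlation(R_a):
    """Off-diagonal energy of F R_a F^{-1}, scaled by 1/M as in (10)."""
    M = R_a.shape[0]
    F = np.fft.fft(np.eye(M)) / np.sqrt(M)      # unitary DFT matrix, so F^{-1} = F^H
    T = F @ R_a @ F.conj().T
    off_diag = T - np.diag(np.diag(T))
    return np.sum(np.abs(off_diag) ** 2) / M

def exp_toeplitz(M, alpha, var=0.5):
    """Symmetric Toeplitz covariance with first row var * [1, alpha, alpha^2, ...]."""
    idx = np.arange(M)
    return var * alpha ** np.abs(idx[:, None] - idx[None, :])

# Residual correlation for a few array sizes at a fixed correlation parameter.
values = {M: residual_correlation(exp_toeplitz(M, 0.7)) for M in (16, 64, 256)}
```

For an exactly circulant covariance the measure is zero up to rounding, and for this Toeplitz family it shrinks as M grows, which is the asymptotic behaviour the DFT approximation relies on.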

The effect of the DFT substitution as an approximate decorrelator
on the angle of arrival search is as follows. The approximately
decorrelated, aligned matched filter samples are given by
$\mathbf{Z} = \mathbf{F}_M\mathbf{A}^H(\theta)\mathbf{Y}$. Analogous to (6), the lth element of this M x 1
snapshot may be expressed as

$$z_l = \frac{1}{M}\sum_{k=1}^{M} y_k\, e^{-j2\pi(k-1)\left(\phi + \frac{l}{M}\right)} = Y\left(\phi + \frac{l}{M}\right) \qquad (13)$$

where $Y(\phi)$ is the DTFT of the matched filter samples $\{y_k\}_{k=1}^{M}$.
That is, $\mathbf{Z} = [Y(\phi), Y(\phi + \frac{1}{M}), \ldots, Y(\phi + \frac{M-1}{M})]^T$. Therefore,
the approximately decorrelated, aligned matched filter samples
may be obtained efficiently as the DTFT of the matched filter
samples with appropriate frequency shifts. In practice we would
sample $\phi = p/M$, $p = 0, \ldots, M-1$ to obtain a DFT, resulting in
$\mathbf{Z} = [Y(p), Y(p+1), \ldots, Y(p+M-1)]^T$, where $Y(p)$ now
denotes the pth DFT sample of the matched filter samples. This
reveals that the search over angle (now represented through the discrete
variable p) translates into simply circularly shifting the DFT
coefficients of the matched filter samples. Note that forming the
weighted sum of the magnitude-squared circularly shifted DFT
coefficients with the weights in (11), for every shift p, is simply the
circular convolution of these two sequences. Hence fast convolution
algorithms can be used to form the circular convolution, and choosing
the maximum value of this convolution will result in the GLRT which
incorporates the unknown angle of arrival. As for dealing with the
angle-of-arrival search with the linear term, we note that the inner
product $\mathbf{a}_m^H(\theta)\mathbf{p}$ represents a particular DFT sample of the vector $\mathbf{p}$.
Therefore, from (12) a GLRT would require us to choose the largest
element of the vector $|\mathbf{F}_M\mathbf{p}| = |\tilde{\boldsymbol{\Gamma}}\mathbf{F}_M(\mathbf{s}^H \otimes \mathbf{X})| = |\tilde{\boldsymbol{\Gamma}}\mathbf{Z}|$, where
$\mathbf{Z} = \mathbf{F}_M(\mathbf{s}^H \otimes \mathbf{X}) = \{z_k\}$ denotes the DFT of the matched-filter
samples. The DFT-based implementation of the linear-quadratic
detector is illustrated in Figure 1.

It is of interest to examine the form of the GLRT for the conventional
matched-field beamformer which does not account for
perturbations. In this case, the optimal test statistic is given by [6]

$$L_{MF}(\mathbf{X}) = |(\mathbf{a}_m^H(\theta) \otimes \mathbf{s}^H)\mathbf{X}|. \qquad (14)$$

Again letting $\mathbf{Y} = \mathbf{s}^H \otimes \mathbf{X}$ denote the matched filter samples and
letting $\phi = -\frac{f_0 d}{c}\sin(\theta)$, we have $L_{MF} = |Y(\phi)|$ where $Y(\phi)$
is the DTFT of the matched filter samples. Hence a GLRT would
simply require choosing the maximum magnitude of the
DTFT coefficients. We may sample to obtain a DFT, and hence the
GLRT may be realized by simply choosing the maximum DFT coefficient
magnitude of the matched filter samples. The detector is
illustrated in Figure 2. Comparing with Figure 1, note that the only
additional processing needed to deal with array miscalibration, over
conventional matched-field processing which ignores array miscalibration,
is the spatial smoothing provided by the circular convolution.
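
The conventional matched-field GLRT described above reduces to a matched filter per sensor followed by one FFT across sensors and a peak pick. The following sketch illustrates this for a noise-free return whose spatial frequency lies exactly on the DFT grid. The function name `matched_field_glrt`, the waveform, and the sizes are assumptions for this example, and the statistic is left unnormalized.

```python
import numpy as np

def matched_field_glrt(X, s, M):
    """Matched filter each sensor in time, FFT across sensors, pick the peak."""
    N = s.size
    Y = X.reshape(M, N) @ s.conj()          # matched filter samples y_i = s^H x_i
    spectrum = np.abs(np.fft.fft(Y))        # |DFT| over the spatial index
    p_hat = int(np.argmax(spectrum))        # grid index of the angle estimate
    return spectrum[p_hat], p_hat

M, N = 16, 32                               # hypothetical sizes
rng = np.random.default_rng(2)
s = np.exp(1j * 2 * np.pi * rng.random(N))  # hypothetical unit-modulus waveform

p_true = 5
phi = p_true / M                            # phi = -(f0 d / c) sin(theta), on the DFT grid
# exp(-j 2 pi k f0 d sin(theta) / c) equals exp(+j 2 pi k phi) under this definition of phi.
a_m = np.exp(1j * 2 * np.pi * np.arange(M) * phi)
X = np.kron(a_m, s)                         # noise-free, unperturbed observation

stat, p_hat = matched_field_glrt(X, s, M)
```

In the DFT-based quadratic detector the same FFT output is reused; the extra work is the circular convolution of the squared magnitudes with the fixed weights.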


Figure  1 :  Implementation  of  the  linear-quadratic  detector  via  the 
DFT  approximation;  Here  gk  =  ,  k  =  1, . . . ,  M,  and 

the  circular  convolution  block  denotes  the  circular  convolution  of 
the  input  with  the  sequence  { ^  } 


5.  SIMULATION 

For simulation, we assume M = 64 sensors in a ULA with half-wavelength
spacing with the SNR of each sensor observation set
to -40 dB. We generated a radar waveform return using Matlab
code modified from the Mountaintop Matlab toolbox$^2$. The radar
waveform consists of a burst of $N_p = 16$ identical pulses with a
pulse bandwidth of 500 kHz, pulse width of 100 μs, PRI of 1.6 ms,
and transmit frequency of 435 MHz. Reception was such that N =
6448 samples were collected at each sensor. Array miscalibration
was accounted for by assuming a symmetric Toeplitz covariance
matrix for $\mathbf{R}_{\tilde a}$ with first row equal to [6]:

$$\sigma_{\tilde a}^2\,[1, \; \alpha, \; \alpha^2, \; \ldots, \; \alpha^{M-1}], \qquad (15)$$

where $\sigma_{\tilde a}^2$ is the variance of the array errors. Figure 3 shows the
residual correlation $V_M$ defined in (10) as a function of $\alpha$. Note
that over a large range of $\alpha$, the residual correlation is quite low;
the worst case occurs for $\alpha = 0.95$. A value of $\sigma_{\tilde a}^2 = 0.5$ was
used in the simulations. Figure 4 compares the receiver operating
characteristic (ROC) for optimal combining, DFT combining,
and matched filtering (which ignores array miscalibration), all for
$\alpha = 0.7$. Note the significant gain in performance offered by DFT
combining over matched filtering; the DFT approach performs virtually
the same as optimal combining. Figure 5 illustrates the same
information, only for $\alpha = 0.95$. Even in this worst-case scenario
for $\alpha$, DFT combining significantly outperforms matched filtering,
and there is only a slight loss in performance over optimal.

$^2$ This toolbox is available at ftp://ftp.ee.gatech.edu/pub/users/yaron/

Figure 2: Implementation of the conventional matched-field processor
incorporating a GLRT for angle search. This detector is
optimal in the absence of array perturbations.

6.  CONCLUSION 

Fusing  data  collected  from  an  array  of  sensors  can  considerably  en¬ 
hance  signal  detection,  provided  the  data  is  processed  in  the  proper 
fashion.  Motivated  by  the  fact  that  extensive  previous  studies  have 
shown  that  random  array  perturbations  can  cause  significant  degra¬ 
dation  in  the  performance  of  traditional  matched-field  beamform- 
ers  which  ignore  random  perturbations,  we  propose  a  quadratic 
array  processor  which  fully  incorporates  the  statistical  nature  of 
the  perturbations.  Implementation  of  the  optimal  detector  requires 
two  main  steps:  decorrelation  in  space  followed  by  filtering  in 
time.  While  deriving  the  form  of  the  optimal  quadratic  proces¬ 
sor  is  relatively  straightforward,  our  main  contribution  is  propos¬ 
ing  an  efficient  method  of  implementing  it.  Recognizing  that  spa¬ 
tially  decorrelating  each  snapshot  is  computationally  quite  expen¬ 
sive,  we  showed  that  spatial  decorrelation  can  be  approximately 
achieved  in  a  general-purpose  fashion  with  a  discrete  Fourier  trans¬ 
form.  We  also  illustrated  how  efficient  frequency-domain  tech¬ 
niques  for  angle-of-arrival  searches  can  be  easily  incorporated  into 
our  proposed  detector.  It  turns  out  that  the  suggested  spatial  DFT 
processing  serves  two  purposes  at  once:  efficient  spatial  decorrela¬ 
tion  which  deals  with  the  random  perturbations,  and  circular  shift¬ 
ing  of  the  DFT  coefficients  allows  for  a  search  over  the  unknown 
angle  of  arrival.  Simulation  results  reveal  that  even  in  worst-case 
scenarios,  the  DFT  approximation  to  the  optimal  quadratic  detec¬ 
tor  not  only  significantly  outperforms  conventional  matched  filter¬ 
ing  techniques,  but  provides  near-optimal  performance  over  a  wide 
range of correlation in the perturbations. Hence our proposed
implementation has a cost comparable to that of existing matched-field
beamformers while providing the performance benefits of the
much more complicated quadratic processors.
Figure 3: DFT residual correlation as a function of $\alpha$. Note that
the residual correlation is small over a wide range of spatial correlation;
the worst case occurs at $\alpha = 0.95$.

Figure 4: ROC for Optimal Combining, DFT Combining, and
Matched Filter; $\alpha = 0.7$. Note not only the significant performance
gain, but the near-optimal performance provided by DFT combining.

Figure 5: ROC for Optimal Combining, DFT Combining, and
Matched Filter; $\alpha = 0.95$. Even in this worst-case scenario, DFT
combining significantly outperforms matched filtering.
Abstract - Many signal subspace based approaches
have been proposed for determining the fixed
direction of arrival (DOA) of plane waves impinging
on an array of sensors. However, the computational
burden of subspace based algorithms makes them
unsuitable for real time processing of nonstationary
signal parameters. In this work, we present an
iterative procedure for DOA estimation and tracking.
The complete procedure consists of, first, extracting
the noise or signal subspace by training the MCA or
PCA algorithms, respectively. These algorithms
contain only relatively simple operations and have
self-organizing properties. Then, using the Newton algorithm,
we obtain the estimated DOA. The performance of the
approach on simulated data representing both constant
and time-varying signals is presented.

I.  Introduction 

Subspace-based methods for estimating the
frequencies of sinusoids or the DOA of signals impinging
on an array of sensors have drawn considerable interest
over recent years. The state-space method [1], ESPRIT [2],
MUSIC [3], and Min-Norm [4] are examples of these
techniques. Based on the eigendecomposition of the
covariance matrix of the array output, they offer high
resolution and give accurate estimates [5]. A key limitation
of these techniques is the computational burden of processing
a new sample (snapshot), so they are unsuitable for real
time applications.

Some attempts have been made to reduce the
computational burden of these methods. Stewart [6] has
introduced the URV decomposition developed by Liu et
al. [2]. Eriksson et al. [7] extended the method called
subspace estimation without eigendecomposition
(SWEDE) studied in [8]; the method estimates the DOA
by linear operations on the data. An alternative procedure
is to use some adaptive algorithm for on-line estimation of
the desired subset of eigendata [9]-[16].

In this work, we focus on the MUSIC method of
Schmidt [3]; this method involves solving a one-dimensional
minimization problem and finding a subspace
(noise or signal subspace).


To deal with the computational complexity of the
subspace based method, we present an iterative procedure
to update the DOA. The complete procedure consists of
two steps: the first performs the extraction of the noise (or
signal) subspace using the well-known Oja (or
anti-Hebbian) algorithm [13], respectively. The second
performs the one-dimensional minimization using the
Newton algorithm [7].

The  rest  of  this  paper  is  organized  as  follows.  In 
Section  II,  we  formulate  the  problem.  The  PCA,  MCA 
learning  and  the  Newton  algorithms  are  given  in  Section 
III.  Then,  in  Section  IV,  we  present  simulation  results. 
Finally,  Section  V  summarizes  our  conclusions. 

II. Problem Formulation and Basic Assumptions

Consider  a  linear  array  of  N  sensors.  The  array  output 
is  commonly  modeled  as  follows  [1] 

$$x(t) = D(\theta)\,s(t) + v(t) \qquad (1)$$

where $D(\theta) = [d(\theta_1), \ldots, d(\theta_p)]$ is an $N \times p$ matrix whose
columns are the direction vectors, with the parameter vector $\theta$
denoting the angles of arrival of the $p$ signals. $s(t)$ is a
$p \times 1$ vector which denotes the complex envelopes of the
narrow-band signals. The elements of $s(t)$ are assumed to
be independent Gaussian distributed random variables
with zero mean. $v(t)$ is an $N \times 1$ vector representing the
receiver noise of the $N$ sensors. It is assumed to be a
complex, zero mean, Gaussian white process,
independent of the signals.

From (1), the covariance matrix of $x(t)$ is given by

$$R = E[x(t)x^H(t)] = D(\theta) R_s D^H(\theta) + \sigma^2 I \qquad (2)$$

where $\sigma^2$ is an unknown constant representing the noise
power, $I$ is an identity matrix of appropriate dimension,
and $(\cdot)^H$ denotes the conjugate transpose.

$R_s = E[s(t)s^H(t)]$ is the signal covariance matrix. Under
the assumption of incoherent signals, the rank of $R_s$ is
$p' = p$. The eigendecomposition of the positive definite
Hermitian matrix $R$ is given by

$$R = \sum_{i=1}^{p} \lambda_i e_i e_i^H + \sigma^2 \sum_{i=p+1}^{N} e_i e_i^H \qquad (3)$$

where $\lambda_i$ is the eigenvalue corresponding to the
eigenvector $e_i$, with the eigenvalues sorted in decreasing order.

Let $W_s = [e_1, \ldots, e_p]$ and $W_v = [e_{p+1}, \ldots, e_N]$ denote
bases of the signal and noise subspaces, respectively.
From (2) and (3), we have $R W_v = \sigma^2 W_v$, hence
$D^H(\theta) W_v = 0$, or equivalently
$d^H(\theta_i) W_v = 0$ for $i = 1, \ldots, p$. Hence, consistent
estimates of the DOA's can be determined as the
minimizing arguments of the cost function

$$f(\theta) = d^H(\theta)\,\Pi\,d(\theta) \qquad (4)$$

where $\Pi = W_v W_v^H = I - W_s W_s^H$.

In practical applications, the exact ensemble covariance
matrix is not known. A solution to this problem consists
in estimating the covariance matrix from a finite number
of snapshots $\{x(1), x(2), \ldots, x(T)\}$:

$$\hat{R} = \frac{1}{T}\sum_{k=1}^{T} x(k)\,x^H(k) \qquad (5)$$

Hence, consistent estimates of the DOA's can be obtained
from the following three steps:

1) Computation of $\hat{R}$.
2) Eigendecomposition of $\hat{R}$.
3) Minimization of $f(\theta)$ (equation 4).
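
As an illustration of the first step, the sample covariance of equation (5) can be sketched in a few lines. This is a minimal sketch using only the Python standard library; the function name `sample_covariance` and the two 2-sensor snapshots are hypothetical and chosen only so that the result is easy to check by hand.

```python
def sample_covariance(snapshots):
    """Return R_hat = (1/T) * sum_k x(k) x(k)^H as a nested list (equation 5)."""
    T = len(snapshots)
    N = len(snapshots[0])
    R = [[0j] * N for _ in range(N)]
    for x in snapshots:
        for a in range(N):
            for b in range(N):
                # Outer product x x^H: element (a, b) is x_a * conj(x_b).
                R[a][b] += x[a] * x[b].conjugate()
    return [[R[a][b] / T for b in range(N)] for a in range(N)]


# Two hypothetical snapshots from a 2-sensor array (illustrative values only).
snapshots = [[1 + 0j, 1j], [1 + 0j, -1j]]
R_hat = sample_covariance(snapshots)
```

For these two snapshots the cross terms cancel, so `R_hat` is the 2 x 2 identity matrix. The cost of this accumulation grows with both the number of snapshots and the square of the array size, before the eigendecomposition is even started.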

The computational burden of the
eigendecomposition of the matrix $\hat{R}$ and the
minimization of (4) makes the MUSIC algorithm
unsuitable in the nonstationary case. To deal with this
problem, in the next section we present a PCA or MCA
algorithm for extracting the signal subspace or noise subspace,
respectively. Then, to get the estimated DOA's, we use the
Newton algorithm for minimizing the cost function (4).


III.  Derivation  of  the  Algorithms 

The eigenvectors corresponding to the largest and
smallest eigenvalues of the autocorrelation matrix of the
input signals are referred to as the principal components and
minor components, respectively [9]-[14]. Adaptively
extracting the minor and principal components is a
primary requirement in many fields of signal processing,
including eigen-based bearing estimation (MUSIC,
Min-Norm, etc.). The main purpose of this section is to
present the Oja or the anti-Hebbian algorithm for
extracting the principal components or the minor
components, respectively, and the Newton algorithm for
minimizing the cost function (4).


A.  PCA  &  MCA  Algorithms 

The set of $M$ ($M = N - p$) minimum eigenvectors
(minor components) of $R$ can be written as the solution
of the following constrained minimization problem [9]-[14]:

$$\min_{e_k}\; e_k^H R\, e_k \quad \text{subject to} \quad e_k^H e_l = \delta_{kl}, \quad \forall k, l \in \{1, \ldots, M\} \qquad (6)$$

where $\delta_{kl}$ is the Kronecker delta function. Equation (6) is
a complex-valued constrained quadratic programming
problem. To convert it into a real-valued constrained
quadratic programming formulation, the complex vectors
$e_k$, $k = 1, \ldots, M$, and $R$ should first be decomposed into
their real and imaginary constituents as follows [12]:

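$$e_k = e_{kr} + j\,e_{ki} \quad \text{and} \quad R = R_r + j\,R_i \qquad (7)$$

then the eigenvalue equation $R e_k = \lambda_k e_k$ becomes

$$[R_r + j\,R_i][e_{kr} + j\,e_{ki}] = \lambda_k [e_{kr} + j\,e_{ki}] \qquad (8)$$

or equivalently,

$$R_r e_{kr} - R_i e_{ki} + j\,[R_r e_{ki} + R_i e_{kr}] = \lambda_k (e_{kr} + j\,e_{ki}) \qquad (9)$$

Moreover, by combining terms, we get

$$R_c w_k = \lambda_k w_k \qquad (10)$$

with

$$R_c = \begin{pmatrix} R_r & -R_i \\ R_i & R_r \end{pmatrix} \quad \text{and} \quad w_k = \begin{pmatrix} e_{kr} \\ e_{ki} \end{pmatrix} \qquad (11)$$

where we have used

$$R_c = E[X_c(t) X_c^T(t)],$$

$$R_r = E[x_r(t) x_r^T(t)] + E[x_i(t) x_i^T(t)],$$

$$R_i = E[x_i(t) x_r^T(t)] - E[x_r(t) x_i^T(t)], \quad \text{and} \quad X_c = \begin{pmatrix} x_r & -x_i \\ x_i & x_r \end{pmatrix}.$$

$R_c$ is a $2N \times 2N$ symmetric, positive-definite matrix,
and $w_k$ is a $2N \times 1$ column vector. Therefore, the
complex-valued constrained minimization problem (6)
becomes

$$\min_{w_k}\; w_k^T R_c w_k \quad \text{subject to} \quad w_k^T w_l = \delta_{kl}, \quad \forall k, l \in \{1, \ldots, M\} \qquad (12)$$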
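
The real-valued embedding used to pass from (6) to (12) can be checked numerically on a small case. The sketch below is illustrative only: the 2 x 2 Hermitian matrix `R`, its eigenpair, and the helper names `real_embedding`, `stack`, and `matvec` are assumptions chosen so that the result can be verified by hand, not values from the paper.

```python
import math


def real_embedding(R):
    """Build R_c = [[R_r, -R_i], [R_i, R_r]] from a complex Hermitian matrix R."""
    n = len(R)
    Rc = [[0.0] * (2 * n) for _ in range(2 * n)]
    for a in range(n):
        for b in range(n):
            re, im = R[a][b].real, R[a][b].imag
            Rc[a][b] = re            # top-left block: R_r
            Rc[a][b + n] = -im       # top-right block: -R_i
            Rc[a + n][b] = im        # bottom-left block: R_i
            Rc[a + n][b + n] = re    # bottom-right block: R_r
    return Rc


def stack(e):
    """Stack real and imaginary parts: w = [e_r; e_i]."""
    return [z.real for z in e] + [z.imag for z in e]


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Hypothetical Hermitian matrix with eigenvalue 3 and eigenvector [1, -j]/sqrt(2).
R = [[2 + 0j, 1j], [-1j, 2 + 0j]]
s = 1 / math.sqrt(2)
e = [s + 0j, -1j * s]
lam = 3.0

Rc = real_embedding(R)
w = stack(e)
Rc_w = matvec(Rc, w)
residual = max(abs(p - lam * q) for p, q in zip(Rc_w, w))
```

Here `residual` is at rounding level, so the stacked vector `w` is an eigenvector of the real symmetric matrix `Rc` with the same eigenvalue as `e` has for `R`, which is what allows a real-valued learning rule to be applied to complex array data.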


2) Update $W$ as in equation (15).

3) Update $\theta_i$, $i = 1, \ldots, p$, as in (16).


The solution of the minimization problem (12) is given by
[13]:

$$W_{k+1} = W_k - \alpha_k \left[I - W_k W_k^T\right] X_c(k) X_c^T(k) W_k \qquad (13)$$

This algorithm is known as the anti-Hebbian algorithm
[12]. The algorithm (13) can be used to extract the
$M$ ($M = p$) principal components, simply by changing
the sign of the parameter $\alpha_k$ as

$$W_{k+1} = W_k + \alpha_k \left[I - W_k W_k^T\right] X_c(k) X_c^T(k) W_k \qquad (14)$$

This algorithm can also be obtained by minimizing the
mean-square representation error [14]. The minor and
principal subspace algorithms (13)-(14) can be modified to
a more general form as [12], [15], and [16]

$$W_{k+1} = W_k + \alpha_k \left[\,\ldots\,\right] X_c(k) X_c^T(k) W_k \qquad (15)$$

where $\beta_k$ is a positive parameter. This algorithm is called
the weighted subspace algorithm [12].
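
The behaviour of the updates (13) and (14) can be illustrated with a deliberately small case. The following is a minimal sketch, not the simulation code of the paper: it assumes a single weight vector ($M = 1$), a fixed real input direction `u`, and a constant step size, and the names `subspace_step` and `run` are hypothetical.

```python
def subspace_step(w, columns, alpha, sign):
    """One update w <- w + sign * alpha * (I - w w^T) X X^T w.

    sign = +1 gives the Oja (principal) rule (14);
    sign = -1 gives the anti-Hebbian (minor) rule (13).
    `columns` lists the columns of the data matrix X.
    """
    n = len(w)
    y = [0.0] * n                       # y = X X^T w, built column by column
    for c in columns:
        proj = sum(ci * wi for ci, wi in zip(c, w))
        for i in range(n):
            y[i] += proj * c[i]
    wy = sum(wi * yi for wi, yi in zip(w, y))   # scalar w^T y
    # (I - w w^T) y = y - w (w^T y)
    return [wi + sign * alpha * (yi - wi * wy) for wi, yi in zip(w, y)]


def run(sign, steps=500, alpha=0.1):
    u = [0.6, 0.8]                      # assumed fixed unit-norm input
    w = [1.0, 0.0]                      # assumed initial weight vector
    for _ in range(steps):
        w = subspace_step(w, [u], alpha, sign)
    return w


w_pca = run(+1)   # aligns with u (principal direction)
w_mca = run(-1)   # becomes orthogonal to u (minor direction)
```

With the positive sign the weight vector converges to the input direction `u`, and with the negative sign it converges to a vector orthogonal to `u`. In the full algorithm `W` has $M$ columns and the input changes at every snapshot, so the step size must be small enough for the estimate to follow the data.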


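B. Newton Algorithm [7]

In the stationary case, the minima of $f(\theta)$ can be
located as the peaks of the spectrum $1/f(\theta)$ obtained from (4).
However, in the nonstationary case, to track the minima of the cost
function $f(\theta)$, the following approximate Newton
algorithm [7] can be used:

$$\hat{\theta}_i \leftarrow \hat{\theta}_i - \frac{\mathrm{Re}\left[d_\theta^H(\hat{\theta}_i)\,\Pi\,d(\hat{\theta}_i)\right]}{d_\theta^H(\hat{\theta}_i)\,\Pi\,d_\theta(\hat{\theta}_i)} \qquad (16)$$

with $d_\theta = \partial d / \partial \theta$, whose $m$th element is

$$[d_\theta(\theta_i)]_m = j\pi(m-1)\cos(\theta_i)\,e^{j\pi(m-1)\sin(\theta_i)}.$$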


IV.  Computer  simulations 

In this section, we present some simulation results
illustrating the properties of the proposed approach. In all
examples, we use a uniform linear array with 12 elements
spaced $\lambda/2$ apart, where $\lambda$ denotes the wavelength of
the source signals.

The steering vector $d(\theta)$ is then given by

$$d(\theta) = \left[1,\; e^{j\pi\sin(\theta)},\; \ldots,\; e^{j\pi(N-1)\sin(\theta)}\right]^T, \quad j = \sqrt{-1}.$$

The noise is modeled as complex Gaussian with zero
mean and variance $\sigma^2$ for all sensors. The signal-to-noise
ratio is defined as $10\log(\sigma_s^2/\sigma^2)$. We consider the
estimation of the DOA in the following scenarios. The first
involves close sources located at

$$10^\circ + 5^\circ\sin(2\pi t/360), \quad 20^\circ + 5^\circ\sin(2\pi t/240), \quad t = 1, 2, \ldots, 700; \qquad (17)$$

the second involves well-separated sources located at

$$-5^\circ + 10^\circ\sin(2\pi t/360), \quad 40^\circ + 5^\circ\sin(2\pi t/240), \quad t = 1, 2, \ldots, 700; \qquad (18)$$

and finally, the third is concerned with instantaneously
changing sources. We have assumed that there were two
sources located at 14° and 17°, each with SNR = 26 dB,
and that the signals alternately appear and disappear.

This example is adopted from [12]. It corresponds
to a sampling rate of 1 data point per 0.11°, 0.08°, or 0.05°
change in angle, whereas typical radar applications produce 1
point per $10(10^{-5})^\circ$, so this example is much more
demanding.


where Re(.) denotes the real part, and $\{\hat{\theta}_i\}$ are the most
recent minimum estimates available. This Newton step
might be reiterated a few times at each minimum updating
step, starting from the estimates provided by the previous
update. This algorithm can be easily derived, using the
Taylor expansion, by taking $f'(\theta) = 0$ at the optimum;
for further information see [7].
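
The Newton update (16) can be sketched for a single source as follows. This is a simplified illustration under stated assumptions: the noise-subspace projector is taken as exactly known, $\Pi = I - d_0 d_0^H / N$ for one source with steering vector $d_0$, rather than estimated by the learning rule, and the function names and the 14° and 15° test angles are hypothetical choices.

```python
import cmath
import math

N = 12  # sensors of a half-wavelength uniform linear array


def steering(theta):
    return [cmath.exp(1j * math.pi * m * math.sin(theta)) for m in range(N)]


def steering_derivative(theta):
    # Element m (0-based) is j*pi*m*cos(theta) * exp(j*pi*m*sin(theta)).
    return [1j * math.pi * m * math.cos(theta) * z
            for m, z in enumerate(steering(theta))]


def hdot(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def apply_projector(v, d0):
    """Pi v with Pi = I - d0 d0^H / N (exact projector for ONE source; an assumption)."""
    inner = hdot(d0, v) / N
    return [b - inner * a for a, b in zip(d0, v)]


def newton_step(theta, d0):
    d, dd = steering(theta), steering_derivative(theta)
    num = hdot(dd, apply_projector(d, d0)).real    # Re[d_theta^H Pi d]
    den = hdot(dd, apply_projector(dd, d0)).real   # d_theta^H Pi d_theta
    return theta - num / den


true_theta = math.radians(14.0)
d0 = steering(true_theta)
theta = math.radians(15.0)          # previous estimate, 1 degree off
for _ in range(5):
    theta = newton_step(theta, d0)
```

Starting one degree away from the true direction, a handful of iterations brings `theta` back to `true_theta`, which is why one or two Newton steps per snapshot are enough to follow a slowly moving source.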

The proposed procedure can be summarized
as follows:

1) Initialize $W_0$ as a $2N \times M$ random matrix whose
columns are orthonormal.


We have simulated the above iterative procedure using the
learning rule (15), with $\beta = 0.1$ and $\alpha = 0.01$, to extract
the noise subspace. The initial weight matrix $W_0$ is
chosen to satisfy $W_0^T W_0 = I$; then the minima of the cost
function (4) are computed using the algorithm (16).

Fig. 1 gives the result for the well-separated sources, Fig.
2 gives the result for the close sources, and Fig. 3 gives
the results for the instantaneously changing sources.

Another example concerns two fixed signal sources
located at 24° and 29°, with source powers of 26 and 23
dB above the background noise. The gain parameter
$\alpha_k = 0.02$ was constant during the first 50 iterations and
then decreased slowly. The result is given in Figs. 4-5.

Fig. 1 Estimated time-varying DOA's for well-separated
sources.

Fig. 2 Estimated time-varying DOA's for close sources.

Fig. 3 Estimated time-varying DOA's for instantaneously
changing sources.

Fig. 5 Estimated DOA for fixed sources
versus $\theta$ (degree).

Simulation  results  show  that  the  proposed  approach  can 
be  successfully  used  for  real-time  tracking  of  time  varying 
signals. 

V.  Conclusion 

This paper presented an iterative procedure for DOA
estimation and tracking. The main purpose of the paper is
to deal with the computational complexity of the subspace
based methods. To avoid the eigendecomposition of the
covariance matrix, the anti-Hebbian algorithm is used to
extract the noise subspace. Then, the Newton algorithm is
used to solve the one-dimensional minimization problem.
The performance of the approach is supported by numerical
experiments.

ABSTRACT

A frequency domain partially adaptive algorithm, called
a censoring adaptive array (CAA) algorithm, is proposed
to reduce the computational complexity of a frequency
domain adaptive array. The CAA algorithm uses a cell
averaging constant false alarm rate (CA-CFAR) processor
to adapt only those weights that correspond to the
frequency bins expected to contain interferences. The
false alarm rate also adapts according to the environment.
Furthermore, a censoring spatial smoothing is proposed to
combine the CAA algorithm with a spatial smoothing
technique. The performances of the proposed algorithms
are compared with conventional algorithms using computer
simulation.

1.  INTRODUCTION 

Adaptive array systems are designed to obtain only a
desired signal in interference conditions by producing a
null pattern or reducing the sidelobe level at the incident
angles of the interference. Normally, conventional
adaptive arrays have two critical problems. The first
is that the time required for the array output to converge
to a stable level is very long when the eigenvalues of the
input correlation matrix are widely spread out [1].
Therefore, to increase the convergence speed, the adaptive
array must remove the correlation between the tap-input
signals of each tapped delay line (TDL) so that the
eigenvalue spread is minimized. Chen and Fang [2] used a
frequency domain least mean square (LMS) algorithm
including a self-orthogonalizing property to remove the
temporal correlation effectively. In contrast, An and
Champagne [3] used a two-dimensional transform that can
remove both temporal and spatial correlations. However,
the computational complexity of the above algorithms is
very high because of the computation involved in
transforming the input signals into the frequency domain.
Therefore, the computational complexity of frequency
domain adaptive arrays needs to be reduced in order to
make the systems more practical. The second problem
is a signal cancellation phenomenon caused by coherent
interferences or smart jammers [4]. This signal
cancellation phenomenon occurs whenever the interferences
are correlated with the desired signal, thereby resulting
in signal loss and severe signal distortion for narrow
band and wide band signals, respectively. Spatial
smoothing has been widely adopted to solve the signal
cancellation phenomenon. However, the computational
complexity of an array with spatial smoothing is
much higher than that of an array without smoothing
because the original array configuration is changed to
many subarrays.

Accordingly, this paper proposes a new frequency
domain partially adaptive algorithm, called a censoring
adaptive array (CAA) algorithm, which can reduce the
computational complexity of frequency domain adaptive
algorithms while maintaining the performance of
fully adaptive algorithms. When the CAA algorithm is
combined with spatial smoothing, it can solve both
the computational complexity problem in frequency
domain adaptive algorithms and the signal cancellation
phenomenon for coherent interferences.

The conventional frequency domain adaptive algorithm
is described in section 2. Section 3 presents the
CAA algorithm combined with spatial smoothing to
remove coherent interferences. The simulation results
are shown in section 4, and some concluding remarks
are made in section 5.

2.  CONVENTIONAL  ALGORITHM 

Figure 1: Block diagram of the frequency domain GSC.

The convergence speed of a time domain adaptive array
is very slow when the eigenvalues of the input
correlation matrix are widely spread out. To increase the
convergence speed, a frequency domain adaptive array
has been proposed [2]. The frequency domain generalized
sidelobe canceller (GSC) utilizing the frequency
domain LMS algorithm is shown in Fig. 1. Unlike the
Griffiths-Jim GSC, a transform matrix D is inserted after
the blocking matrix in the auxiliary channel. The goal
of the frequency domain GSC is to increase the convergence
speed of the Griffiths-Jim GSC. To apply the
frequency domain LMS algorithm, the discrete Fourier
transform (DFT) transforms the input signals into the
frequency domain and removes any correlation between
the input signals. If it is assumed that the GSC has K
antenna elements and the length of each TDL is L, then the
array input signal vector, X(n), at the nth iteration
time can be given by

$$X(n) = [X_1^T(n),\; X_2^T(n),\; \ldots,\; X_K^T(n)]^T \qquad (1)$$

where $X_i(n) = [x_i(n),\; x_i(n-1),\; \ldots,\; x_i(n-L+1)]^T$
is the input signal vector of the $i$th antenna element,
and $x_i(j)$ is the input signal of the $i$th antenna element
at the $j$th adaptation cycle. The superscript $T$ denotes
the transpose. $d(n)$ and $y(n)$ are the main and auxiliary
channel output signals, respectively. If the signal
blocking matrix consists of simple difference functions,
the output vector of the signal blocking matrix, $X_s(n)$,
can be expressed as

$$X_s(n) = [X_{s1}^T(n),\; X_{s2}^T(n),\; \ldots,\; X_{s(K-1)}^T(n)]^T \qquad (2)$$

where $X_{si}(n)$ is the input signal vector of the $i$th TDL.
The output signal of the frequency domain GSC can be
expressed as

$$e(n) = (W_q^H - W^H D W_s)\,X(n). \qquad (3)$$

The optimum weight vector of the frequency domain
GSC is similar to that of the Griffiths-Jim GSC
and can be expressed by

$$W_{opt} = R_{Us}^{-1} P_{Us} \qquad (4)$$

where $R_{Us}$ and $P_{Us}$ are the correlation matrices, defined
below, of $U_s(n) = [u_{1,1}(n),\; u_{2,1}(n),\; \ldots,\; u_{L,K-1}(n)]^T$,
the transform domain vector of $X_s(n)$, which can be
expressed as

$$U_s(n) = D X_s(n). \qquad (5)$$

The auto-correlation matrix $R_{Us}$ and the cross-correlation
matrix $P_{Us}$ are $R_{Us} = E[U_s(n) U_s^H(n)]$
and $P_{Us} = E[U_s(n) d^*(n)]$, respectively. By using (5),
$R_{Us}$ and $P_{Us}$ can be written as

$$R_{Us} = D R_{Xs} D^H \qquad (6)$$

and

$$P_{Us} = D P_{Xs}. \qquad (7)$$

Then the mean square error (MSE) of the frequency
domain GSC can be expressed as

$$\zeta = E[|e(n)|^2] = \sigma_d^2 - P_{Us}^H R_{Us}^{-1} P_{Us} \qquad (8)$$

where $\sigma_d^2$ is the variance of $d(n)$. By inserting (6)
and (7) into (8), it can be easily shown that the MSE
of the frequency domain GSC is the same as that of
the Griffiths-Jim GSC. Hence, the MSE can be
re-expressed as

$$\zeta = \sigma_d^2 - (D P_{Xs})^H (D R_{Xs} D^H)^{-1} (D P_{Xs}) = \sigma_d^2 - P_{Xs}^H R_{Xs}^{-1} P_{Xs}. \qquad (9)$$

When the eigenvalues of the input correlation matrix
are widely spread out, the convergence speed of the
frequency domain adaptive array is much faster than
that of the time domain adaptive array. However, it
involves a high computational complexity when
transforming the input signals into the frequency domain.

3.  PROPOSED  ALGORITHM 

3.1.  Censoring  Adaptive  Algorithm 

After the input signals are transformed into the
frequency domain, the CAA algorithm uses a cell averaging
constant false alarm rate (CA-CFAR) processor [5]
to determine the frequency bins which contain components
of the interference signals. A block diagram of
the CA-CFAR processor is shown in Fig. 2. In the
CA-CFAR processor, the threshold, $Z_T$, which is adjusted
for each frequency bin, is obtained as follows:

$$Z_T = b\left(P_{fa}^{-1/(L-1)} - 1\right) \qquad (10)$$

where $P_{fa}$, $b$, and $L$ are the desired false alarm rate
for the detection of frequency bins containing interfering
signals, the sum of the neighboring data except
for those frequency bins being tested, and the length of
the TDL for each array element, respectively. After the
CA-CFAR processor has tested all the frequency bins,
the frequency domain adaptive algorithm only updates
the weights connected to the frequency bins whose
contents are greater than the threshold $Z_T$.


Figure  2:  Block  diagram  of  the  CA-CFAR  processor. 


Figure  3:  Operation  for  monitoring  variations  in  output 
powers. 


In order to benefit from the advantages of partial
adaptive processing, the false alarm rate of the CAA
algorithm must be changed adaptively according to
environment variations. Therefore, the CAA algorithm
monitors any variation in the output power, as shown in
Fig. 3, so as to change the false alarm rate adaptively.
The averaged output power $M_k$ at the $k$th observation
interval is obtained by $M_k = \frac{1}{N}\sum_{n=1}^{N} O_{(k-1)N+n}$,
where $O_i$ is the output power at the $i$th iteration. One
observation interval consists of $N$ iterations of
the frequency domain LMS algorithm. The
CAA algorithm then changes the false alarm rate of the
CFAR processor adaptively using these averaged output
powers as $P_{fa}(k) = P_{fa}(k-1) + \gamma S_k$, where
$\gamma$ is the scaling constant, $P_{fa}(k)$ is the false alarm rate
at the $k$th observation interval, and $S_k = M_k - M_{k-1}$.
As a result, the CAA algorithm is able to adaptively
determine the optimum subspace for performing the
frequency domain partial adaptation relative to the
environment and to substantially reduce the computational
complexity required for the weight adaptation.
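
The censoring step described above can be sketched as follows. This is a rough illustration, not the authors' implementation: the threshold uses the standard CA-CFAR scaling $b\,(P_{fa}^{-1/(L-1)} - 1)$, the clamping of the false alarm rate to a valid range is an added assumption, and the function names and bin powers are hypothetical.

```python
def cfar_threshold(powers, j, p_fa):
    """Threshold for bin j: b * (p_fa**(-1/(L-1)) - 1), b = sum of the other bins."""
    L = len(powers)
    b = sum(powers) - powers[j]
    return b * (p_fa ** (-1.0 / (L - 1)) - 1.0)


def select_bins(powers, p_fa):
    """Indices of the bins whose weights would be adapted."""
    return [j for j in range(len(powers))
            if powers[j] > cfar_threshold(powers, j, p_fa)]


def update_false_alarm_rate(p_prev, m_prev, m_curr, gamma):
    """P_fa(k) = P_fa(k-1) + gamma * (M_k - M_{k-1}).

    The clamp to [1e-6, 1] is an assumption added here to keep the
    rate a valid probability; the text does not specify it.
    """
    p = p_prev + gamma * (m_curr - m_prev)
    return min(1.0, max(1e-6, p))


# Hypothetical bin powers for one TDL of length 8: one strong interferer in bin 4.
powers = [1.0, 1.0, 1.0, 1.0, 100.0, 1.0, 1.0, 1.0]
few = select_bins(powers, 0.1)      # only the interferer bin passes
all_bins = select_bins(powers, 1.0)  # threshold is zero, every bin passes
p_next = update_false_alarm_rate(1.0, 2.0, 1.0, 0.1)
```

With a false alarm rate of 1 the threshold is zero and the algorithm is fully adaptive; as the averaged output power drops during convergence, the rate decreases and fewer bins pass the test.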


Table 1: Computational complexities of GSCs for different
algorithms. ($K$: number of antenna elements, $L$:
length of TDL, $L_i$: number of updated weights in the $i$th
TDL, transform method: FFT)

Algorithm    Complex multiplications/cycle
LMS          $2L(K-1)$
FLMS         $L(K-1)\log_2 L + 3.5\,L(K-1)$
CAA          $L(K-1)\log_2 L + 3.5\sum_{i=1}^{K-1} L_i$

3.2.  Mathematical  Description 

The Wiener-Hopf equation that gives the optimum
weight vector, $W_{opt}$, of the full rank GSC is

$$R_{Us} W_{opt} = P_{Us} \qquad (11)$$

where $R_{Us}$ and $P_{Us}$ are the auto-correlation matrix
of the transformed input data, $U_s(n)$, and the
cross-correlation matrix between $U_s(n)$ and the main
channel output signal, $d(n)$, respectively.

Figure 4: Block diagram of GSC using CAA algorithm.

Based on the self-orthogonality property of the
frequency domain LMS algorithm, all the diagonal terms
of $R_{Us}$ are equal [2]. Accordingly, the larger the value
of $(P_{Us})_{ij}$, which is the cross-correlation between $d(n)$
and the frequency domain signal at the $j$th frequency
bin in the $i$th TDL, the more it affects the MSE
performance of the GSC [6]. Hence, the subspace for partial
adaptation is composed of frequency bins that have a
large value of $(P_{Us})_{ij}$. In other words, the signal
subspace composed of the signals in the frequency bins
that have a high correlation with the main channel
signal can be the optimum subspace for minimizing the
MSE. The proposed GSC adopting the CAA algorithm
is shown in Fig. 4. The proposed GSC uses the CFAR
processor to select the subspace composed of the signals
in the frequency bins that have a high correlation with
$d(n)$. The reduced rank signal vector, $V_s(n)$, of the
proposed GSC can be expressed using $U_s(n)$ and the
rank-reducing matrix $C$ formed by the CAA algorithm
as follows:

Vs(n)  =  CUs(n).  (12) 

The auto-correlation matrix of $V_s(n)$ and the
cross-correlation matrix between $V_s(n)$ and $d(n)$ are as
follows:

$$R_{Vs} = E[V_s(n) V_s^H(n)] = C^H R_{Us} C \qquad (13)$$

$$P_{Vs} = E[V_s(n) d^*(n)] = C^H P_{Us} \qquad (14)$$

The optimum weights of the proposed GSC can be
expressed as

$$W_{r,opt} = R_{Vs}^{-1} P_{Vs}. \qquad (15)$$

Then the MSE of the proposed GSC can be given by

$$\zeta_r = \sigma_d^2 - (C^H P_{Us})^H (C^H R_{Us} C)^{-1} (C^H P_{Us}) = \sigma_d^2 - P_{Vs}^H R_{Vs}^{-1} P_{Vs}. \qquad (16)$$

Since the proposed GSC minimizes any additional MSE
caused by the partial adaptation, the MSE of the
proposed GSC, $\zeta_r$, is almost equal to that of the full rank
GSC, $\zeta$.

3.3. Censoring Spatial Smoothing Algorithm

Spatial smoothing can solve the signal cancellation
phenomenon; however, it also requires heavy computation
because it updates the weights corresponding to each
subarray in each adaptation cycle [7]. Accordingly, this
paper proposes censoring spatial smoothing, a
combination of the CAA algorithm and spatial smoothing,
which can solve both the signal cancellation
phenomenon and the problem of the high computational
complexity of frequency domain adaptive algorithms.
The conventional toeplitzization sequence via spatial
smoothing and the proposed toeplitzization sequence via
censoring spatial smoothing are shown in Fig. 5.

Figure 5: Conventional toeplitzization via spatial
smoothing and proposed toeplitzization via censoring
spatial smoothing.

4. SIMULATION RESULTS


To verify the performance of the proposed algorithm,
several computer simulations were performed. The GSC
was assumed to have 15 antenna elements which were
divided into 7 subarrays with 9 antenna elements each.
The length of the TDL was 8 and the initial false alarm
rate used in the CAA algorithm was 1. One target
signal with a Doppler frequency of 0.25 Hz and a
signal-to-noise power ratio (SNR) of 10 dB was incoming from
broadside, whereas three coherent interferences with
a Doppler frequency of 0.25 Hz and an
interference-to-noise ratio (INR) of 40 dB were incoming from -50°,
-20°, and 34°. The Doppler frequency was normalized
with respect to the sampling frequency. The
observation interval, $N$, used to calculate the averaged output
power, $M_k$, was set at 100.

Fig. 6 shows the learning curves of the frequency
domain GSC and the proposed GSC using the CAA
algorithm. Fig. 7 presents the simulation results of
the proposed GSC including the variation in the false
alarm rates, number of updated weights, and pattern
response. The learning curves of the two GSCs
considered were almost equal. However, after the learning
curves had converged, the proposed GSC only adapted
8 weights, whereas the frequency domain GSC adapted
all 64 weights. Therefore, the required number of complex
multiplications per adaptation cycle of the proposed GSC
was 1,540, whereas that of the frequency domain GSC
was 2,912. The false alarm rate of the CA-CFAR
processor was adaptively controlled and converged to an
optimum value of about 0.13. The beam pattern shown
in Fig. 7(c) shows that deep nulls were formed in the
adapted array pattern in the incident directions of
every interference.

Figure 6: Learning curves of frequency domain GSC
and proposed GSC.

Figure 7: Simulation results of proposed GSC: (a)
Variation of false alarm rate, (b) Number of updated
weights, (c) Steering pattern and Adapted pattern.
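
The two multiplication counts quoted above can be reproduced from the complexity expressions of Table 1. The sketch below is a consistency check under an inferred reading that is not stated explicitly in the text: the per-array formulas are applied with $K = 9$ and $L = 8$ to each of the 7 subarrays, and 8 weights per subarray remain adapted after convergence. The function names are hypothetical.

```python
import math


def flms_mults(K, L):
    """Frequency domain LMS: L(K-1) log2 L + 3.5 L(K-1)."""
    return L * (K - 1) * math.log2(L) + 3.5 * L * (K - 1)


def caa_mults(K, L, updated):
    """CAA: L(K-1) log2 L + 3.5 * (number of updated weights)."""
    return L * (K - 1) * math.log2(L) + 3.5 * updated


subarrays, K, L = 7, 9, 8            # assumed reading of the simulation setup
fd_total = subarrays * flms_mults(K, L)
caa_total = subarrays * caa_mults(K, L, 8)
saving = 1.0 - caa_total / fd_total
```

Under these assumptions `fd_total` is 2912 and `caa_total` is 1540, matching the reported figures, which corresponds to a reduction of about 47% in complex multiplications per adaptation cycle. The transform cost $L(K-1)\log_2 L$ is common to both algorithms, so the saving comes entirely from the weight update term.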

5.  CONCLUSIONS 

A new frequency domain partially adaptive algorithm,
the CAA algorithm, was presented which can adaptively
determine a subspace relative to the environment.
In addition, a censoring spatial smoothing
algorithm was proposed so that, when combined with the
CAA algorithm, the computational complexity of the
frequency domain adaptive algorithm was reduced and
the signal cancellation phenomenon was solved.
Simulation results showed that the proposed GSC
substantially reduces the computational complexity of the
frequency domain GSC while maintaining the same level of
performance.

ABSTRACT

A new beamforming algorithm, based on the
eigendecomposition of the sample correlation matrix, is
introduced. The beamformer uses a weighted linear
combination of the signal eigenvectors. Three versions
of the beamformer are proposed. It is shown
that the proposed beamformer is a generalization of the
delay-and-sum and the minimum variance beamformers.
A linearly constrained minimum variance beamformer
is also derived. It is shown that the proposed
approach yields robust beamformers.

1.  INTRODUCTION 

In various applications, one is concerned with
extracting a desired signal immersed in noise and
interference. Using an adaptive array, it is possible to avert
the effect of interference and noise by a careful
selection of array weights. Many algorithms maximize
the array output signal-to-interference-plus-noise ratio (SINR)
subject to knowing the direction of arrival (DOA) of
the desired signal. In these cases, the weight vector is
computed from the correlation matrix of interference
and noise. We call this the signal-free correlation
matrix (SFCM). However, if the desired signal DOA and
the array geometry are known, one can use the
correlation matrix of the received mixture of signal, noise, and
interference, and attain the same result. Small errors
in calibration and DOA estimation will cause signal
cancellation [1, 2].

For most practical situations, noise and interference
are mixed with signal, and the measurement of the
SFCM is not a simple task. To compute a signal-free
correlation matrix, one can use the generalized sidelobe
canceller (GSC) [3]. However, in this method, the
calibration error or the desired signal DOA estimation
error will cause a leakage of the signal component, which
degrades the performance of the method.

In many practical applications, the performance of
detection depends on the signal-to-interference ratio (SIR).
For instance, in spread spectrum communications,
penetration of a smart jammer into the system may cause
a destructive effect on the system performance [4]. In
such cases, interference minimization, rather than noise
plus interference minimization, proves useful.

Because of its higher resolution, much interest
has been given to beamforming based on
eigendecomposition [4, 5], and adaptive eigensubspace algorithms
[6, 7]. Usually, these methods are based on the
eigendecomposition of the SFCM.

Here, we introduce a beamforming method based
on the eigendecomposition of the received signal
covariance matrix. To apply this beamforming method, one
should know the DOA estimate of the desired signal,
the number of point jammers, and an estimate of the
received noise power. The introduced method, which
needs relatively little computation, is able to produce
exact nulls in the direction of jammers. It is also able
to maximize the output SINR or SNR. Due to lack of
space, we omit the proofs of the theorems throughout.

2.  SIGNAL  MODEL 

We assume an $L$-element array with arbitrary geometry
and $p$ narrowband point sources. Let $x(k)$ denote the
complex data vector received by the array elements at
the $k$th sampling instant. The data vector $x(k)$ can be
expressed as a superposition of the received signals and
noise as

$$x(k) = A(k)s(k) + n(k), \qquad (1)$$

where $n(k)$ is the noise vector, which is assumed to be
white, $s(k)$ is the signal vector, and

$$A(k) = [a(\theta_1(k)),\; \ldots,\; a(\theta_p(k))], \qquad (2)$$

with $a(\theta_i) = a_i$ being the array steering vector at the
direction $\theta_i$. Using (1), and assuming $\sigma^2$ to be the
noise power, the autocorrelation matrix of the received
signal is obtained as

$$R(k) = E\{x(k)x(k)^H\} = A(k)\,\Gamma\,A(k)^H + \sigma^2 I, \qquad (3)$$

where $\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_p)$ is the signal correlation
matrix, $E\{\cdot\}$ represents the expected value, and
superscript $H$ denotes Hermitian transposition. The diagonal
form of $\Gamma$ is a consequence of the fact that the received
signals are assumed to be uncorrelated with each other.

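For the positive-definite correlation matrix $\mathbf{R}$ one can find a set of eigenvalues $(\lambda_i + \sigma^2)$ and orthonormal eigenvectors $\mathbf{q}_i$ such that:

$$\mathbf{R}\mathbf{q}_i = (\lambda_i + \sigma^2)\mathbf{q}_i \quad \text{for } 1 \le i \le L.$$

We assume that the $\lambda_i$ are in decreasing order, i.e. $(\lambda_1 + \sigma^2) \ge \cdots \ge (\lambda_L + \sigma^2)$. It can be shown that $\lambda_i = 0$ for $i > p$.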

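The eigenvectors $\mathbf{Q} = [\mathbf{q}_1 \cdots \mathbf{q}_L]$ can be divided into two matrices as $\mathbf{Q} = [\mathbf{Q}_s \,|\, \mathbf{Q}_n]$, where the columns of $\mathbf{Q}_s$ and $\mathbf{Q}_n$, respectively, span the orthogonal signal and noise subspaces. We can prove the following theorem.

Theorem 1: Defining $\boldsymbol{\Lambda}_s = \mathrm{diag}(\lambda_1 + \sigma^2, \cdots, \lambda_p + \sigma^2)$, the following equalities are valid

$$\mathbf{A}^H\mathbf{Q}_s(\boldsymbol{\Lambda}_s - \sigma^2\mathbf{I})^{-1}\mathbf{Q}_s^H\mathbf{A} = \boldsymbol{\Gamma}^{-1} \qquad (4)$$

$$\mathbf{Q}_s^H\mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^H\mathbf{Q}_s = (\boldsymbol{\Lambda}_s - \sigma^2\mathbf{I}). \qquad (5)$$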
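The proofs are omitted in the paper, but both identities of Theorem 1 follow quickly from (3). The sketch below assumes that the steering matrix $\mathbf{A}$ has full column rank $p$ and that the signal correlation matrix $\boldsymbol{\Gamma}$ of (3) is nonsingular; these assumptions are not spelled out in the text above. The shorthand $\mathbf{B}$ is introduced here only for this derivation.

```latex
% Eigendecomposition of R, split into signal and noise parts:
\[
\mathbf{R} = \mathbf{Q}_s\boldsymbol{\Lambda}_s\mathbf{Q}_s^H + \sigma^2\mathbf{Q}_n\mathbf{Q}_n^H,
\qquad
\mathbf{Q}_s\mathbf{Q}_s^H + \mathbf{Q}_n\mathbf{Q}_n^H = \mathbf{I}.
\]
% Subtracting \sigma^2 I and comparing with (3):
\[
\mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^H = \mathbf{R} - \sigma^2\mathbf{I}
  = \mathbf{Q}_s(\boldsymbol{\Lambda}_s - \sigma^2\mathbf{I})\mathbf{Q}_s^H .
\]
% Pre- and post-multiplying by Q_s^H and Q_s, with Q_s^H Q_s = I, gives (5):
\[
\mathbf{Q}_s^H\mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^H\mathbf{Q}_s
  = \boldsymbol{\Lambda}_s - \sigma^2\mathbf{I}.
\]
% With B = Q_s^H A (p x p and invertible under the stated assumptions),
% (5) reads B \Gamma B^H = \Lambda_s - \sigma^2 I, hence
\[
(\boldsymbol{\Lambda}_s - \sigma^2\mathbf{I})^{-1}
  = \mathbf{B}^{-H}\boldsymbol{\Gamma}^{-1}\mathbf{B}^{-1}
\;\Longrightarrow\;
\mathbf{B}^H(\boldsymbol{\Lambda}_s - \sigma^2\mathbf{I})^{-1}\mathbf{B}
  = \boldsymbol{\Gamma}^{-1},
\]
% which is (4).
```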


3.  REDUCED-RANK  BEAMFORMER 


To extract the $n$'th signal source (impinging on the array from direction $\theta_n$), we propose the following beamforming weight vector

$$\mathbf{w}_{n,\epsilon} = \sum_{i=1}^{p}\frac{\mathbf{q}_i\mathbf{q}_i^H}{\epsilon\lambda_i + (1-\epsilon)\sigma^2}\,\mathbf{a}_n \quad \text{for } 0 \le \epsilon \le 1. \qquad (6)$$

This beamforming method, which needs the knowledge of the desired signal DOA and the number of point signal sources, has certain properties for various values of $\epsilon$. We study this beamforming method for three different values, $\epsilon = 1, 0.5, 0$ (denoted SC-1, SC-2 and SC-3, respectively).


3.1.  Special  Case-1  (SC-1) 

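For this case, we compute the weight vector $\mathbf{w}_n$ as

$$\mathbf{w}_n = \sum_{i=1}^{p}\frac{\mathbf{q}_i\mathbf{q}_i^H}{\lambda_i}\,\mathbf{a}_n = \mathbf{Q}_s(\boldsymbol{\Lambda}_s - \sigma^2\mathbf{I})^{-1}\mathbf{Q}_s^H\mathbf{a}_n. \qquad (7)$$

Theorem 2: The pattern for the SC-1 beamformer has a null (exactly zero) in the direction of each interferer, i.e.

$$\mathbf{w}_n^H\mathbf{a}_i = 0 \quad \text{for } i = 1, \cdots, p \text{ and } i \neq n. \qquad (8)$$

Theorem 3: The output SINR of the SC-1 beamformer is bounded by $\lambda_p/\sigma^2$ and $\lambda_1/\sigma^2$, i.e.

$$\frac{\lambda_p}{\sigma^2} \le \left(\frac{S}{I+N}\right)_o \le \frac{\lambda_1}{\sigma^2}, \qquad (9)$$

and for the case of only one signal source ($p = 1$) the output SNR is equal to $\lambda_1/\sigma^2$.


3.2. Special Case-2 (SC-2)

For this case, we compute the weight vector $\mathbf{w}_n$ as

$$\mathbf{w}_n = \sum_{i=1}^{p}\frac{\mathbf{q}_i\mathbf{q}_i^H}{\lambda_i + \sigma^2}\,\mathbf{a}_n = \mathbf{Q}_s\boldsymbol{\Lambda}_s^{-1}\mathbf{Q}_s^H\mathbf{a}_n. \qquad (10)$$

It can be shown that here, $\mathbf{w}_n = \mathbf{R}^{-1}\mathbf{a}_n$, which is the MV solution for the array weight vector; the MV beamformer maximizes the array output signal to interference and noise ratio (SINR) [5].

A shortcoming of the MV beamformer is its sensitivity to signal DOA uncertainty and array calibration error, which causes signal cancellation [8]. Define the output SINR sensitivity with respect to the steering vector error ($\Delta\mathbf{a} = \hat{\mathbf{a}} - \mathbf{a}$) as

$$\mathcal{S}_{\mathrm{SINR}_o,\mathbf{a}} = \frac{|\Delta\mathrm{SINR}_o|}{\|\Delta\mathbf{a}\|^2}, \qquad (11)$$

where

$$\Delta\mathrm{SINR}_o = \mathrm{SINR}_o\big|_{\hat{\mathbf{a}}=\mathbf{a}+\Delta\mathbf{a}} - \mathrm{SINR}_o\big|_{\hat{\mathbf{a}}=\mathbf{a}}. \qquad (12)$$

It can be shown that the SC-2 beamformer is less sensitive to the steering vector error (due to DOA uncertainty or an uncalibrated array) when compared to the MV method; the sensitivity of the MV beamformer increases rapidly with input SNR.

3.3. Special Case-3 (SC-3)

For this reduced-rank beamformer, the weight vector, $\mathbf{w}_n$, is

$$\mathbf{w}_n = \sum_{i=1}^{p}\frac{\mathbf{q}_i\mathbf{q}_i^H}{\sigma^2}\,\mathbf{a}_n. \qquad (13)$$

Using $\mathbf{Q}_s\mathbf{Q}_s^H = \mathbf{I} - \mathbf{Q}_n\mathbf{Q}_n^H$ and noting that the signal steering vector is orthogonal to the noise subspace, (13) can be written as

$$\mathbf{w}_n = \frac{1}{\sigma^2}\,\mathbf{a}_n. \qquad (14)$$
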
If the true $\mathbf{a}_n$ is known, the weight vector (14) produces the well-known delay-and-sum beamformer.

Definition: For an array with $\mathbf{w}$ as a weight vector, we define the sensitivity of the array output SNR with respect to the array steering vector error ($\Delta\mathbf{a} = \hat{\mathbf{a}} - \mathbf{a}$) as

$$\mathcal{S}_{\mathrm{SNR}_o,\mathbf{a}} = \frac{|\Delta\mathrm{SNR}_o|}{\|\Delta\mathbf{a}\|^2}, \qquad (15)$$

where

$$\Delta\mathrm{SNR}_o = \mathrm{SNR}_o\big|_{\hat{\mathbf{a}}=\mathbf{a}+\Delta\mathbf{a}} - \mathrm{SNR}_o\big|_{\hat{\mathbf{a}}=\mathbf{a}}. \qquad (16)$$

It can be proved that the output SNR for the SC-3 beamformer is less sensitive to the array steering vector error than the delay-and-sum beamformer.

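For SC-3, the array output SNR is

$$\mathrm{SNR}_o = L \cdot \mathrm{SNR}_i, \qquad (17)$$

where $\mathrm{SNR}_i$ denotes the input SNR. We have proved that the maximum output SNR for an array with $L$ elements is $L$ times the input SNR. Thus, SC-3 maximizes the output SNR.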
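The three special cases can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the exact covariance matrix of (3), a half-wavelength uniform linear array (chosen here only for brevity; the paper allows arbitrary geometry), and illustrative source powers and directions. The helper names `steering_ula` and `reduced_rank_weights` are hypothetical.

```python
import numpy as np

def steering_ula(theta_deg, L):
    """Steering vector of a half-wavelength uniform linear array
    (an assumption of this sketch, not a requirement of the method)."""
    k = np.arange(L)
    return np.exp(1j * np.pi * k * np.sin(np.deg2rad(theta_deg)))

def reduced_rank_weights(R, a_n, p, sigma2, eps):
    """Weight vector of (6): a sum over the p signal-subspace eigenpairs."""
    eigvals, eigvecs = np.linalg.eigh(R)            # eigenvalues of R, ascending
    order = np.argsort(eigvals)[::-1][:p]           # indices of the p largest
    lam = eigvals[order] - sigma2                   # lambda_i (noise power removed)
    Qs = eigvecs[:, order]                          # signal-subspace eigenvectors
    scale = 1.0 / (eps * lam + (1.0 - eps) * sigma2)
    return Qs @ (scale * (Qs.conj().T @ a_n))

L, sigma2 = 8, 0.5                                  # illustrative values
doas = [-20.0, 10.0, 45.0]                          # desired source listed first
A = np.column_stack([steering_ula(t, L) for t in doas])
Gamma = np.diag([1.0, 2.0, 1.5])                    # uncorrelated source powers
R = A @ Gamma @ A.conj().T + sigma2 * np.eye(L)     # exact covariance, eq. (3)
a_n = A[:, 0]

w1 = reduced_rank_weights(R, a_n, 3, sigma2, eps=1.0)   # SC-1
w2 = reduced_rank_weights(R, a_n, 3, sigma2, eps=0.5)   # SC-2
w3 = reduced_rank_weights(R, a_n, 3, sigma2, eps=0.0)   # SC-3

# Magnitude of the SC-1 response toward the two interferers (Theorem 2).
print(np.abs(w1.conj() @ A[:, 1:]))
```

With the exact covariance, the SC-1 response toward the interferers vanishes to machine precision, the SC-2 weights equal $\mathbf{R}^{-1}\mathbf{a}_n$ up to a scale factor of 2 (which does not affect the output SINR), and the SC-3 weights reduce to $\mathbf{a}_n/\sigma^2$.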


3.4.  Improved  LCMV  method 

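As mentioned earlier, the SC-2 beamformer has the properties of the MV method with a smaller sensitivity. In the MV method, the weight vector is the solution of the following minimization

$$\min_{\mathbf{w}}\{\mathbf{w}^H\mathbf{R}\mathbf{w}\} \quad \text{subject to} \quad \mathbf{w}^H\mathbf{a}(\theta_i) = g, \qquad (18)$$

where $g$ is a constant. By the method of Lagrange multipliers, the solution to this minimization is

$$\mathbf{w} = g^{*}\,\frac{\mathbf{R}^{-1}\mathbf{a}}{\mathbf{a}^H\mathbf{R}^{-1}\mathbf{a}}, \qquad (19)$$

where $g^{*}$ is the complex conjugate of $g$ (equal to $g$ for a real constant).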

The single linear constraint in (18) can be generalized to multiple linear constraints. For instance, to produce a beampattern with a unit gain in the directions of the sources $\theta_1$ and $\theta_2$, the desired constraint may be expressed as

$$\begin{bmatrix}\mathbf{a}^H(\theta_1)\\ \mathbf{a}^H(\theta_2)\end{bmatrix}\mathbf{w} = \begin{bmatrix}1\\ 1\end{bmatrix}. \qquad (20)$$

If there are $m < L$ linear constraints on $\mathbf{w}$, it is possible to write them in the matrix form $\mathbf{C}^H\mathbf{w} = \mathbf{f}$, where the $L \times m$ matrix $\mathbf{C}$ and the $m$-dimensional vector $\mathbf{f}$ are the constraint matrix and the response vector, respectively. It is assumed that the constraints are linearly independent, i.e. the constraint matrix has rank $m$. The solution of (18) under these constraints is then

$$\mathbf{w} = \mathbf{R}^{-1}\mathbf{C}[\mathbf{C}^H\mathbf{R}^{-1}\mathbf{C}]^{-1}\mathbf{f}, \qquad (21)$$

which is called the linearly constrained minimum variance (LCMV) weight vector. Similar to (6), we define the following weight vector which satisfies the constraint $\mathbf{C}^H\mathbf{w} = \mathbf{f}$,

$$\mathbf{w} = \mathbf{H}\mathbf{C}[\mathbf{C}^H\mathbf{H}\mathbf{C}]^{-1}\mathbf{f}, \qquad (22)$$

where $\mathbf{H}$ is defined as

$$\mathbf{H} \triangleq \sum_{i=1}^{p}\frac{\mathbf{q}_i\mathbf{q}_i^H}{\epsilon\lambda_i + (1-\epsilon)\sigma^2}. \qquad (23)$$

We call this technique the improved LCMV (ILCMV) algorithm. Substituting the Karhunen-Loeve expansion of $\mathbf{R}$ into (21), it is straightforward to prove that when the columns of $\mathbf{C}$ are a subset of the columns of $\mathbf{A}$, the weight vector (21) is the same as $\mathbf{w}$ in (22). However, the simulation results show an improvement for ILCMV when compared to LCMV.
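The stated equivalence can be made explicit for $\epsilon = 1/2$ (the SC-2 setting). This is a sketch under the assumptions that $\mathbf{R}$ is the exact covariance matrix of (3) and that every column of $\mathbf{C}$ lies in the signal subspace, so that $\mathbf{Q}_n^H\mathbf{C} = \mathbf{0}$.

```latex
% Inverse of R from its eigendecomposition:
\[
\mathbf{R}^{-1} = \mathbf{Q}_s\boldsymbol{\Lambda}_s^{-1}\mathbf{Q}_s^H
  + \sigma^{-2}\mathbf{Q}_n\mathbf{Q}_n^H
\;\Longrightarrow\;
\mathbf{R}^{-1}\mathbf{C} = \mathbf{Q}_s\boldsymbol{\Lambda}_s^{-1}\mathbf{Q}_s^H\mathbf{C}.
\]
% Definition (23) evaluated at \epsilon = 1/2:
\[
\mathbf{H}\big|_{\epsilon=1/2}
  = \sum_{i=1}^{p}\frac{2\,\mathbf{q}_i\mathbf{q}_i^H}{\lambda_i+\sigma^2}
  = 2\,\mathbf{Q}_s\boldsymbol{\Lambda}_s^{-1}\mathbf{Q}_s^H
\;\Longrightarrow\;
\mathbf{H}\mathbf{C} = 2\,\mathbf{R}^{-1}\mathbf{C}.
\]
% Substituting into (22), the factor 2 cancels and (21) is recovered:
\[
\mathbf{H}\mathbf{C}[\mathbf{C}^H\mathbf{H}\mathbf{C}]^{-1}\mathbf{f}
  = 2\,\mathbf{R}^{-1}\mathbf{C}\,[2\,\mathbf{C}^H\mathbf{R}^{-1}\mathbf{C}]^{-1}\mathbf{f}
  = \mathbf{R}^{-1}\mathbf{C}[\mathbf{C}^H\mathbf{R}^{-1}\mathbf{C}]^{-1}\mathbf{f}.
\]
```

For other values of $\epsilon$ the eigenvalue weighting in (23) is no longer proportional to $\boldsymbol{\Lambda}_s^{-1}$, so the two weight vectors differ in general.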


4.  SIMULATION  RESULTS 

In the following examples, we use a uniform circular array (UCA) with $L$ omnidirectional antenna elements. The interelement spacing is assumed to be $\lambda/2$, where $\lambda$ is the received signal wavelength. Three stationary point signal sources with the same power are used in simulations.

The effect of $\epsilon$ in (6) on the produced pattern for $\epsilon = 0, 0.2, 0.4, 0.6, 0.8$ is shown in Fig. 1, for the desired source at 180° and interfering sources at 125° and 280° ($L = 8$). The figure shows that as $\epsilon$ increases, the two relative nulls of the beampattern move towards the jammers and become deeper.

In the second example, the DOAs of the signal and jammers are chosen at random. Fig. 2 shows the average output SIR, SINR, and SNR of the proposed beamformer as a function of $\epsilon$. The input SNR is assumed to be 3 dB and $L = 8$ is considered. The curves show that the output SIR decreases rapidly with decreasing $\epsilon$; however, the changes in SNR and SINR are not substantial.

In Fig. 3, the beampatterns produced with the LCMV and ILCMV methods are compared (for $L = 15$). Here, the signal source DOAs are at 80° and 240°, and an interferer is located at 160°. The results clearly show the robustness of the proposed method against DOA uncertainty and array calibration error.

Figure 1: The effect of $\epsilon$ on the produced pattern for $\epsilon = 0, 0.2, 0.4, 0.6, 0.8$. The desired source is located at 180°, and interfering sources are at 125° and 280° ($L = 8$).


Figure 2: The average output SIR, SINR, and SNR as a function of $\epsilon$ for an 8-element UCA.


Figure 3: Beampattern for the LCMV (top) and the ILCMV (bottom) algorithms.


ABSTRACT

A hypothesis testing methodology for determining the number of narrowband sources impinging on an array is presented. Using multiple hypothesis tests, the multiplicity of the smallest ordered eigenvalues of the sample correlation matrix, and hence the number of sources, is determined.
The  finite  sample  null  distributions  of  the  test  statistics  are 
estimated  using  bootstrap  resampling.  By  removing  the 
assumption  of  Gaussianity  and  large  sample  size  that  the 
traditional  MDL  approach  is  based  on,  we  are  able  to  gain 
improvements  in  the  small  sample  case  or  when  there  are 
deviations  from  Gaussianity. 

1.  INTRODUCTION 

The  first  step  in  most  array  signal  processing  problems  is 
to  determine  the  number  of  narrowband  sources  imping¬ 
ing  on  an  array.  Traditional  approaches  are  based  on  the 
application  of  information  theoretic  criteria,  such  as  Rissa- 
nen’s  Minimum  Description  Length  (MDL),  to  an  estimate 
of  the  likelihood  function  of  the  ordered  eigenvalues  of  the 
data  [8,  5].  For  Gaussian  data,  the  MDL  is  asymptotically 
consistent  and  becomes  a  simple  function  of  the  ordered 
sample  eigenvalues. 

Instead  of  using  the  MDL,  whose  behaviour  is  uncer¬ 
tain  for  small  sample  sizes  or  non-Gaussian  data,  we  esti¬ 
mate  the  small  sample  distributions  of  the  ordered  eigen¬ 
values  using  a  bootstrap  resampling  technique  [4].  A  multi¬ 
ple  hypothesis  test  is  then  applied,  sequentially  testing  for 
equality  of  the  smallest  ordered  eigenvalues  to  determine 
the  number  of  sources. 

By  estimating  the  distributions  of  the  ordered  eigenval¬ 
ues,  detection  rates  can  be  improved  when  the  sample  size 
is  small,  or  when  the  signal  deviates  from  Gaussianity. 

The  paper  is  organised  as  follows.  In  section  2  the  signal 
model  is  described  before  discussing  the  testing  methodol¬ 
ogy  and  the  use  of  multiple  hypothesis  tests  in  section  3. 
In  section  4  the  use  of  the  bootstrap  in  estimating  the  null 
distributions  of  the  test  statistics  is  explained.  Section  5 
points  out  the  need  to  reduce  the  bias  of  the  sample  ordered 
eigenvalues  and  describes  the  method  of  jackknife  bias  re¬ 
duction.  Finally,  section  6  compares  the  proposed  method 
against  the  MDL,  followed  by  some  conclusions. 

‘This  work  was  in  part  supported  by  the  Australian  Telecom¬ 
munications  Cooperative  Research  Centre  (AT-CRC). 


2.  SIGNAL  MODEL 


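We receive $n$ i.i.d. snapshots, $\mathbf{x}(t)$, of complex data from a $p$ element array,

$$\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t) + \mathbf{v}(t), \quad t = 1, \ldots, n \qquad (1)$$

where $\mathbf{A}$ is the $p \times q$ array steering matrix, $\mathbf{s}(t)$ is the $q$ ($q < p$) vector valued source signal and $\mathbf{v}(t)$ is spatially white additive noise with variance $\sigma^2$.

The correlation matrix of the snapshots is then

$$\mathbf{R} = E\left[\mathbf{x}(t)\mathbf{x}^H(t)\right] = \mathbf{A}\mathbf{R}_s\mathbf{A}^H + \sigma^2\mathbf{I} \qquad (2)$$

where $\mathbf{R}_s = E\left[\mathbf{s}(t)\mathbf{s}^H(t)\right]$. Let the ordered eigenvalues of $\mathbf{R}$ be $\lambda_1 \ge \cdots \ge \lambda_q > \lambda_{q+1} = \cdots = \lambda_p$. This suggests we estimate $q$ by determining the multiplicity of the smallest ordered eigenvalues of the sample correlation matrix,

$$\hat{\mathbf{R}} = \frac{1}{n-1}\sum_{t=1}^{n}\mathbf{x}(t)\mathbf{x}^H(t). \qquad (3)$$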


3.  HYPOTHESIS  TESTING 


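To test for multiplicity of the smallest ordered eigenvalues we consider the following situations where the corresponding number of sources, $q$, is as stated,

$$\begin{array}{ll}
\lambda_1 = \lambda_2 = \cdots = \lambda_p & q = 0\\
\lambda_2 = \cdots = \lambda_p & q = 1\\
\quad\vdots & \quad\vdots\\
\lambda_{p-1} = \lambda_p & q = p - 2\\
\lambda_1 \neq \lambda_2 \neq \cdots \neq \lambda_p & q \ge p - 1
\end{array}$$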

To determine $q$ we are required to test which of the above conditions is true. To accomplish this we propose the following procedure,


1. Set $k = 0$.

2. Test for equality of the smallest $p - k$ eigenvalues. If this hypothesis is accepted then $q = k$ and the procedure is finished.

3. If we rejected the hypothesis and $k < p - 2$ then set $k = k + 1$ and return to step 2. If $k = p - 2$ then $q = p - 2$ was rejected, so we must assume all the eigenvalues are unequal and $q \ge p - 1$.
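The three steps above can be sketched as a short loop. This is only an illustration of the control flow: `accept_equal` is a hypothetical callback standing in for the joint equality test developed in the remainder of the paper, and `toy_accept_equal` is a deliberately crude threshold rule used here just to exercise the loop.

```python
from typing import Callable, Sequence

def estimate_num_sources(p: int, accept_equal: Callable[[int], bool]) -> int:
    """Sequential estimate of the number of sources q.

    accept_equal(k) must return True when the hypothesis
    "the smallest p - k eigenvalues are equal" is accepted.
    """
    for k in range(p - 1):          # k = 0, 1, ..., p - 2
        if accept_equal(k):
            return k                # hypothesis accepted: q = k
    return p - 1                    # all hypotheses rejected: q >= p - 1

def toy_accept_equal(eigs: Sequence[float], tol: float) -> Callable[[int], bool]:
    """Toy stand-in for the real test: accept when the spread of the
    smallest p - k eigenvalues is below tol (eigs sorted in descending order)."""
    def accept_equal(k: int) -> bool:
        tail = eigs[k:]             # the smallest p - k eigenvalues
        return max(tail) - min(tail) < tol
    return accept_equal

# Two dominant eigenvalues followed by two nearly equal ones.
print(estimate_num_sources(4, toy_accept_equal([5.1, 3.2, 1.02, 0.98], tol=0.2)))
```

For the eigenvalues in the usage line, the hypotheses for $k = 0$ and $k = 1$ are rejected and the one for $k = 2$ is accepted, so the sketch returns 2.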
To perform the joint test for equality of the smallest eigenvalues at each stage of the procedure we make use of Roy's union intersection (UI) method. Roy's UI method allows us to construct a test for any joint hypothesis, $H$, providing it can be expressed as an intersection of simpler subhypotheses, $H_i$, for which tests exist. That is, if $H = \bigcap_i H_i$ and tests for the $H_i$ are available, then the rejection region for $H$ is given by the union of rejection regions of the $H_i$. This means the global null hypothesis, $H$, is rejected if at least one of the $H_i$ is rejected. From this we can define the familywise error rate (FWE) as

FWE = Pr(Reject at least one $H_i$ | all $H_i$ are true).

The FWE plays a similar role to that of the set level, $\alpha$, in univariate testing.

Following the UI principle a test for $\lambda_{k+1} = \cdots = \lambda_p$ can be constructed from the hypotheses $H_{ij} : \lambda_i = \lambda_j$, where the rejection region is given by the union of the rejection regions of the $H_{ij}$. The special case of $\lambda_1 \neq \cdots \neq \lambda_p$ is chosen when all other hypotheses have been rejected, as already stated. Note that the alternative hypothesis to each of the $H_{ij}$ is $K_{ij} : \lambda_i > \lambda_j$.

To test each of the $H_{ij}$ while maintaining the FWE we must use a multiple test procedure. Here, both Bonferroni's single step and Holm's sequentially rejective Bonferroni algorithm (SRB) are used.

The Bonferroni method tests each of the $H_{ij}$ at a level of $\alpha/l$ where $l$ is the number of hypotheses being tested. Assuming the significance levels are independent and uniformly distributed on $[0, 1]$ this method exactly controls the FWE at level $\alpha$.

In this problem not all the hypotheses are independent. There are logical implications between hypotheses so that the truth/falsehood of some imply the truth/falsehood of others. For instance, if $H_{1p}$ were true, then this would imply all the $H_{ij}$ were true.

When  the  hypotheses  implicate  each  other,  stepwise 
methods  such  as  Holm’s  SRB  strongly  control  the  FWE. 
For  more  details  on  Holm’s  SRB  and  multiple  hypothesis 
testing  in  general  see  [9]. 
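The two multiple test procedures can be sketched as follows. This is a generic illustration operating on a list of significance levels (p-values), one per subhypothesis; the function names are hypothetical and the example values are made up.

```python
def bonferroni_reject(pvalues, alpha):
    """Single-step Bonferroni: reject H_i when p_i <= alpha / l."""
    l = len(pvalues)
    return [pv <= alpha / l for pv in pvalues]

def holm_reject(pvalues, alpha):
    """Holm's sequentially rejective Bonferroni (SRB) procedure.

    The p-values are visited in increasing order; the m-th smallest
    (m = 0, 1, ...) is compared with alpha / (l - m), and the procedure
    stops at the first hypothesis that is not rejected.
    """
    l = len(pvalues)
    order = sorted(range(l), key=lambda i: pvalues[i])
    reject = [False] * l
    for m, idx in enumerate(order):
        if pvalues[idx] <= alpha / (l - m):
            reject[idx] = True
        else:
            break
    return reject

def reject_joint(pvalues, alpha, method=holm_reject):
    """Union-intersection rule: reject the joint hypothesis when at
    least one subhypothesis is rejected."""
    return any(method(pvalues, alpha))

pv = [0.004, 0.02, 0.012, 0.2]          # illustrative significance levels
print(bonferroni_reject(pv, 0.05))      # [True, False, True, False]
print(holm_reject(pv, 0.05))            # [True, True, True, False]
```

In this sketch the joint decision of `reject_joint` is the same for both methods, because Holm's first comparison uses the Bonferroni threshold $\alpha/l$; the two methods differ only in which individual subhypotheses are rejected.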


Table 1: Bootstrap procedure for resampling eigenvalues.

1. Randomly select a single snapshot (a column of $\mathbf{X}$) from the matrix of array snapshots with replacement.

2. Repeat the random selection $n$ times to obtain a resample of the matrix of array snapshots, $\mathbf{X}^*$.

3. Estimate the sample correlation matrix of $\mathbf{X}^*$, $\hat{\mathbf{R}}^*$.

4. Centre $\mathbf{X}^*$ by subtracting the column-wise mean from each column.

5. The resampled eigenvalues, $\hat\lambda_1^*, \ldots, \hat\lambda_p^*$, are estimated from the centred $\hat{\mathbf{R}}^*$.

6. Repeat steps 1 to 5, $B$ times to obtain the bootstrap set of eigenvalues $\hat\lambda_1^*(b), \ldots, \hat\lambda_p^*(b)$, $b = 1, \ldots, B$.
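The resampling procedure of Table 1, together with the use of the differences $\hat\lambda_i - \hat\lambda_j$ as test statistics, can be sketched as below. Several choices are assumptions of this sketch rather than statements from the paper: the correlation matrix is normalised by $n - 1$, centring is read as subtracting the mean snapshot and is applied before the correlation matrix is formed, and the significance level is taken as the fraction of null samples at least as large as the observed statistic. The function names are hypothetical.

```python
import numpy as np

def sample_eigenvalues(X):
    """Descending eigenvalues of the sample correlation matrix of X (p x n).
    Normalisation by n - 1 is an assumption of this sketch."""
    n = X.shape[1]
    R_hat = X @ X.conj().T / (n - 1)
    return np.linalg.eigvalsh(R_hat)[::-1]

def bootstrap_eigenvalues(X, B, rng):
    """Resample snapshots (columns) with replacement, centre, re-estimate."""
    p, n = X.shape
    boot = np.empty((B, p))
    for b in range(B):
        cols = rng.integers(0, n, size=n)            # steps 1-2: n draws with replacement
        Xs = X[:, cols]
        Xs = Xs - Xs.mean(axis=1, keepdims=True)     # step 4: remove the mean snapshot
        boot[b] = sample_eigenvalues(Xs)             # steps 3 and 5
    return boot                                      # step 6: B sets of eigenvalues

def bootstrap_pvalue(eigs, boot, i, j):
    """Significance level for H_ij against K_ij (0-based indices, i < j)."""
    T = eigs[i] - eigs[j]                            # observed test statistic
    T_star = boot[:, i] - boot[:, j]                 # bootstrapped test statistics
    null = T_star - T                                # approximate null distribution
    return float(np.mean(null >= T))

# Illustrative data: one strong source on a 4-element half-wavelength array.
rng = np.random.default_rng(0)
p, n = 4, 100
a = np.exp(1j * np.pi * np.arange(p) * np.sin(np.deg2rad(20.0)))
s = np.sqrt(5.0) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
v = np.sqrt(0.5) * (rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n)))
X = np.outer(a, s) + v
eigs = sample_eigenvalues(X)
boot = bootstrap_eigenvalues(X, 50, rng)
print(bootstrap_pvalue(eigs, boot, 0, 1))
```

The returned significance levels can be passed to a multiple test procedure such as Bonferroni's or Holm's to decide on each joint hypothesis.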


4. BOOTSTRAP PROCEDURE

To evaluate the significance levels for the multiple hypothesis tests we require the null distribution of each test statistic, $T_{ij} = \hat\lambda_i - \hat\lambda_j$, where $i$, $j$ are defined as for $H_{ij}$.

We use the bootstrap [4] to estimate the null distributions. Briefly, we randomly resample from the matrix of array snapshots $\mathbf{X} = (\mathbf{x}(1), \mathbf{x}(2), \ldots, \mathbf{x}(n))$ to generate a bootstrap data set, $\mathbf{X}^*$. Recalculating the test statistic from $\mathbf{X}^*$ gives us $T_{ij}^* = \hat\lambda_i^* - \hat\lambda_j^*$. The bootstrap procedure for resampling eigenvalues is summarised in Table 1. Repeating this procedure $B$ times gives us the set of bootstrapped test statistics, $T_{ij}^*(b)$, $b = 1, \ldots, B$.

Critical points or significance levels can then be found from the test statistic, $T_{ij}$, and the bootstrap distribution, $T_{ij}^*$, by forming $T_{ij}^*(b) - T_{ij}$, which approximates the distribution of $T_{ij}$ under the null [4].

In [1, 2] the performance of the bootstrapped eigenvalues is considered. They show that the bootstrap converges to the asymptotic distributions for distinct eigenvalues, but not for multiple eigenvalues. This may be attributed to the complex nonlinear nature of eigenvalue estimation. However, it is shown that bootstrapping with fewer resamples, $m < n$, where $\min(m, n) \to \infty$ and $m/n \to 0$, ensures the bootstrap converges to the asymptotic distribution.

For the small sample sizes considered here it is difficult to fulfill these conditions without increasing the error in the bootstrap distribution, because the reduced number of resamples adversely affects the bias and variance of the resampled eigenvalues. However, for large sample sizes we may reduce the error more effectively by subsampling (resampling $m < n$ times without replacement) and estimating the rate of convergence of the distribution to its asymptotic limit. The conditions under which subsampling is valid encompass a wider range of distributions and statistics than the bootstrap; more on the theory of subsampling and rate estimation may be found in the recent publication [3].

Although we have investigated the use of subsampling for this problem, the bootstrap was found to provide a sufficiently accurate estimate for the eigenvalue distributions considering the sample sizes used.


5.  BIAS  REDUCTION 


Though the sample eigenvalues are asymptotically unbiased, the bias may be significant for finite sample sizes. Here we are particularly concerned with small sample performance and some bias reduction is required. The need for bias reduction stems from the nature of the hypothesis tests and the use of the bootstrap. A bias in the estimator will remain in the test statistic but will be essentially removed from the bootstrap estimate of the null distribution. This occurs because the estimate of the null distribution is created by subtracting the test statistic from the bootstrapped test statistics; as the bias of the test statistic is very near the bias of the bootstrapped test statistics, the bias essentially subtracts out.

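The expected value of the $q$ distinct sample eigenvalues is [6]

$$E[\hat\lambda_i] = \lambda_i + \frac{\lambda_i}{n}\left[\sum_{j=1,\, j\neq i}^{q}\frac{\lambda_j}{\lambda_i - \lambda_j} + \frac{(p-q)\,\lambda}{\lambda_i - \lambda}\right] + O(n^{-2}), \quad i = 1, \ldots, q. \qquad (4)$$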
Subtracting the term in $1/n$ from $\lambda_i$ and replacing the true eigenvalues by their estimates gives the bias reduced estimates of the $q$ distinct eigenvalues,

$$\tilde\lambda_i = \hat\lambda_i\left(1 - \frac{1}{n}\sum_{j=1,\, j\neq i}^{q}\frac{\hat\lambda_j}{\hat\lambda_i - \hat\lambda_j} - \frac{p-q}{n}\,\frac{\hat\lambda}{\hat\lambda_i - \hat\lambda}\right), \quad i = 1, \ldots, q, \qquad (5)$$

where $\lambda$ is the value of the multiple eigenvalues, $\lambda = \lambda_{q+1} = \cdots = \lambda_p$, which may be replaced with the maximum likelihood estimate,

$$\tilde\lambda_i = \hat\lambda = \frac{1}{p-q}\sum_{j=q+1}^{p}\hat\lambda_j, \quad i = q+1, \ldots, p. \qquad (6)$$

The corrected eigenvalues of (5) have a bias of order $1/n^2$, while those of (6) are unbiased.

It is not possible to use these bias reduced estimates blindly as the multiplicity of the eigenvalues is required. Also, the difference between successive distinct eigenvalues must be large compared to the sampling errors, which are of order $1/\sqrt{n}$. If this condition is not fulfilled, the variance of the corrected distinct eigenvalues can increase dramatically [6]. This is easily understood by considering the effect of very close distinct eigenvalues on the denominator of the summation in (5).

We evaluated several alternative techniques for bias reduction based on resampling methods. The advantage of resampling techniques for bias reduction in our case is that they may be applied blindly, with no knowledge of the eigenvalue multiplicity. The jackknife was found to be the most effective scheme; it reduced the bias at least as much as (5) and did not suffer from any large increases in variance, even for multiple eigenvalues.

We  must  also  consider  the  effects  of  non-Gaussianity  on 
the  bias.  Here  the  jackknife  has  an  advantage  over  (5)  which 
was  derived  under  the  assumption  of  Gaussianity.  For  non- 
Gaussian  data  the  bias  also  depends  on  the  cumulants  of 
the  underlying  distribution  [7].  In  the  non-Gaussian  case 
then,  the  jackknife  is  still  valid  as  it  is  a  distribution  free, 
though  not  distribution  insensitive,  method. 

The procedure for jackknife bias reduction is given in Table 2; more details may be found in [4]. In all cases jackknife bias reduction was applied to eigenvalue estimates and bootstrapped eigenvalues. Applying bias reduction to the bootstrapped eigenvalues is necessary as it alters the variance of the estimate and this change in the test statistic must be matched in the estimate of the null distribution. It also helps to mitigate any residual bias in estimating the null distribution.
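The jackknife bias reduction referred to above (Table 2) can be sketched as follows. This is a minimal illustration that assumes the sample correlation matrix is normalised by the number of snapshots minus one; the function names are hypothetical.

```python
import numpy as np

def ordered_eigenvalues(X):
    """Descending eigenvalues of the sample correlation matrix of X (p x n).
    Normalisation by n - 1 is an assumption of this sketch."""
    n = X.shape[1]
    return np.linalg.eigvalsh(X @ X.conj().T / (n - 1))[::-1]

def jackknife_bias_reduced(X):
    """Jackknife bias-reduced ordered eigenvalues."""
    p, n = X.shape
    lam_hat = ordered_eigenvalues(X)                 # estimate from all n snapshots
    loo = np.empty((n, p))
    for i in range(n):
        # i-th jackknife sample: all snapshots except the i-th one
        loo[i] = ordered_eigenvalues(np.delete(X, i, axis=1))
    lam_dot = loo.mean(axis=0)                       # average of leave-one-out estimates
    return lam_hat - (n - 1) * (lam_dot - lam_hat)   # bias-reduced eigenvalues

# Illustrative use on white complex Gaussian data (p = 3 sensors, n = 20 snapshots).
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 20)) + 1j * rng.standard_normal((3, 20))
print(jackknife_bias_reduced(X))
```

The correction needs no knowledge of the eigenvalue multiplicity, at the cost of $n$ additional eigendecompositions per data set.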

Table 2: Jackknife Bias Reduction

1. Given the matrix of array snapshots $\mathbf{X}$, define the $i$th jackknife sample of $\mathbf{X}$ to be $\mathbf{X}^{(i)} = (\mathbf{x}(1), \ldots, \mathbf{x}(i-1), \mathbf{x}(i+1), \ldots, \mathbf{x}(n))$.

2. Let $\hat\lambda_1^{(i)}, \ldots, \hat\lambda_p^{(i)}$ be the ordered eigenvalues estimated from $\mathbf{X}^{(i)}$.

3. Compute $\hat\lambda_j^{(\cdot)} = \frac{1}{n}\sum_{i=1}^{n}\hat\lambda_j^{(i)}$.

4. The bias reduced eigenvalues are given as $\tilde\lambda_j = \hat\lambda_j - (n-1)(\hat\lambda_j^{(\cdot)} - \hat\lambda_j)$.


6. SIMULATIONS

In the following simulations the proposed method is evaluated by comparing it to the MDL [8] in a variety of scenarios. Both the Bonferroni procedure and Holm's SRB are shown. Some parameters which remain unchanged throughout the tests are: the number of resamples, $B = 200$, the FWE, $\alpha = 0.02$, and the element spacing, which was one half the wavelength. The signals are also Gaussian, unless otherwise stated. All results were averaged over 400 Monte Carlo simulations.


Figure 1: Empirical probability of correctly detecting two narrowly separated sources as the direction of one is varied.

Angular resolution : (Figure 1) We have a $p = 4$ element array with $q = 2$ sources. The first source is fixed at 20 degrees (with respect to broadside) while the other is allowed to vary between 20 and 40 degrees. Both sources were at 0 dB SNR and there were $n = 100$ snapshots.

Effect of SNR : (Figure 2) The conditions are the same as above except that the second source was removed and the SNR was varied.

Correlated sources : (Figure 3) We have a $p = 6$ element array with $q = 2$ correlated sources at 20 and 40 degrees and SNRs of -3 and 0 dB respectively. The correlation coefficient between the two sources is varied from 0.66 to 0.99.

While the MDL appears superior under the ideal conditions of widely separated sources, high SNR or weakly correlated sources, there is a noticeable improvement for the more difficult cases of narrowly separated sources, low SNR and highly correlated sources. For example, at an SNR of -7 dB, the proposed method correctly detects the single source at a rate of 80% while the MDL is at 40%. One point to note is that Bonferroni's and Holm's methods behave very similarly; this is commented on later.

Sample Size : (Figure 4) We have a $p = 4$ element array with $q = 1$ source at 20 degrees and -7 dB SNR. The sample size was varied over 10 < n < 250.

As suspected we notice an improvement in the small sample case up to $n = 250$, suggesting there is an advantage in using the proposed method for small sample sizes. For instance, with $n = 100$ the detection rates are 80% and 40% for the proposed method and MDL respectively.

Figure 2: Empirical probability of correctly detecting a single source as the SNR is varied.

Figure 3: Empirical probability of correctly detecting two correlated sources as the correlation coefficient is varied.

Non-Gaussianity : (Figures 5, 6) Here we have the same conditions as above, except that for Figure 5 we have a single Laplacian source in Gaussian noise while for Figure 6 we have a single Gaussian source in Laplacian noise. The sample size was varied over 10 < n < 200.

Comparing  both  these  non-Gaussian  cases  to  the  previ¬ 
ous  Gaussian  example  it  is  apparent  that  the  behaviour  of 
the  proposed  methods  with  respect  to  sample  size  is  sim¬ 
ilar.  Detection  rates  do  drop  for  the  non-Gaussian  cases, 
though  the  improvement  over  the  MDL  is  clear,  suggest¬ 
ing  the  method  is  more  robust  than  the  MDL  to  deviations 
from  Gaussianity. 

FWE : (Figure 7) Here we show the probability of correctly accepting the global null hypothesis, that all eigenvalues are equal, for a $p = 4$ element array with $q = 0$ sources. The sample size was varied over 10 < n < 250.

In a source free environment the FWE rate, or the probability of rejecting the null hypothesis that all eigenvalues are equal, should be maintained close to the set level, which in this case is $\alpha = 0.02$. It can be seen that the FWE is not exactly maintained; instead it is approximately 0.03. Investigation of the significance levels showed they are slightly nonuniform: approximately 3% were less than 0.02. This can be attributed to the errors in estimating the null distribution of the hypotheses. Thus the assumption of uniform significance levels in the Bonferroni methods does not hold and the FWE cannot be exactly maintained, explaining the attained FWE near 0.03. Disregarding variation due to the finite number of Monte Carlo realisations used, the level of the test appears to be constant with respect to sample size. This is expected as the level of the test should be independent of the array size and the sample size. The FWE should be kept small (< 0.05) to avoid unnecessary false detections; however, as the FWE decreases we must take more resamples to properly estimate the critical points and the computational load increases [4]. Also, decreasing the FWE decreases the power of the test and the performance improvement over the MDL will decrease.

Figure 4: Empirical probability of correctly detecting a single source as the sample size is varied.

Figure 5: Empirical probability of correctly detecting a single Laplacian source in Gaussian noise as the sample size varies.

Figure 6: Empirical probability of correctly detecting a single Gaussian source in Laplacian noise as the sample size varies.

Figure 7: Empirical probability of correctly identifying $q = 0$ sources in a source free environment as the sample size varies.

Figure 8: Empirical probability of correctly detecting $q = p - 1 = 3$ sources as the sample size varies.

Source saturated environment : Finally we show an example when we have $q = 3$ sources and a $p = 4$ element array. This is the maximum number of sources we can detect for $p = 4$. The sources were at 10, 30 and 50 degrees with SNRs of -2, 2 and 6 dB respectively. The sample size was varied over 10 < n < 200.

Here  we  can  see  a  large  improvement  over  the  MDL, 
suggesting  the  proposed  method  is  well  suited  to  source 
saturated  environments. 

In  general  it  appears  that  the  proposed  method  is  rela¬ 
tively  insensitive  to  the  multiple  test  procedure.  It  is  diffi¬ 
cult  to  determine  the  cause/s  of  this,  though  error  in  esti¬ 
mating  the  null  distribution  is  certainly  important.  We  are 
currently  investigating  whether  more  powerful  tests  specifi¬ 
cally  tailored  to  the  implications  among  the  hypotheses  will 
yield  an  improvement. 

7.  CONCLUSION 

Here  we  approached  the  source  detection  problem  in  array 
processing  from  a  hypothesis  testing  viewpoint.  Instead 
of  using  information  theoretic  criteria  designed  for  large 
samples  and  Gaussian  signals,  we  test  for  equality  of  the 


smallest  ordered  eigenvalues  of  the  sample  correlation  ma¬ 
trix  using  multiple  hypothesis  tests.  By  estimating  the  fi¬ 
nite  sample  null  distributions  of  the  test  statistics  using  the 
bootstrap  we  show  an  improvement  over  the  MDL  for  small 
sample  sizes  or  when  there  are  deviations  from  Gaussianity. 
The  proposed  method  also  performs  favorably  compared  to 
the  MDL  under  a  variety  of  situations  including  low  SNR, 
strong  source  correlation,  narrowly  separated  sources  and 
saturated source environments.


ABSTRACT

This  paper  presents  a  new  robust  algorithm  for  scat¬ 
tered  source  localization.  The  proposed  algorithm  is 
based  on  a  decomposition  of  the  channel  vector  into 
subspaces  characterized  by  their  sensitivities  to  the  spa¬ 
tial  source  parameters,  such  as  the  source  spread  which 
is  usually  treated  as  an  unknown  nuisance  parame¬ 
ter.  This  decomposition  isolates  a  subspace  of  the  data 
which  is  not  a  function  of  the  unknown  nuisance  param¬ 
eters,  and  the  resulting  estimator  does  not  involve  any 
search  over  these  parameters.  The  Maximum-Likelihood 
estimator  for  the  new  decomposed  model  is  developed. 
The  estimator  uses  only  the  information  carried  by  the 
insensitive  subspace  of  the  data  while  perturbations  of 
the  channel  vector  in  the  sensitive  subspace  are  as¬ 
sumed  to  be  unknown  parameters.  Identification  of 
the  insensitive  subspace  is  done  according  to  the  chan¬ 
nel  vector  covariance  matrix.  Simulation  results  are 
presented  to  demonstrate  the  effectiveness  of  the  pro¬ 
posed  algorithm. 

1.  INTRODUCTION 

Traditional array processing techniques assume a wavefield generated by point sources. However, the point source assumption does not hold in many practical problems. For instance, multipath propagation in mobile radio communications or low-elevation radio links affects the spatial distribution of the observed signal. It may be more reasonable to assume that most of the energy incident on the array is from local scattering near the transmitter. The diffuse propagation can then be described by the superposition of a large number of plane waves, reflected by so-called reflectors, each acting as a point source, and distributed around the real direction of the transmitter.

In their pioneering work, Valaee et al. [1] presented the problem
of distributed sources and proposed a parametric approach to
localize them. They consider incoherently and coherently
distributed sources for which the angular signal density is
assumed to be known. In [2] and [3], a generalized array manifold
model was used for direction-of-arrival (DOA) estimation in
channels with local scattering, assuming fully coherent
reflections; it was shown that local scattering has a significant
impact on DOA estimation for time-invariant channels. Performance
limits of distributed source localization have been studied
through the Cramer-Rao bound (CRB). A partially coherent
distributed (PCD) concept was considered in [4]. In [5], the
performance of the method of [1] is studied through the CRB for
fully coherent and incoherent distributed sources.

The problem of distributed sources has been investigated through
parametric models in which the spatial source parameters are
unknown. These parameters consist of parameters of interest, such
as the nominal DOA, as well as nuisance parameters, such as the
angular spreading. Therefore the resulting algorithm involves a
multi-dimensional search procedure over all unknown parameters.

In this paper a new method for DOA estimation of scattered
sources is presented. The proposed method decomposes the channel
vector into two orthogonal subspaces: the robust and non-robust
subspaces. The robust subspace contains the part of the channel
vector which is insensitive to the nuisance parameters. The
estimator consists of two main stages. The first is
identification of the robust and non-robust subspaces of the
channel vector. In the second stage, this decomposition is used
to define a new data model on which the ML estimator is
developed. The resulting estimator involves a search only over
the parameters of interest, while it nulls the subspace which is
sensitive to errors in the nuisance parameters.

2.  PROBLEM  FORMULATION 

Consider  an  array  of  N  sensors,  monitoring  a  wave-field 
generated  by  a  spatially  distributed  narrowband  source 
in  additive  background  noise.  The  complex  envelope 
representation  of  the  array  output  observation  vector 
at  the  discrete  time  tk  can  thus  be  modeled  as 

$$\mathbf{y}(t_k) = \mathbf{b}(\theta, t_k, \psi)\, s(t_k) + \mathbf{n}(t_k), \qquad k = 1, \ldots, K, \tag{1}$$
where $\mathbf{b}(\theta, t_k, \psi)$ is the channel vector formed
between the source and the array elements, $s(t_k)$ is the
transmitted signal and $\mathbf{n}(t_k)$ is the additive noise
vector. $K$ is the number of available independent snapshots. The
source signal and the additive noise are assumed to be
independent, zero-mean, Gaussian random processes. The noise is
modeled as spatially white:

$$E\{\mathbf{n}(t_k)\mathbf{n}^H(t_k)\} = \sigma_n^2 \mathbf{I}.$$

The unknown parameter $\theta$ is the mean direction and $\psi$
represents the spatial source parameters, such as the source
spreading. In most problems, the mean direction, $\theta$, is the
parameter of interest while $\psi$ consists of the unknown
nuisance parameters. For a distributed source, the channel vector
is modeled as a random vector such that:

$$\mathbf{b}(\theta, t_k, \psi) = \int_{-\pi}^{\pi} f(\varphi, t_k \mid \theta, \psi)\, \mathbf{a}(\varphi)\, d\varphi \tag{2}$$

where $f(\varphi, t_k \mid \theta, \psi)$ is a complex random
spatio-temporal weighting function which represents the local
scattering. This model was proposed and used in [1], [4].
However, under this model, the estimator involves a
multi-dimensional search over the vector of unknown parameters,
$\psi$.

Our goal here is to estimate the mean direction of the source,
$\theta$, from the data $\{\mathbf{y}(t_k)\}_{k=1}^{K}$ in the
presence of unknown nuisance parameters, such as the spatial
source parameters, $\psi$, and the signal variance.


3.  THE  PROPOSED  ESTIMATOR 

In this section, an estimator of the nominal source location
which is robust to the spatial source distribution is presented.
It consists of two main stages. The first is identification of
the robust and non-robust subspaces of the data. In the second
stage, this decomposition is used to derive a model for which the
ML estimator is developed. The resulting estimator involves a
search procedure over the parameter of interest only, while it
projects the received data into the subspace which is sensitive
to uncertainties in the nuisance parameters.

3.1.  Identification  of  the  robust  and  non-robust 
subspaces 

Assuming that the nuisance parameter vector, $\psi$, is an
unknown random vector, the channel vector
$\mathbf{b}(\theta, t_k, \psi)$ can be decomposed as follows:

$$\mathbf{b}(\theta, t_k, \psi) = \bar{\mathbf{b}}(\theta) + \Delta\mathbf{b}(\theta, t_k, \psi) \tag{3}$$

where $\bar{\mathbf{b}}(\theta)$ denotes the mean of
$\mathbf{b}(\theta, t_k, \psi)$ with respect to the random
parameter $\psi$:
$\bar{\mathbf{b}}(\theta) = E_{\psi}\left(\mathbf{b}(\theta, t_k, \psi)\right)$,
where $E_{\psi}$ denotes expectation with respect to the random
parameters $\psi$. The term $\Delta\mathbf{b}(\theta, t_k, \psi)$
expresses the deviation of the channel vector from its average.
By this notation, we assume that the time-varying channel is a
stationary process, which may be a result of channel
fluctuations, and therefore the statistics of the channel do not
depend on $t_k$.

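The average channel vector $\bar{\mathbf{b}}(\theta)$ can be
evaluated off-line using a Monte-Carlo method for a given grid of
$\theta$ according to the statistics of $\psi$.

Let $\mathbf{C}_b(\theta)$ denote the covariance of
$\Delta\mathbf{b}(\theta, t_k, \psi)$:

$$\mathbf{C}_b(\theta) = E_{\psi}\left(\Delta\mathbf{b}(\theta, t_k, \psi)\, \Delta\mathbf{b}(\theta, t_k, \psi)^H\right).$$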

Decomposition into the robust and non-robust subspaces is
performed according to the covariance matrix
$\mathbf{C}_b(\theta)$. Given a sufficiently large number of array
elements and a small spread of the source, the matrix
$\mathbf{C}_b(\theta)$ is low rank:
$\mathrm{rank}(\mathbf{C}_b(\theta)) = N_a < N$, where $N_a$ can be
determined by a rank test of the covariance matrix. Thus, the
matrix of its eigenvectors, $\mathbf{H}(\theta)$, can be
decomposed as:

$$\mathbf{H}(\theta) = [\mathbf{H}_1(\theta)\ \ \mathbf{H}_2(\theta)] \tag{4}$$

where the $N \times N_a$ matrix $\mathbf{H}_1(\theta)$ denotes its
principal subspace. Therefore the channel vector deviation,
$\Delta\mathbf{b}(\theta, t_k, \psi)$, can be approximated by the
following representation:

$$\Delta\mathbf{b}(\theta, t_k, \psi) \approx \mathbf{H}_1(\theta)\, \beta(\theta, t_k, \psi) \tag{5}$$

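The matrix $\mathbf{H}_1(\theta)$ represents the subspace of the
channel vector that is most sensitive to the source spreading.
Now, the channel vector can be represented as

$$\mathbf{b}(\theta, t_k, \psi) \approx \bar{\mathbf{b}}(\theta) + \mathbf{H}_1(\theta)\, \beta(\theta, t_k, \psi) = \underbrace{[\bar{\mathbf{b}}(\theta)\ \ \mathbf{H}_1(\theta)]}_{\mathbf{B}(\theta)}\ \underbrace{\begin{bmatrix} 1 \\ \beta(\theta, t_k, \psi) \end{bmatrix}}_{\gamma(\theta, t_k, \psi)} \tag{6}$$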


In order to obtain estimators which avoid a search procedure over
$\psi$, in the following the dependence of $\gamma(\cdot,\cdot,\cdot)$
on $\psi$ and $\theta$ is ignored and $\gamma(\theta, t_k, \psi)$ is
assumed to be an unknown vector. The importance of the above
decomposition is that it isolates a subspace which does not
depend on the nuisance parameters, $\psi$.
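The first stage can be sketched numerically as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: it assumes a half-wavelength uniform linear array, a deterministic Gaussian-shaped angular weighting whose width (the nuisance parameter) is drawn uniformly between 0.5° and 5°, and an energy-fraction threshold standing in for the unspecified rank test. The function names and default values are hypothetical.

```python
import numpy as np

def steering(phi, n_sensors, spacing=0.5):
    """Steering vectors of a uniform linear array (spacing in wavelengths)."""
    n = np.arange(n_sensors)[:, None]
    return np.exp(2j * np.pi * spacing * n * np.sin(np.atleast_1d(phi))[None, :])

def channel_vector(theta, spread, n_sensors, n_grid=201):
    """Discretised Eq. (2) for an assumed Gaussian-shaped weighting f."""
    phi = theta + np.linspace(-4.0, 4.0, n_grid) * spread
    w = np.exp(-0.5 * ((phi - theta) / spread) ** 2)
    w /= w.sum()                      # normalise the angular weighting
    return steering(phi, n_sensors) @ w

def identify_subspaces(theta, n_sensors, n_draws=500, energy=0.999, seed=0):
    """Monte-Carlo estimate of b_bar(theta), C_b(theta), H_1(theta), B(theta)."""
    rng = np.random.default_rng(seed)
    # Assumed prior on the nuisance parameter (angular spread, in radians).
    spreads = np.deg2rad(rng.uniform(0.5, 5.0, n_draws))
    b = np.stack([channel_vector(theta, s, n_sensors) for s in spreads], axis=1)
    b_bar = b.mean(axis=1)                       # mean channel vector
    d = b - b_bar[:, None]                       # deviations, as in Eq. (3)
    C = d @ d.conj().T / n_draws                 # covariance C_b(theta)
    eigval, eigvec = np.linalg.eigh(C)           # ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval = np.clip(eigval[order], 0.0, None)
    eigvec = eigvec[:, order]
    # Assumed rank test: keep enough eigenvectors to hold `energy` of the trace.
    cum = np.cumsum(eigval) / eigval.sum()
    n_a = min(int(np.searchsorted(cum, energy)) + 1, n_sensors)
    H1 = eigvec[:, :n_a]                         # sensitive subspace, Eq. (5)
    B = np.column_stack([b_bar, H1])             # B(theta) of Eq. (6)
    return b_bar, C, H1, B

if __name__ == "__main__":
    b_bar, C, H1, B = identify_subspaces(np.deg2rad(30.0), n_sensors=15)
    print("estimated rank N_a:", H1.shape[1], "B shape:", B.shape)
```

In practice this would be repeated for every point of the $\theta$ grid and the resulting $\mathbf{B}(\theta)$ matrices stored for use by the estimator.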


3.2.  The  Maximum-Likelihood  estimator 


By substitution of the channel vector model from (6) into (1),
the data model becomes

$$\mathbf{y}(t_k) = \mathbf{B}(\theta)\gamma(t_k) s(t_k) + \mathbf{n}(t_k) = \mathbf{B}(\theta)\mathbf{g}(t_k) + \mathbf{n}(t_k), \tag{7}$$

where $\mathbf{g}(t_k) = \gamma(t_k) s(t_k)$.

For the case of coherently distributed sources with no channel
fluctuations, $\gamma(t_k)$ is time-independent, and therefore
$\mathbf{g}(t_k)$ is a zero-mean random vector whose covariance
matrix is of rank one:

$$\mathbf{R}_g = E\{\mathbf{g}(t_k)\mathbf{g}^H(t_k)\} = \sigma_g^2 \mathbf{v}\mathbf{v}^H, \tag{8}$$
where $\mathbf{v}$ is an unknown, deterministic vector. Under the
assumption of independent, zero-mean, Gaussian signal and noise,
the conditional probability density function (pdf) of the data is
zero-mean, Gaussian, with covariance:

$$\mathbf{R}_y = E\{\mathbf{y}(t_k)\mathbf{y}^H(t_k)\} = \mathbf{B}(\theta)\mathbf{R}_g\mathbf{B}^H(\theta) + \sigma_n^2 \mathbf{I}. \tag{9}$$

Substituting (8) into (9) gives:

$$\mathbf{R}_y = \sigma_n^2 \left(\mathrm{snr}\, \mathbf{B}(\theta)\mathbf{v}\mathbf{v}^H\mathbf{B}^H(\theta) + \mathbf{I}\right), \tag{10}$$

where the signal-to-noise ratio is defined as
$\mathrm{snr} \triangleq \sigma_g^2 / \sigma_n^2$.
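Now, the ML estimator of $\theta$ can be written as

$$\hat{\theta}_{ML} = \arg\max_{\theta}\ \max_{\mathbf{v},\,\mathrm{snr},\,\sigma_n^2} f\left(\mathbf{y}(t_1), \ldots, \mathbf{y}(t_K) \mid \mathbf{v}, \mathrm{snr}, \sigma_n^2\right) \tag{11}$$

where $\mathbf{v}$, $\mathrm{snr}$, $\sigma_n^2$ are nuisance
parameters and the function
$f(\mathbf{y}(t_1), \ldots, \mathbf{y}(t_K) \mid \mathbf{v}, \mathrm{snr}, \sigma_n^2)$
is the joint pdf of the data given the unknown parameters. In the
Appendix, it is shown that by taking the logarithm of the
likelihood in (11), and maximizing over the nuisance parameters,
the ML estimator of $\theta$ becomes:

$$\hat{\theta}_{ML} = \arg\max_{\theta}\ \max_{i} \left(\lambda_i(\theta) - \log \lambda_i(\theta)\right) \tag{12}$$

where $\lambda_i(\theta)$ are the eigenvalues of the matrix
$\mathbf{B}^H(\theta)\mathbf{S}\mathbf{B}(\theta)$ and $\mathbf{S}$
is the sample covariance matrix:

$$\mathbf{S} = \frac{1}{K} \sum_{k=1}^{K} \mathbf{y}(t_k)\mathbf{y}^H(t_k).$$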

This estimator ignores some prior statistical information on the
variance of the elements of $\beta$ that may be available from the
first stage: note that $\mathrm{cov}(\beta(\theta, t_k, \psi))$ is
the matrix of eigenvalues of $\mathbf{C}_b(\theta)$. With the
modified model of (7), the dependence on the uncertainties in the
nuisance parameters is expressed linearly. Therefore, these
uncertainties can be considered as an additive noise on which some
prior statistical information may be available. However, this
additive noise is not necessarily Gaussian. Under an assumption of
Gaussianity, the proposed estimator can be extended to a maximum
a-posteriori probability estimator which takes this prior
statistical information into account.
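The grid search implied by (12) can be sketched as below. This is an illustrative sketch rather than the authors' code: `build_B` is a hypothetical callable returning $\mathbf{B}(\theta)$, and the demo substitutes a single steering-vector column (a point source on a half-wavelength uniform linear array) for the two-stage $\mathbf{B}(\theta)$ of (6). Flooring the eigenvalues at 1 is an added assumption that encodes the constraint snr ≥ 0.

```python
import numpy as np

def ml_doa(Y, build_B, theta_grid):
    """Grid search of Eq. (12): argmax_theta max_i (lambda_i - log lambda_i).

    Y          : N x K matrix of snapshots y(t_k)
    build_B    : hypothetical callable, theta -> N x (1 + N_a) matrix B(theta)
    theta_grid : candidate directions in radians
    """
    K = Y.shape[1]
    S = Y @ Y.conj().T / K                        # sample covariance matrix
    best_theta, best_cost = None, -np.inf
    for theta in theta_grid:
        B = build_B(theta)
        lam = np.linalg.eigvalsh(B.conj().T @ S @ B)
        # Assumption: eigenvalues below 1 would imply a negative snr, so they
        # are floored at 1, where lambda - log(lambda) attains its minimum.
        lam = np.maximum(lam, 1.0)
        cost = np.max(lam - np.log(lam))
        if cost > best_cost:
            best_theta, best_cost = theta, cost
    return best_theta

def demo(seed=0):
    """Point-source sanity check; returns the estimated DOA in degrees."""
    rng = np.random.default_rng(seed)
    N, K, theta0 = 15, 100, np.deg2rad(30.0)
    a = lambda th: np.exp(1j * np.pi * np.arange(N) * np.sin(th))
    # Signal variance 10 and unit noise variance, i.e. 10 dB per-sensor SNR.
    s = np.sqrt(10.0 / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
    noise = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
    Y = np.outer(a(theta0), s) + noise
    grid = np.deg2rad(np.arange(0.0, 60.01, 0.1))
    return np.rad2deg(ml_doa(Y, lambda th: a(th)[:, None], grid))

if __name__ == "__main__":
    print("estimated DOA (deg):", demo())
```

The search runs over $\theta$ only; no grid over the spreading parameters is needed because they are absorbed into the unknown vector $\mathbf{v}$.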

4.  SIMULATION  RESULTS 

To illustrate the results, consider a distributed source with
Gaussian-shaped spreading with angular spread $\Delta$. Assume an
equally spaced 15-sensor linear array with inter-sensor separation
of $\lambda/2$, where $\lambda$ is the wavelength of the
transmitting source. Further, consider the case where the mean DOA
is 30° and $\sigma^2 = 1$.

Fig. 1 depicts the eigenvalues of the matrix $\mathbf{C}_b(\theta)$
for different values of $\theta$, where the spreading parameter was
set to 3°. It shows that in this case the channel covariance
matrix is low rank, that is, the source spreading causes
perturbations of the channel vector in a limited subspace.
Clearly, this subspace is larger for a greater spreading
parameter. This figure shows that in typical cases of a
distributed source, there exists a large subspace which is
insensitive to source spreading. This subspace enables source
localization and does not require estimation of the spreading
parameters or spreading shape.

Figure 1: Eigenvalues of the channel vector covariance matrix,
$\mathbf{C}_b(\theta)$, for different DOAs, $\theta$.

Fig. 2 demonstrates the performance of the robust ML method as a
function of SNR for different source spreading parameters. In this
example the number of snapshots is $K = 100$. As shown in [4], the
estimation error does not converge to zero as the SNR goes to
infinity.

Finally, Fig. 3 shows the performance of the proposed estimator as
a function of the spreading parameter at SNRs above the threshold
SNR. As expected, the error STD is an increasing function of the
spreading parameter.

5.  CONCLUSIONS 

In this paper, a new algorithm for scattered source localization
is presented. The algorithm is robust to source spreading
parameters and therefore does not require jointly estimating those
nuisance parameters and the DOA, which is usually the parameter of
interest. Therefore, it provides a computationally efficient
technique for scattered source localization which does not involve
any search over the spreading parameters. The proposed method is
based on a decomposition of the channel vector into two subspaces
and parameter estimation according to the subspace which is not
sensitive to the source spreading parameters. An ML estimator for
the decomposed model is presented and its performance is evaluated
by Monte-Carlo simulations.

Figure 2: Performance of the proposed estimator as a function of
SNR for different spreading levels.


A.  APPENDIX 


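Derivation of the ML estimator, Eq. (12).

Taking the logarithm of the conditional pdf of (11), the ML
estimator of $\theta$ is

$$\hat{\theta}_{ML} = \arg\max_{\theta}\ \max_{\mathbf{v},\,\mathrm{snr},\,\sigma_n^2} \left( -K \log\det(\mathbf{R}_y) - \sum_{k=1}^{K} \mathbf{y}^H(t_k)\mathbf{R}_y^{-1}\mathbf{y}(t_k) \right). \tag{A.1}$$

From (10) it can easily be verified that

$$\det(\mathbf{R}_y) = \sigma_n^{2N} \left(1 + \mathrm{snr}\, \mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{B}(\theta)\mathbf{v}\right) \tag{A.2}$$

and

$$\mathbf{R}_y^{-1} = \frac{1}{\sigma_n^2} \left( \mathbf{I} - \frac{\mathrm{snr}\, \mathbf{B}(\theta)\mathbf{v}\mathbf{v}^H\mathbf{B}^H(\theta)}{1 + \mathrm{snr}\, \mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{B}(\theta)\mathbf{v}} \right). \tag{A.3}$$

By substituting (A.2) and (A.3) into (A.1), one obtains:

$$\hat{\theta}_{ML} = \arg\max_{\theta}\ \max_{\mathbf{v},\,\mathrm{snr},\,\sigma_n^2} \left( -\log\left(1 + \mathrm{snr}\, \mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{B}(\theta)\mathbf{v}\right) + \frac{1}{\sigma_n^2}\, \frac{\mathrm{snr}\, \mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{S}\mathbf{B}(\theta)\mathbf{v}}{1 + \mathrm{snr}\, \mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{B}(\theta)\mathbf{v}} \right) \tag{A.4}$$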


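where $\mathbf{S}$ is the sample covariance matrix. Maximizing
(A.4) with respect to snr gives

$$\hat{\theta}_{ML} = \arg\max_{\theta}\ \max_{\mathbf{v}} \left( G(\theta, \mathbf{v}) - \log G(\theta, \mathbf{v}) \right) \tag{A.5}$$

where
$G(\theta, \mathbf{v}) = \dfrac{\mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{S}\mathbf{B}(\theta)\mathbf{v}}{\sigma_n^2\, \mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{B}(\theta)\mathbf{v}}$.
With no loss of generality we assume that
$\|\mathbf{v}\|^2_{\mathbf{B}^H(\theta)\mathbf{B}(\theta)} = \mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{B}(\theta)\mathbf{v} = 1$.
Now, it is required to solve the following problem:

$$\mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{S}\mathbf{B}(\theta)\mathbf{v} \to \max$$

with respect to $\mathbf{v}$, subject to the constraint

$$\mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{B}(\theta)\mathbf{v} = 1.$$

This maximization can be performed using Lagrange multipliers.
Finally, the ML estimator can be written as:

$$\hat{\theta}_{ML} = \arg\max_{\theta}\ \max_{i} \left(\lambda_i(\theta) - \log \lambda_i(\theta)\right) \tag{A.6}$$

where $\lambda_i(\theta)$ are the eigenvalues of the matrix
$\mathbf{B}^H(\theta)\mathbf{S}\mathbf{B}(\theta)$.

Figure 3: Performance of the proposed estimator as a function of
the spreading parameter.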
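The maximization over snr that leads from (A.4) to (A.5) can be written out explicitly. The shorthand $a$, $c$ and $J$ below is introduced here only for illustration and does not appear in the original derivation; the stationary point is a valid maximizer only when $c/a \ge 1$, so that $\mathrm{snr} \ge 0$.

```latex
\begin{aligned}
a &= \mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{B}(\theta)\mathbf{v}, \qquad
c = \frac{\mathbf{v}^H\mathbf{B}^H(\theta)\mathbf{S}\mathbf{B}(\theta)\mathbf{v}}{\sigma_n^2}, \\
J(\mathrm{snr}) &= -\log(1 + \mathrm{snr}\,a) + \frac{\mathrm{snr}\,c}{1 + \mathrm{snr}\,a}, \\
\frac{\partial J}{\partial\,\mathrm{snr}} &= -\frac{a}{1 + \mathrm{snr}\,a} + \frac{c}{(1 + \mathrm{snr}\,a)^2} = 0
\;\Longrightarrow\; 1 + \mathrm{snr}\,a = \frac{c}{a}, \\
J_{\max} &= \frac{c}{a} - \log\frac{c}{a} - 1 .
\end{aligned}
```

With $G = c/a$, this is the objective $G - \log G$ up to an additive constant that does not affect the maximization.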
ABSTRACT


The method of sparse spectrum estimation developed by Chen and
Donoho for real-valued one-dimensional signals [2] has been
extended to complex-valued signals [12], and is used here in two
widely different applications: to denoise and superresolve ISAR
images, and to transform extended X-ray absorption fine-structure
(EXAFS) data of the elements to aid in the determination of their
detailed crystal structure. This extension of the Chen-Donoho
algorithm, which we call the $\ell^1$-FFT, incorporates the a
priori information that the spectrum is sparse by minimizing the
$\ell^1$ norm of the coefficients of the expansion functions. The
$\ell^1$-FFT is applied to stepped-frequency ISAR imaging, where it
increases resolution by factors of 4 and 64 over that of the
windowed Fourier transform, for the real and simulated data
presented here. In the second application, to determine the
effects of aging on the crystal structure of plutonium, the
$\ell^1$-FFT is used to transform EXAFS plutonium data. The
$\ell^1$-FFT increases inter-atomic spatial resolution by a factor
of 64 over that delivered by a windowed Fourier transform.


Brendt  Wohlberg 

Theory  Division  and 
Center  for  Nonlinear  Studies 
Los  Alamos  National  Laboratory 
Los  Alamos,  NM  87545 

1. INTRODUCTION TO THE $\ell^1$-FFT

Recently Chen and Donoho presented a novel method of
super-resolution spectrum estimation for real-valued
one-dimensional signals [2]. They write the signal as an
overcomplete linear combination of sinusoids with many more
frequencies than points in the signal. To limit the number of
possible expansions, they choose the coefficients of the sinusoids
in the expansion so that the sum of their absolute values is as
small as possible, subject to the constraint that the sum of
sinusoids adds up to the signal at its sampled values.
Technically, the $\ell^1$ norm of the expansion is minimized.
Minimizing the $\ell^1$ norm favors expansions with fewer large
terms over many small terms. The Method of Frames, which minimizes
the $\ell^2$ norm, does just the opposite [5]. Even if the function
being expanded is just a single expansion function, every other
expansion function generally has a nonzero inner product with it,
i.e., a nonzero coefficient in its Method of Frames expansion.

Minimizing the $\ell^p$ length for any $p$ in the range
$0 \le p \le 1$ favors a sparse representation of the signal, but
if the $p = 1$ norm is chosen, a global minimum for the expansion
coefficients can be found efficiently using recently developed
fast linear programming algorithms [3]. Because linear programming
optimizes globally, it can stably superresolve in ways that
Matching Pursuits [7, 9] cannot.

The $\ell^0$ case, with its local minima and bounds on the
deviations from the global minimum, is discussed by Natarajan [8].
Other methods for obtaining sparse representations are described
in references [6] and [10].

Denoising by relaxing constraints: If the signal has noise added
to it, the resulting noisy signal should not be represented
exactly as a sparse sum of sinusoids. In the simplest method for
denoising, the requirement that the weighted sinusoids sum exactly
to the noisy signal is relaxed [1]. The deviation allowed is set by
the signal-to-noise ratio.

Denoising by including delta functions as expansion functions: In
a second method of denoising, delta functions situated at every
sampled point are added to the expansion set of sinusoids. The
overall $\ell^1$ norm of the expansion containing both sinusoids
and delta functions is minimized. The signal is estimated by
summing only the sinusoids, the noise by summing only the delta
functions [1]. The relative amplitudes of the sinusoids and delta
functions in the expansion set depend on the signal-to-noise
ratio.

In the next sections the Chen-Donoho algorithm for signals with
nonnegative coefficients is outlined, the $\ell^1$-FFT for real and
complex signals is referenced, and the $\ell^1$-FFT is applied to
ISAR imaging and the determination of crystal structure from EXAFS
data.

1.1. Sparse representations with $\ell^1$-norm
minimization

In order to describe the Chen-Donoho method of sparse
representation, consider the problem of expanding a sampled signal
$x[k]$ into a linear combination of expansion functions $w_n[k]$,
with nonnegative expansion coefficients $\hat{x}[n]$,

$$x[k] = \sum_{n=0}^{N-1} \hat{x}[n]\, w_n[k], \qquad k = 0, 1, \ldots, K-1. \tag{1}$$

Assume that there are more expansion functions than points in the
signal, i.e., $N > K$. Then the expansion for $x$ is not unique and
the coefficients $\hat{x}[n]$ are underdetermined. Fix them by
minimizing their $\ell^1$ norm, i.e.,

$$\text{minimize } \|\hat{\mathbf{x}}\|_1 = \sum_{n=0}^{N-1} |\hat{x}[n]|. \tag{2}$$

This leads to a sparse representation of $x$, i.e., the fraction of
coefficients $\hat{x}[n]$ that are large will be small relative to
the fraction that would be large if the $\ell^2$ norm were
minimized (the Method of Frames [5]).

In matrix form, find a vector $\hat{\mathbf{x}}$ that will

$$\text{minimize } \|\hat{\mathbf{x}}\|_1, \quad \text{subject to } \mathbf{W}\hat{\mathbf{x}} = \mathbf{x}, \tag{3}$$

where

$$\hat{\mathbf{x}} \ge 0. \tag{4}$$

The matrix element $w_{kn}$ is the $k$th sample of the $n$th
expansion function. This is a linear programming problem. A method
for solving Eq. (3) when the coefficients can be either positive
or negative is given in reference [1], and the complex coefficient
case is described in [12].
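The nonnegative problem of Eqs. (2)-(4) maps directly onto a standard linear program. The sketch below is a minimal illustration, assuming SciPy's `linprog` with the HiGHS solver is available; the cosine dictionary, the sizes and the function names are hypothetical choices made for the example, not taken from the references.

```python
import numpy as np
from scipy.optimize import linprog

def cosine_dictionary(K, N):
    """K x N matrix whose column n is cos(pi * n * k / N), k = 0..K-1."""
    k = np.arange(K)[:, None]
    n = np.arange(N)[None, :]
    return np.cos(np.pi * n * k / N)

def l1_nonneg_expansion(W, x):
    """Solve: minimize sum(c) subject to W c = x, c >= 0.

    For nonnegative c the objective sum(c) equals the l1 norm of c.
    """
    N = W.shape[1]
    res = linprog(c=np.ones(N), A_eq=W, b_eq=x,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x

if __name__ == "__main__":
    K, N = 16, 64                        # four times more functions than samples
    W = cosine_dictionary(K, N)
    c_true = np.zeros(N)
    c_true[5], c_true[20] = 2.0, 1.0     # sparse, nonnegative coefficients
    x = W @ c_true
    c_hat = l1_nonneg_expansion(W, x)
    print("l1 norm of solution:", c_hat.sum())
    print("largest coefficients at n =", np.argsort(c_hat)[-2:])
```

The solver returns a global minimizer of the $\ell^1$ objective; whether that minimizer coincides with the sparse coefficients used to build the signal depends on the dictionary and is not guaranteed by this sketch.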

2.  ISAR  IMAGING 

The application of the $\ell^1$-FFT to ISAR imaging is given in
Figures 1 and 2, where simulated MiG-25 data and real Boeing 727
data are used to form superresolution ISAR images.

Figure 1: Top: ISAR image from simulated MiG-25 data using the
$\ell^1$-FFT. Each row of the image was constructed from 32 data
points. A total of 512 points were available, but since the plane
was accelerating only 1/16th of the data was used to create an
"instantaneous" snapshot. The resolution is 64 times greater than
that provided by a Fourier transform. No delta functions were used
in the fit because no noise was added in the simulation.
Hann-windowed Fourier transforms were first used to transform the
data column-wise. Bottom: The same data used to create the top
image were analyzed row-wise with Hann-windowed Fourier
transforms. Note the difference of a factor of 64 in the
horizontal resolution. The vertical resolution in the top and
bottom images is identical because the same column-wise processing
was used for both images.


Figure 2: Top: ISAR image from real 727 data using the
$\ell^1$-FFT. Each row of the image was constructed from 32 data
points. The resolution is 4 times greater than that provided by a
Fourier transform. Greater resolution was obtained in other ISAR
images formed from the same data, but the increased resolution
created "inferior" images for viewing because the resulting line
segments from which the images were constructed were too thin to
have significant visual impact. Delta functions were used in the
fit to remove essentially all noise. Hann-windowed Fourier
transforms were first used to transform the data column-wise.
Bottom: The same data used to create the top image were analyzed
row-wise with Hann-windowed Fourier transforms. Note the
difference of a factor of 4 in the horizontal resolution and the
great reduction in noise in the top image that results from
including delta functions in the expansion set. The vertical
resolution in the top and bottom images is identical because the
same column-wise processing was used for both images.

3. EXAFS CRYSTAL STRUCTURE

An understanding of how the crystal structure of plutonium changes
while aging may be helpful in establishing the functionality and
safety of stockpiled nuclear weapons. The fine structure observed
in the energy dependence of the total absorption cross section of
X-rays in plutonium can be used to determine the relative
distances between neighboring atoms in plutonium, since the fine
structure reflects the interference of electrons directly ionized
by the X-rays with those undergoing additional scattering off
neighboring atoms after initial ionization [11].

The X-ray absorption interference term $I(k)$ at photon momentum
$k$ is a sum over all paths taken by the ionized electron. In the
single-scattering approximation, when the ionized electron is
approximated by a plane wave, $I(k)$ becomes

$$I(k) = \sum_{\text{sites } s} \left( \frac{A_s(k)}{k r_s^2} \right) e^{-2\left(r_s/\lambda(k) + \sigma_s^2 k^2\right)}\, e^{i\left[2 k r_s + \phi_s(k)\right]}, \tag{5}$$

where $A_s(k)$ and $\phi_s(k)$ are the scattering amplitude and
phase of the outgoing electron scattering off the $s$th site at
distance $r_s$ from the point of ionization, $\lambda(k)$ is the
electron mean free path, and $\sigma_s^2$ is the Debye-Waller
factor resulting from phonon motion. The magnitude of the electron
mean free path $\lambda(k)$ restricts the number of important
scattering sites to those in the neighborhood of the ionized atom.

Theory provides estimates of all quantities except the inter-site
distances [11], which can ultimately be found with the
$\ell^1$-FFT [13].

Figure 3 compares the windowed FFT amplitude and the $\ell^1$-FFT
(LIFT) amplitude of newly-processed plutonium EXAFS data. The top
panel shows the same two curves as the bottom panel, but with an
expanded ordinate. The spatial resolution provided by the
$\ell^1$-FFT is 64 times higher than that of the windowed FFT. All
peaks in the $\ell^1$-FFT amplitude, except for the peak at 3.9 Å,
correspond to the positions of atoms in $\delta$-plutonium
(face-centered cubic lattice). Evidence for an extra (anomalous)
site in newly-processed plutonium has been reported by Conradson
using a windowed FFT [4]. His extra peak was located at 3.7 Å, but
with poorer spatial resolution. Although the difference between
3.9 and 3.7 Å in site-position is numerically small, theoretical
models that try to explain the existence of this extra peak are
very sensitive to its exact location. Therefore an accurate
experimental determination of this site-position is important.
top panel expands the ordinate of the lower panel. The peaks
correspond to the positions of plutonium atoms. A face-centered
cubic lattice structure predicts the positions of all the
The $\ell^1$-FFT increases spatial resolution by a factor of 64.
ABSTRACT

Micro-Doppler induced by mechanical vibration or rotation of
structures in a radar target is potentially useful for target
detection, classification and recognition. While the Doppler
frequency induced by the target body is constant, the
micro-Doppler due to vibrating or rotating structures of the
target is a function of dwell time. Analysis of the time-varying
Doppler signature in the joint time-frequency domain can provide
useful information for target detection, classification and
recognition.

INTRODUCTION 

Mechanical vibration or rotation of structures in a target may
induce frequency modulation on returned signals and generate
side-bands about the center frequency of the target's body Doppler
frequency. The modulation due to vibrations, which is usually at
very low frequencies relative to the body Doppler frequency, is
called the micro-Doppler phenomenon. The modulation induced by
rotations, which can be seen as a special case of vibrations and
may have higher frequencies relative to the body Doppler
frequency, may also be called micro-Doppler. The micro-Doppler
phenomenon can be regarded as a signature of the interaction
between the vibrating or rotating structures and the body of the
target, and it provides additional information for target
recognition that is complementary to existing recognition methods.

In coherent radar, the phase of the returned signal from a target
is sensitive to variation in range. A change of half a wavelength
in range can cause a 360° phase change. It is conceivable that the
vibration of a reflecting surface may be measured through this
phase change. Thus, the Doppler frequency shift, which represents
the change of the phase function with time, can be used to detect
vibrations or rotations of structures in a target.

Figure 1 illustrates a reflector illuminated by a radar located at
the origin of an $(x, y, z)$ coordinate system. The reflector P is
vibrating about a center point Q at a distance $R_0$ from the
radar. If the azimuth and elevation angles of the point Q relative
to the radar are $\alpha$ and $\beta$, respectively, the point Q is
at $(R_0\cos\beta\cos\alpha,\ R_0\cos\beta\sin\alpha,\ R_0\sin\beta)$
in the $(x, y, z)$ coordinates. Assume that the reflector is at a
distance $D_t$ from the point Q, which is also the origin of the
coordinates $(x', y', z')$ translated from $(x, y, z)$. If the
azimuth and elevation angles of the reflector P relative to the
center point Q are $\alpha_p$ and $\beta_p$, respectively, the
reflector will be at
$(D_t\cos\beta_p\cos\alpha_p,\ D_t\cos\beta_p\sin\alpha_p,\ D_t\sin\beta_p)$
in the $(x', y', z')$ coordinates. Therefore, the vector from the
radar to the reflector becomes
$\vec{r}_t = \vec{R}_0 + \vec{D}_t$, as shown in Fig. 1. Generally,
the range from the radar to the reflector can be expressed as

$$r_t = |\vec{r}_t| = \left[ (R_0\cos\beta\cos\alpha + D_t\cos\beta_p\cos\alpha_p)^2 + (R_0\cos\beta\sin\alpha + D_t\cos\beta_p\sin\alpha_p)^2 + (R_0\sin\beta + D_t\sin\beta_p)^2 \right]^{1/2}$$
In the case that the azimuth angle $\alpha$ of the point Q and the
elevation angle $\beta_p$ of the reflector P are both zero, we have

$$r_t = \left( R_0^2 + D_t^2 + 2 R_0 D_t \cos\beta\cos\alpha_p \right)^{1/2} \cong R_0 + D_t\cos\beta\cos\alpha_p$$

for $R_0 \gg D_t$. If the vibration rate of the reflector is
$\omega_v$ and the amplitude of the vibration is $D_v$, the range of
the reflector becomes

$$r(t) = r_t = R_0 + D_v \sin\omega_v t\, \cos\beta\cos\alpha_p .$$

Thus, the received radar signal becomes

$$s(t) = \rho \exp\left\{ j\left[ 2\pi f_c t + 4\pi \frac{r(t)}{\lambda} \right] \right\} = \rho \exp\left\{ j\left[ 2\pi f_c t + \varphi(t) \right] \right\}$$

where $\varphi(t) = 4\pi r(t)/\lambda$ is the phase function.

Because the time-derivative of the phase is frequency, by taking
the time-derivative of the phase, the micro-Doppler frequency
induced by the vibration is

$$f_D = \frac{4\pi}{\lambda} D_v\, \omega_v \cos\beta\cos\alpha_p \cos\omega_v t .$$

The maximum of the Doppler frequency change is
$(4\pi/\lambda) D_v \omega_v$, which is reached when the
orientation of the vibrating reflector is along the projection of
the radar line-of-sight direction, i.e., $\alpha_p = 0$, and the
elevation angle $\beta$ of the reflector is also 0.
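The expressions above can be checked numerically. The short sketch below evaluates the micro-Doppler formula exactly as written in the text and compares it with a finite-difference derivative of the phase function. The function names, the range $R_0$ = 1000 m and the example values (0.03 m wavelength, 0.1 cm amplitude, rate 10) are illustrative assumptions; the rate is inserted into the formula directly, without any unit conversion.

```python
import numpy as np

def phase(t, wavelength, r0, d_v, rate, beta=0.0, alpha_p=0.0):
    """Phase function 4*pi*r(t)/lambda for a sinusoidally vibrating reflector."""
    r = r0 + d_v * np.sin(rate * t) * np.cos(beta) * np.cos(alpha_p)
    return 4.0 * np.pi * r / wavelength

def micro_doppler(t, wavelength, d_v, rate, beta=0.0, alpha_p=0.0):
    """Micro-Doppler term as written in the text (time derivative of the phase)."""
    return (4.0 * np.pi / wavelength) * d_v * rate \
        * np.cos(beta) * np.cos(alpha_p) * np.cos(rate * t)

def max_shift(wavelength, d_v, rate):
    """Maximum of the micro-Doppler term, reached for beta = alpha_p = 0."""
    return (4.0 * np.pi / wavelength) * d_v * rate

if __name__ == "__main__":
    lam, d_v, rate = 0.03, 0.001, 10.0      # illustrative values (m, m, rate)
    t = np.linspace(0.0, 1.0, 100001)
    numeric = np.gradient(phase(t, lam, 1000.0, d_v, rate), t)
    analytic = micro_doppler(t, lam, d_v, rate)
    print("max |numeric - analytic|:", np.abs(numeric - analytic)[1:-1].max())
    print("maximum shift:", max_shift(lam, d_v, rate))
```

With these values the maximum evaluates to $4\pi/3 \approx 4.19$, so the formula as written gives a shift of about 4.2 when the rate is entered as 10.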


Figure  1.  Geometry  of  a  radar  and  a  vibrating  reflector. 
1. FREQUENCY DOMAIN SIGNATURE

By taking the Fourier transform of the radar returned signal, the
micro-Doppler frequency shift may be observed in the frequency
domain [1]. Fig. 2(a) illustrates a radar and two corner
reflectors separated by a distance of 13.5 m, where one is
stationary and the other is vibrating at 1.5 Hz with a
displacement of 3 cm. Fig. 2(b) shows the returned I and Q signals
from the two reflectors. The range profile, shown in Fig. 2(c), is
obtained by taking the Fourier transform of the returned I and Q
signals. The spread peak in the range profile indicates that there
may be a vibrating reflector in that region.

(») 


Radar 


range 


well.  The  time-frequency  signature  is  obtained  by  taking  time- 
frequency  transforms,  such  as  the  Gabor  transform  [2],  From  the 
time-frequency  signature  of  the  vibration  we  can  estimate  the 
vibration  rate  and,  also,  re-focus  the  vibrating  reflector  by  taking 
time  samples. 

0>) 

H—f«  praAfe 


Figure  4.  Time-frequency  domain  analysis  of  the  returned 
signal. 


2.2  ROTATION-INDUCED  MICRO-DOPPLER 


Figure  2.  An  experimental  radar  data  (b)  return  from  two  comer 
reflectors  (a):  one  is  stationary  and  the  other  is  vibrating.  The 
Fourier  transform  of  the  returned  signal  is  the  range  profile  (c). 


2.  TIME-FREQUENCY  DOMAIN 
SIGNATURE 


2.1  VIBRATION-INDUCED  MICRO-DOPPLER 

As  shown  in  Fig.3,  when  radar  is  operating  at  X-band  with 
0.03m  wavelength,  a  vibration  at  10Hz  with  a  displacement  of 
0.1cm  will  induce  a  maximal  micro-Doppler  frequency  shift  of 
4.2Hz,  which  is  detectable  with  a  high-resolution  radar. 

Micro-Doppler  by  vibration 


fD(t)  =  ~D0O>COSO)t 

(A  =  o) 

Q)  =  \  0Hz,30Hz,50Hz 
D0  =  0.1cm 
X  =  0.03m 


O  0.02  0.01  0.06  0.00  01 

Figure  3.  Micro-Doppler  generated  by  a  vibrating  reflector. 


Fig.4  shows  the  time-frequency  domain  signature  of  the  X-band 
experimental  radar  data  returned  from  the  two  reflectors 
mentioned  earlier.  Fig. 4(a)  is  the  magnitude  of  the  radar 
received  I  &  Q  data  ,  and  4(b)  shows  the  time-frequency 
signature  of  the  data,  where  the  vibration  can  be  observed  very 


Rotating parts, such as a helicopter's rotor blades and rotating
antennas, undergo rotational motion that imparts a periodic
modulation on the signals returned from the rotating structures.
The periodic modulation can generate a radar signature that can
be used for target identification.

A helicopter's rotor blade can be modeled as a rigid, homogeneous
linear antenna [3]. The electromagnetic backscattering signal
from a point on the antenna has a Doppler frequency shift relative to
the Doppler frequency of the rotor center.


Figure  5.  Geometry  of  a  radar  and  a  rotating  reflector. 

Let us begin with a simple case as shown in Fig.5 where a
scatterer P from one rotor blade rotates about a center point Q
with a rotation rate of $\Omega$. The distance from the scatterer to the
center point is $d_0$, and the distance between the radar and the
center point is $R_0$. If both the radar and the rotor are on the
same 2-D plane, i.e. the elevation angle is zero, the range from
the radar to the scatterer becomes

$$r(t) = R_0 + vt + d_0\sin\theta_0\cos\Omega t + d_0\cos\theta_0\sin\Omega t$$

where $v$ is the radial velocity of the helicopter and $\theta_0$ is the
initial rotation angle of the scatterer. The radar received signal
becomes

$$s(t) = \rho\exp\{j[2\pi f_c t + \tfrac{4\pi}{\lambda_c}r(t)]\} = \rho\exp\{j[2\pi f_c t + \phi(t)]\}$$

where $\phi(t) = 4\pi r(t)/\lambda_c$ is the phase function. In this case, the
Doppler frequency shift of the scatterer, relative to that of the
rotor center, can be obtained as

$$f_D = \frac{4\pi}{\lambda_c}d_0\Omega(-\sin\theta_0\sin\Omega t + \cos\theta_0\cos\Omega t)$$

If the rotor has an elevation angle $\beta$, then the above equation can
be modified as

$$f_D = \frac{4\pi}{\lambda_c}d_0\Omega\cos\beta\,(-\sin\theta_0\sin\Omega t + \cos\theta_0\cos\Omega t)$$

For an N-blade rotor, assuming that one scatterer represents one
blade, there are a total of N scatterers at different initial rotation
angles:

$$\theta_k = \theta_0 + k2\pi/N, \quad (k = 0,1,2,\ldots,N-1)$$

and the total received signal becomes

$$s(t) = \sum_{k=0}^{N-1}s_k(t) = \sum_{k=0}^{N-1}\rho_k\exp\{j[2\pi f_c t + \phi_k(t)]\} = \exp\{j2\pi f_c t\}\sum_{k=0}^{N-1}\rho_k\exp\{j\phi_k(t)\}$$

where

$$\phi_k(t) = \frac{4\pi}{\lambda_c}[R_0 + vt + \cos\beta(d_0\sin\theta_k\cos\Omega t + d_0\cos\theta_k\sin\Omega t)]$$

$$= \frac{4\pi}{\lambda_c}[R_0 + vt + d_0\cos\beta\sin(\Omega t + \theta_0 + k2\pi/N)] \quad (k = 0,1,2,\ldots,N-1)$$

By taking the Fourier transform, the frequency spectrum of the
received signal can be expressed as

$$S(f) = C_0\,\delta(f - f_c) + \sum_k C_k[\delta(f - f_c - kN\Omega) + \delta(f - f_c + kN\Omega)]$$

where $C_0$ and $C_k$ are determined by $\lambda_c, R_0, v, d_0, \beta, N, \theta_0$, and
$\Omega$ and may be defined as Bessel functions [4]. The first term is
the carrier frequency and the terms in the summation determine
the micro-Doppler generated from the rotor blades. Fig.6(a)
demonstrates a radar returned signal from rotating rotor blades,
and 6(b) shows the frequency spectrum of the returned signal
where we can see the micro-Doppler frequencies generated from
the rotor blades.
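
The line structure of this spectrum can be checked numerically. The sketch below is a simplified illustration, not the processing used for Fig.6: it assumes unit blade amplitudes, a hovering rotor ($v = 0$), zero elevation angle, and a signal already mixed to baseband, and it expresses the rotation rate in Hz. The function name and all parameter values are hypothetical.

```python
import numpy as np

def rotor_baseband(n_blades, rot_hz, d0, wavelength, fs, duration, theta0=0.0):
    """Sum of unit-amplitude blade returns exp{j*phi_k(t)} with the carrier removed.

    Assumes v = 0 and beta = 0, and drops the constant R_0 phase term.
    """
    t = np.arange(int(round(fs * duration))) / fs
    k = np.arange(n_blades)[:, None]
    theta_k = theta0 + 2 * np.pi * k / n_blades            # initial blade angles
    phase = 4 * np.pi / wavelength * d0 * np.sin(2 * np.pi * rot_hz * t + theta_k)
    return t, np.exp(1j * phase).sum(axis=0)

fs, duration = 4096.0, 1.0            # 1 s record, so FFT bins are 1 Hz apart
n_blades, rot_hz = 3, 4.0
t, s = rotor_baseband(n_blades, rot_hz, d0=0.05, wavelength=0.03, fs=fs, duration=duration)

spectrum = np.abs(np.fft.fft(s)) / len(s)
freqs = np.fft.fftfreq(len(s), d=1 / fs)
line_freqs = np.sort(freqs[spectrum > 1e-6 * spectrum.max()])   # occupied bins
spacing = np.min(np.diff(line_freqs))                            # gap between adjacent lines
```

Under these assumptions the occupied bins fall only at multiples of `n_blades * rot_hz`, which is the $kN\Omega$ spacing of the expression above; the harmonics that are not multiples of N cancel when the blade returns are summed.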

[Figure 6 panels: (a) returned signal from rotor blades versus time; (b) spectrum of the returned signal versus normalized frequency.]

Figure  6.  (a)  Returned  signal  from  rotor  blades;  (b)  Spectrum  of 
the  returned  signal. 


There are many publications on the analysis of propeller
modulation (PM) and jet engine modulation (JEM) in the
frequency domain [3,5,6,7,8]. Fig.7 shows the time-frequency
signature of the returned signal from rotor blades, where the
characteristics of the rotating blades can be seen more clearly in
the joint time-frequency domain. The strong time-frequency
coefficients along the horizontal line about the center frequency
are due to the returns from the helicopter's body. After we
suppress the time-frequency coefficients of the helicopter body,
the strong time-frequency coefficients along the dot-slope-lines
are due to the returns from the rotating blades as shown in Fig.8.
Because time information is available, the rotation rate of the
blades can be measured from their time-frequency signature.
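
The way a joint time-frequency view exposes the rotation can be sketched with a simulated return from a single rotating scatterer. This is a minimal illustration and not the transform used for Figs. 7 and 8: a Gaussian-windowed short-time Fourier transform stands in for the time-frequency transform, the signal is at baseband with $v = 0$ and $\beta = 0$, the rotation rate is given in Hz, and all names and values are hypothetical.

```python
import numpy as np

def gaussian_stft(x, fs, win_len=32, hop=8, nfft=64):
    """Gaussian-windowed short-time Fourier transform (a sampled Gabor-type transform).

    Returns frame-centre times (s), frequencies (Hz, ascending) and power
    with shape (frames, nfft).
    """
    n = np.arange(win_len) - (win_len - 1) / 2
    window = np.exp(-0.5 * (n / (win_len / 6)) ** 2)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    spec = np.fft.fftshift(np.fft.fft(frames, n=nfft, axis=1), axes=1)
    freqs = np.fft.fftshift(np.fft.fftfreq(nfft, d=1 / fs))
    times = (starts + win_len / 2) / fs
    return times, freqs, np.abs(spec) ** 2

# Hypothetical scatterer at radius d0: baseband return exp{j*(4*pi/lambda)*d0*sin(2*pi*rot_hz*t)}
fs, wavelength, d0, rot_hz = 256.0, 0.03, 0.05, 2.0
t = np.arange(1024) / fs
s = np.exp(1j * 4 * np.pi / wavelength * d0 * np.sin(2 * np.pi * rot_hz * t))

times, freqs, power = gaussian_stft(s, fs)
ridge = freqs[np.argmax(power, axis=1)]                  # strongest frequency in each frame
peak_doppler = 4 * np.pi / wavelength * d0 * rot_hz      # analytic peak Doppler in Hz
```

The ridge swings between roughly plus and minus `peak_doppler` once per rotation, so its period gives the rotation rate, which a plain Fourier spectrum of the whole record does not show directly.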




Figure  7.  Time-frequency  signature  of  the  radar  returned  signal 
from  rotating  rotor  blades. 


2.3  WALKING  MAN  WITH  SWINGING  ARMS 


Figure  8.  Micro-Doppler  signature  of  a  walking  man  with 
swinging  arms. 


Fig.8 illustrates a man walking towards a radar operating at
X-band. The ISAR image of the walking man is shown in the range
and cross-range domain. The hot spot is the body of the walking man. If
we analyze the range profile at the range cell where the body is
located using a time-frequency transform, we can see the
swinging arms. One arm has a Doppler frequency above the
body's Doppler frequency and the other arm has a Doppler
frequency below the body's Doppler. The upper-right picture
shows the superposition of the time-frequency signatures over
several range profiles when the man is walking. We can see that the
body's Doppler frequency is almost constant, while the arms'
micro-Doppler is time-varying and follows a sinusoidal-like
curve.

2.4  ROTATING  ANTENNA 

Fig.9  shows  the  micro-Doppler  signature  of  a  rotating  antenna. 
The  real  part  and  the  imaginary  part  of  the  radar  return  from  the 
rotating  antenna  are  shown  in  the  figure.  The  time-frequency 
transform  of  the  radar  return  is  shown  on  the  right  where  the 
time-frequency  signature  is  unwrapped  in  the  frequency  domain. 
The  parallel  sloped  lines  are  the  micro-Doppler  signature  of  the 
rotating  antenna.  From  the  time  and  frequency  information,  the 
rotation rate of the antenna can be calculated.

Figure 9. Rotating antenna and its micro-Doppler signature.


3.  SUMMARY 

We  discussed  the  micro-Doppler  phenomenon  induced  by 
mechanical  vibrations  or  rotations  of  structures  in  a  radar  target, 
and  proposed  a  time-frequency  analysis  of  the  micro-Doppler. 
The  time-frequency  signature  of  the  micro-Doppler  provides 
additional  time  information  and  shows  micro-Doppler  frequency 
variations with time. Thus, additional information about
vibration rate or rotation rate is available for target recognition.


ACKNOWLEDGMENTS 


This  work  was  sponsored  by  the  Office  of  Naval  Research.  We 
would  also  like  to  express  our  thanks  to  the  DREO,  Canada  for 
the  radar  data  of  vibrating  reflector,  and  the  Norden  Systems, 
Northrop  Grumman  for  the  radar  data  of  walking  man. 




ESTIMATING  THE  PARAMETERS  OF  MULTIPLE  WIDEBAND 
CHIRP  SIGNALS  IN  SENSOR  ARRAYS 

Alex B. Gershman*  Marius Pesavento*  Moeness G. Amin**

* Department of ECE, McMaster University
Hamilton, L8S 4K1 Ontario, Canada
gershman@ieee.org

** Department of ECE, Villanova University,
Villanova, PA 19085, USA


ABSTRACT 

The problem of estimating the parameters of multiple wideband
polynomial-phase signal sources in sensor arrays is
addressed. A new deterministic maximum likelihood (ML)
direction of arrival (DOA) estimator and the respective
Cramer-Rao bound (CRB) are presented for the general
case of multiple constant-amplitude polynomial-phase sources.
Since the proposed ML estimator is computationally
intensive, an approximate solution is proposed, originating
from the analysis of the ML function in the single chirp case.
As a result, the so-called chirp beamformer is derived, which
is applicable to “well-separated” sources that have distinct
time-frequency and/or spatial signatures. Our beamforming
approach requires solving a 3D optimization problem
and, therefore, enjoys an essentially simpler implementation
than that dictated by the exact ML.

1.  INTRODUCTION 

Estimating the parameters of polynomial-phase signals is
an important problem because linear FM (chirp) and nonlinear
FM signals are encountered in many practical applications
[1]-[3]. Recently, there has been a growing interest
in estimating the parameters of multiple polynomial-phase
signals in sensor arrays [4]-[7]. Several authors solved this
problem using narrowband assumptions. In [5], a new spatial
time-frequency distribution (STFD) concept has been
developed and employed for direction finding of narrowband
chirp sources using subspace techniques. Several exact
and approximate ML algorithms for this estimation problem
have been proposed [4]. Promising extensions of the
above-mentioned narrowband approaches to the wideband
polynomial-phase signal case have been recently reported
[6]-[7]. However, these methods still suffer from quite
restrictive assumptions. In particular, the application of the
wideband STFD approach [7] restricts the sliding data window
length, whereas the consideration in [6] is limited by
the assumption of linear FM signals with the central
frequencies which are known and identical for each source.

In this paper, we obtain a new form of the deterministic
ML estimator of the parameters of multiple wideband
constant-amplitude polynomial-phase signals received by a
sensor array. Our technique is free of any restrictions on
the signal waveform parameters and the length of the
observation interval. Explicit expressions for the corresponding
CRB on the accuracy of estimating the signal DOA and
frequency parameters are derived.

Although the presented ML estimator concentrates the
problem at hand with respect to the signal nuisance parameters,
its computational cost may still be very high, since it
involves a nonlinear optimization over a parameter space
of high dimension. Therefore, an approximate solution is
considered, originating from the analysis of the ML function
in the single chirp case. Using this approximation, we obtain
a new form of spatio-temporal matched filter (hereafter
referred to as the chirp beamformer), which is applicable
to the wide class of scenarios with “well-separated” sources
that have distinct time-frequency and/or spatial signatures.
Our chirp beamforming approach entails solving a 3D
optimization problem and, therefore, enjoys an essentially simpler
implementation than the presented exact ML technique.

Simulation results illustrate the performance of the
estimators and validate our CRB analysis.

2.  SIGNAL  MODEL 

Assume that L wideband constant-amplitude polynomial-phase
signals impinge on a linear array of M omnidirectional
sensors. The vectors of array outputs obey the following
model

$$\mathbf{x}(t) = \mathbf{A}(t)\mathbf{s}(t) + \mathbf{n}(t), \quad t = 0, 1, \ldots, N-1 \qquad (1)$$

where $\mathbf{A}(t)$ is the $M \times L$ time-varying direction matrix, $\mathbf{s}(t)$
is the $L \times 1$ vector of wideband nonstationary source waveforms,
$\mathbf{n}(t)$ is the $M \times 1$ vector of complex circularly Gaussian
zero-mean white sensor noise, and N is the number of
snapshots.

The lth source waveform can be modeled as

$$s_l(t) = \alpha_l e^{j(\omega_{l,0}t + \omega_{l,1}t^2/2 + \cdots + \omega_{l,K-1}t^K/K)} = \alpha_l\,g(\boldsymbol{\omega}_l, t) \qquad (2)$$

0-7803-5988-7/00/$  10.00  ©  2000  IEEE 




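where

$$g(\boldsymbol{\omega}_l, t) = \exp\left\{j\sum_{k=0}^{K-1}\omega_{l,k}\frac{t^{k+1}}{k+1}\right\} \qquad (3)$$

$\alpha_l$ is the deterministic complex amplitude, $\omega_{l,k}$ ($l = 1,2,\ldots,L$;
$k = 0,1,\ldots,K-1$) are the unknown frequency parameters, and

$$\omega_l(t) = \sum_{k=0}^{K-1}\omega_{l,k}t^k \qquad (4)$$

is the instantaneous frequency of the lth polynomial-phase
waveform. The $K \times 1$ vector

$$\boldsymbol{\omega}_l = [\omega_{l,0}, \omega_{l,1}, \ldots, \omega_{l,K-1}]^T \qquad (5)$$

contains the unknown frequency parameters of the lth signal,
and K is the order of the polynomial-phase model.
The direction matrix

$$\mathbf{A}(t) = [\mathbf{a}(\theta_1, t), \ldots, \mathbf{a}(\theta_L, t)]$$

combines the time-varying steering vectors

$$\mathbf{a}(\theta_l, t) = [1, e^{j(\omega_l(t)/c)d_1\sin\theta_l}, \ldots, e^{j(\omega_l(t)/c)d_{M-1}\sin\theta_l}]^T \qquad (6)$$

where $d_i$ is the spacing between the first and the (i+1)th
array sensors. In (6), we assume that the instantaneous
signal frequencies $\omega_l(t)$ ($l = 1,\ldots,L$) do not change during
the time necessary for a wave to travel across the array
aperture$^1$. Using (2)-(6), model (1) can be rewritten as

$$\mathbf{x}(t) = \mathbf{A}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\mathbf{G}(\boldsymbol{\omega}, t)\boldsymbol{\alpha} + \mathbf{n}(t) = \tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\boldsymbol{\alpha} + \mathbf{n}(t) \qquad (7)$$

where

$$\boldsymbol{\theta} = [\theta_1, \ldots, \theta_L]^T \qquad (8)$$

$$\boldsymbol{\omega} = [\boldsymbol{\omega}_1^T, \ldots, \boldsymbol{\omega}_L^T]^T \qquad (9)$$

$$\boldsymbol{\alpha} = [\alpha_1, \ldots, \alpha_L]^T \qquad (10)$$

$$\mathbf{G}(\boldsymbol{\omega}, t) = \mathrm{diag}\{g(\boldsymbol{\omega}_1, t), \ldots, g(\boldsymbol{\omega}_L, t)\} \qquad (11)$$

$$\tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t) = \mathbf{A}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\mathbf{G}(\boldsymbol{\omega}, t) \qquad (12)$$

Note that all nuisance parameters (the deterministic source
waveforms) are now included in the vector $\boldsymbol{\alpha}$.

$^1$If necessary, this assumption can be easily relaxed by
incorporating into (6) the explicit expression for instantaneous
frequency as a function of propagation delay. However, in most
cases, this assumption is valid because the propagation time
across the aperture is usually much smaller than the sampling
interval.

3. DETERMINISTIC ML ESTIMATOR

In this section, we derive the ML estimator of the source
DOA's and frequency parameters based on the assumption
of deterministic source waveforms. The negative
log-likelihood (LL) function is given by

$$\mathcal{L}_N(\boldsymbol{\Theta}) = \sum_{t=0}^{N-1}\|\mathbf{x}(t) - \mathbf{A}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\mathbf{G}(\boldsymbol{\omega}, t)\boldsymbol{\alpha}\|^2 = \sum_{t=0}^{N-1}\|\mathbf{x}(t) - \tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\boldsymbol{\alpha}\|^2 \qquad (13)$$

where the $(LK + 2L) \times 1$ vector of unknown model parameters
is defined as

$$\boldsymbol{\Theta} = [\boldsymbol{\theta}^T, \boldsymbol{\omega}^T, \boldsymbol{\alpha}^T]^T$$

Rewrite (13) as

$$\mathcal{L}_N(\boldsymbol{\Theta}) = \boldsymbol{\alpha}^H\left\{\sum_{t=0}^{N-1}\tilde{\mathbf{A}}^H(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\right\}\boldsymbol{\alpha} - \boldsymbol{\alpha}^H\left\{\sum_{t=0}^{N-1}\tilde{\mathbf{A}}^H(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\mathbf{x}(t)\right\} - \left\{\sum_{t=0}^{N-1}\mathbf{x}^H(t)\tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\right\}\boldsymbol{\alpha} + \sum_{t=0}^{N-1}\mathbf{x}^H(t)\mathbf{x}(t) \qquad (14)$$

The minimization of $\mathcal{L}_N$ over $\boldsymbol{\alpha}$ yields

$$\hat{\boldsymbol{\alpha}} = \left\{\sum_{t=0}^{N-1}\tilde{\mathbf{A}}^H(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\right\}^{-1}\left\{\sum_{t=0}^{N-1}\tilde{\mathbf{A}}^H(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\mathbf{x}(t)\right\}$$

Substituting this expression into (14), we obtain the negative
concentrated LL function

$$\mathcal{L}_N(\boldsymbol{\theta}, \boldsymbol{\omega}) = \sum_{t=0}^{N-1}\mathbf{x}^H(t)\mathbf{x}(t) - \left\{\sum_{t=0}^{N-1}\mathbf{x}^H(t)\tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\right\}\left\{\sum_{t=0}^{N-1}\tilde{\mathbf{A}}^H(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\right\}^{-1}\left\{\sum_{t=0}^{N-1}\tilde{\mathbf{A}}^H(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\mathbf{x}(t)\right\}$$

Ignoring the constant terms, the positive concentrated LL
function is then given by

$$\mathcal{L}_P(\boldsymbol{\theta}, \boldsymbol{\omega}) = \left\{\sum_{t=0}^{N-1}\mathbf{x}^H(t)\tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\right\}\left\{\sum_{t=0}^{N-1}\tilde{\mathbf{A}}^H(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\right\}^{-1}\left\{\sum_{t=0}^{N-1}\tilde{\mathbf{A}}^H(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\mathbf{x}(t)\right\} \qquad (16)$$

The ML estimator

$$[\hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\omega}}] = \arg\max_{\boldsymbol{\theta}, \boldsymbol{\omega}}\mathcal{L}_P(\boldsymbol{\theta}, \boldsymbol{\omega}) \qquad (17)$$

jointly estimates the direction and the frequency parameters
$\boldsymbol{\theta}$ and $\boldsymbol{\omega}$, respectively. It requires a highly nonlinear
optimization of the function (16) over these variables.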
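
To make the structure of the concentrated function concrete, the following sketch evaluates (16) and the amplitude estimate for a candidate parameter set. It is an illustration under stated assumptions rather than the authors' implementation: a uniform linear array with half-wavelength spacing at a hypothetical reference frequency `w_ref` is assumed, so that $(\omega/c)d_i\sin\theta = \pi(\omega/w_{ref})\,i\sin\theta$, and all function names are invented for this example.

```python
import numpy as np

def stacked_a_tilde(thetas, omegas, n_snap, n_sens, w_ref=np.pi):
    """A~(theta, omega, t) = A G for t = 0..n_snap-1; shape (n_snap, n_sens, L).

    omegas[l] holds [w_{l,0}, ..., w_{l,K-1}]; w_ref is a hypothetical
    reference frequency at which the spacing is half a wavelength.
    """
    t = np.arange(n_snap, dtype=float)[:, None]
    sensors = np.arange(n_sens)
    columns = []
    for theta, w in zip(thetas, omegas):
        w = np.asarray(w, dtype=float)
        k = np.arange(len(w))
        w_inst = (w * t**k).sum(axis=1)                   # instantaneous frequency, eq. (4)
        phase = (w * t**(k + 1) / (k + 1)).sum(axis=1)    # polynomial phase, eq. (3)
        steer = np.exp(1j * np.pi * np.outer(w_inst / w_ref, sensors) * np.sin(theta))
        columns.append(np.exp(1j * phase)[:, None] * steer)
    return np.stack(columns, axis=2)

def _moments(x, thetas, omegas, w_ref):
    a_t = stacked_a_tilde(thetas, omegas, *x.shape, w_ref=w_ref)
    gram = np.einsum('tml,tmk->lk', a_t.conj(), a_t)      # sum_t A~^H A~  (L x L)
    b = np.einsum('tml,tm->l', a_t.conj(), x)             # sum_t A~^H x   (L,)
    return gram, b

def amplitude_estimate(x, thetas, omegas, w_ref=np.pi):
    """Least-squares amplitudes for snapshots x of shape (n_snap, n_sens)."""
    gram, b = _moments(x, thetas, omegas, w_ref)
    return np.linalg.solve(gram, b)

def concentrated_ll(x, thetas, omegas, w_ref=np.pi):
    """Positive concentrated LL (16) evaluated at candidate (thetas, omegas)."""
    gram, b = _moments(x, thetas, omegas, w_ref)
    return float(np.real(b.conj() @ np.linalg.solve(gram, b)))
```

For noise-free data generated from the same model, `concentrated_ll` at the true parameters equals the total signal energy, and it drops when any DOA or frequency parameter is perturbed; the ML search in (17) maximizes this value over all $L(K+1)$ direction and frequency parameters, which is what makes it costly.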
4. CRAMER-RAO BOUND

In this section, we present closed-form expressions for the
exact CRB on the accuracy of estimating the signal model
parameters.

Theorem 1: Let the observations (7) satisfy the following
statistical model:

$$\mathbf{x}(t) \sim \mathcal{N}(\tilde{\mathbf{A}}(\boldsymbol{\theta}, \boldsymbol{\omega}, t)\boldsymbol{\alpha}, \sigma^2\mathbf{I}) \qquad (18)$$

Then,  the  Fisher  Information  Matrix  (FIM)  is  given  by 


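$$\mathbf{F} = \begin{bmatrix}
\mathbf{F}_{\theta\theta} & \mathbf{F}_{\theta\tilde{\alpha}} & \mathbf{F}_{\theta v_0} & \cdots & \mathbf{F}_{\theta v_{K-1}} \\
\mathbf{F}_{\theta\tilde{\alpha}}^T & \mathbf{F}_{\tilde{\alpha}\tilde{\alpha}} & \mathbf{F}_{\tilde{\alpha}v_0} & \cdots & \mathbf{F}_{\tilde{\alpha}v_{K-1}} \\
\mathbf{F}_{\theta v_0}^T & \mathbf{F}_{\tilde{\alpha}v_0}^T & \mathbf{F}_{v_0 v_0} & \cdots & \mathbf{F}_{v_0 v_{K-1}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathbf{F}_{\theta v_{K-1}}^T & \mathbf{F}_{\tilde{\alpha}v_{K-1}}^T & \mathbf{F}_{v_0 v_{K-1}}^T & \cdots & \mathbf{F}_{v_{K-1}v_{K-1}}
\end{bmatrix} \qquad (19)$$

where

$$\mathbf{u} = \frac{1}{c}[0, d_1, \ldots, d_{M-1}]^T$$

$$\mathbf{c} = [\cos\theta_1, \cos\theta_2, \ldots, \cos\theta_L]^T$$

E is the matrix containing ones in all positions, 0 is the
vector of zeros,

$$\mathbf{v} = \mathrm{vec}\{\boldsymbol{\Omega}^T\} = [\mathbf{v}_0^T, \ldots, \mathbf{v}_{K-1}^T]^T \qquad (20)$$

$$\boldsymbol{\Omega} = \begin{bmatrix}
\omega_{1,0} & \omega_{2,0} & \cdots & \omega_{L,0} \\
\omega_{1,1} & \omega_{2,1} & \cdots & \omega_{L,1} \\
\vdots & \vdots & \ddots & \vdots \\
\omega_{1,K-1} & \omega_{2,K-1} & \cdots & \omega_{L,K-1}
\end{bmatrix} \qquad (21)$$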

and vec{·} represents the so-called vectorization operator
stacking the columns of a matrix to form a column vector.
Proof: See [8].

It is important to stress that vector (20) contains the
same signal frequency parameters as those included in $\boldsymbol{\omega}$.
However, these parameters are ordered in a different way.
To clarify the difference between $\boldsymbol{\omega}$ and $\mathbf{v}$, note that


I  iV  —  1 

F0e  =  J^Rei  Y.  AHDH(0,u;,t)£)(0,w,t)A 

l  t=o 

{N- 1 

2  AHDH(6,u>,t)A(0,u>,t)Q 

t- 0 

l  t=0 

x  {Tfc(<)0  (A(0,w,f)A)}} 

F&dc  =  4  Bje[YtQHAa(e,u,t)A(0,u,t)Q 


4r4SqHAH(0,w,<) 

l  i=o 

X  {Tk(t)o(A(e,u,t) A)}} 

4Re|  S  ©r*  (*)} 

l  t- 0 

x  {Tm(f)  ©  (A(0,  w,  t) A)  }  } 


F&«*  -  0 


a 

b(p,w,t) 

D(0,u,t) 

Q 

Tk{t ) 


diag  {ai,  •  •  •  ,ol} 
[Re{a}T,  Im{a}T] 
D(0,u,t)G(v,t) 


$$\boldsymbol{\omega} = \mathrm{vec}\{\boldsymbol{\Omega}\} \qquad (22)$$

In Theorem 1, we use the vector $\mathbf{v}$ rather than $\boldsymbol{\omega}$ for the
sake of mathematical convenience, since the CRB derivation
in terms of $\mathbf{v}$ leads to simpler expressions for the FIM
subblocks.

5.  CHIRP  BEAMFORMER 

The associated computational cost of the ML estimator
(16)-(17) may not always be acceptable. In this section, we
simplify the ML estimator by deriving the so-called chirp
beamformer which requires a simpler 3D search instead of
global optimization. Assuming the single source case$^2$, we
rewrite the LL function (16) as

$$\mathcal{L}_P(\theta_1, \boldsymbol{\omega}_1) = \frac{1}{M}\left|\sum_{t=0}^{N-1}\mathbf{x}^H(t)\tilde{\mathbf{a}}(\theta_1, \boldsymbol{\omega}_1, t)\right|^2 \qquad (23)$$

where

$$\tilde{\mathbf{a}}(\theta_1, \boldsymbol{\omega}_1, t) = g(\boldsymbol{\omega}_1, t)\mathbf{a}(\theta_1, \boldsymbol{\omega}_1, t) \qquad (24)$$

and the property $\mathbf{a}^H\mathbf{a} = M$ is used. Assuming a chirp
signal, we have $\boldsymbol{\omega}_1 = [\omega_{1,0}, \omega_{1,1}]^T$ and, hence, there are
only three parameters $\{\theta_1, \omega_{1,0}, \omega_{1,1}\}$, which correspond to
the DOA, frequency, and the chirp rate, respectively.
Introducing the simplified (subscript-free) notation

$$\theta = \theta_1, \quad \xi = \omega_{1,0}, \quad \zeta = \omega_{1,1} \qquad (25)$$

and omitting the constant factor 1/M, we can rewrite the
right-hand side of (23) as the following function:

9a(0i,wi,t) 

0a(0L,wt,f) 

JV-l  2 

001 

00  L 

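$$f(\theta, \xi, \zeta) = \left|\sum_{t=0}^{N-1}\mathbf{x}^H(t)\tilde{\mathbf{a}}(\theta, \xi, \zeta, t)\right|^2 \qquad (26)$$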

[IJI] 

-  *(171 


E  +  itcT) 


$^2$This assumption will be relaxed later.

The function (26) is referred to as the chirp beamformer$^3$.

The parameters of interest can be obtained from the
main maxima of (26) by means of a 3D search over the
variables $\{\theta, \xi, \zeta\}$. The chirp beamformer (26) can be easily
applied to the multiple source case under the condition that
the sources are “well-separated” in one or more parameters
in (25). This property follows from the structure of (26),
which is linear with respect to the second-order moments
of x. Therefore, as in the case of the conventional beamformer
[9], [10] which is widely used in narrowband array
processing, the chirp beamformer (26) can be straightforwardly
extended to the multiple source case.
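
A minimal sketch of the chirp beamformer (26) is given below, together with a one-dimensional slice over the chirp rate for two noise-free sources that share DOA and initial frequency. It rests on assumptions that are not part of the text: a uniform linear array with half-wavelength spacing at a hypothetical reference frequency `w_ref`, unit source amplitudes, and illustrative parameter values; the function names are invented for this example.

```python
import numpy as np

def chirp_steering(theta, xi, zeta, n_snap, n_sens, w_ref=np.pi):
    """a~(theta, xi, zeta, t) for t = 0..n_snap-1, returned with shape (n_snap, n_sens).

    Assumes half-wavelength spacing at the hypothetical reference frequency
    w_ref, so that (w/c) d_i sin(theta) = pi * (w / w_ref) * i * sin(theta).
    """
    t = np.arange(n_snap, dtype=float)
    w_inst = xi + zeta * t                                   # instantaneous frequency
    g = np.exp(1j * (xi * t + zeta * t**2 / 2))              # chirp phase term g(w, t)
    i = np.arange(n_sens)
    a = np.exp(1j * np.pi * np.outer(w_inst / w_ref, i) * np.sin(theta))
    return g[:, None] * a

def chirp_beamformer(x, theta, xi, zeta, w_ref=np.pi):
    """f(theta, xi, zeta) = |sum_t x^H(t) a~(theta, xi, zeta, t)|^2, x of shape (n_snap, n_sens)."""
    a_t = chirp_steering(theta, xi, zeta, *x.shape, w_ref=w_ref)
    return np.abs(np.sum(np.conj(x) * a_t)) ** 2

# Two noise-free unit-amplitude chirps with equal DOA and initial frequency (illustrative values)
n_snap, n_sens, theta, xi = 15, 15, np.deg2rad(30.0), 0.8
x = (chirp_steering(theta, xi, -0.16, n_snap, n_sens)
     + chirp_steering(theta, xi, 0.16, n_snap, n_sens))

zeta_grid = np.linspace(-0.3, 0.3, 121)
slice_vals = np.array([chirp_beamformer(x, theta, xi, z) for z in zeta_grid])
```

Because the sum over snapshots is taken before the magnitude, the output is large only when the candidate chirp stays phase-aligned with a source over the whole observation interval. In this setting the slice is larger at the two true chirp rates than midway between them, so the sources separate in chirp rate alone; a full search would scan $\theta$ and $\xi$ as well.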

Interestingly, the chirp beamformer has quite a different
structure as compared to the conventional beamformer.
The latter function is given by [10]

$$f_{CB}(\theta) = \mathbf{a}^H(\theta)\hat{\mathbf{R}}\mathbf{a}(\theta) = \frac{1}{N}\sum_{t=0}^{N-1}|\mathbf{x}^H(t)\mathbf{a}(\theta)|^2 \qquad (27)$$

where

$$\hat{\mathbf{R}} = \frac{1}{N}\sum_{t=0}^{N-1}\mathbf{x}(t)\mathbf{x}^H(t) \qquad (28)$$

is the sample covariance matrix, and the steering vector
does not depend on the temporal index t. Comparing (26)
and (27), we maintain that the conventional beamformer
represents the sum of the squared absolute values of vector
inner products, whereas the chirp beamformer, on the
other hand, is determined by the squared absolute value of
the sum of inner products. This essential difference between
(26) and (27) can be explained by the fact that in
the chirp signal case, the signal temporal characteristics are
taken into account by means of the parametric time-domain
polynomial-phase model. Obviously, this corresponds to
the so-called coherent time-domain processing, whereas in
the conventional narrowband case the snapshots x(t) are
assumed to be independent and, therefore, the processing
in (27) remains incoherent in time-domain.

An interesting relationship between the chirp beamformer
(26) and the traditional estimation techniques can
be obtained for the conventional harmonic signal case ($\zeta = 0$).
In this case, we have

$$\tilde{\mathbf{a}}(\theta, \xi, \zeta, t) = \tilde{\mathbf{a}}(\theta, \xi, t) = e^{j\xi t}\mathbf{a}(\theta, \xi) \qquad (29)$$

where the vector $\mathbf{a}(\theta, \xi)$ is the conventional steering vector,
which coincides with that in (27). Hence, the beamforming
function (26) can be transformed to

$$f(\theta, \xi) = |\mathbf{x}^H(\xi)\mathbf{a}(\theta, \xi)|^2 = \mathbf{a}^H(\theta, \xi)\mathbf{x}(\xi)\mathbf{x}^H(\xi)\mathbf{a}(\theta, \xi) \qquad (30)$$

where

$$\mathbf{x}(\xi) \propto \sum_{t=0}^{N-1}\mathbf{x}(t)e^{-j\xi t} \qquad (31)$$

is the $M \times 1$ vector of the Fourier-transformed array outputs.
The estimator (30) represents a single-snapshot variant of
the frequency-domain conventional beamformer [9]

$$f_{CB}(\theta, \xi) = \mathbf{a}^H(\theta, \xi)\hat{\mathbf{R}}(\xi)\mathbf{a}(\theta, \xi) \qquad (32)$$

where

$$\hat{\mathbf{R}}(\xi) = \frac{1}{P}\sum_{\tau=0}^{P-1}\mathbf{x}(\xi, \tau)\mathbf{x}^H(\xi, \tau) \qquad (33)$$

is the sample spectral density matrix, the time index $\tau$
determines the location of the respective short Fourier
transform sliding window$^4$, and P is the total number of sliding
windows (or, in other words, the number of
frequency-domain snapshots).

$^3$We use this term because of the obvious analogy with the
narrowband conventional beamformer [10] which can be easily
derived from the conventional deterministic ML estimator under
the single-source assumption [11].

Figure 1: Comparison of the DOA estimation RMSE of the
ML estimator and the CRB versus the number of snapshots.

Similarly  to  the  chirp  beamformer  (26),  a  polynomial- 
phase  beamformer  can  be  defined  that  corresponds  to  a 
more  general  polynomial-phase  signal  model.  In  this  case, 
the  number  of  parameters  in  (26)  will  increase,  depending 
on  the  polynomial-phase  model  order. 

6.  SIMULATIONS 

In all examples, we assume a uniform linear array (ULA)
with half-wavelength spacing. In the first example, we
assume a ULA of M = 10 sensors which receives two
equi-powered chirp sources with SNR = 0 dB and DOA's $\theta_1$ =
10° and $\theta_2$ = 15° relative to the broadside. The sources
have the following frequency parameters: $\omega_{1,0}$ = 1.2566,
$\omega_{1,1}$ = -0.0151, $\omega_{2,0}$ = 0.0628, and $\omega_{2,1}$ = 0.0151. In Fig.
1, the DOA estimation RMSE of the ML estimator (16)-(17)
and the theoretically obtained direction estimation CRB
versus the number of snapshots N are shown.

$^4$This index is not shown in (30) because it is a particular
case where the single window, whose length is equal to the whole
observation length, is used.

Figure 2: Comparison of the DOA estimation RMSE of the
ML estimator and the CRB versus the SNR.

[Figure 3 plot; horizontal axis: DOA (degrees).]

Figure 3: 2D slice of the chirp beamformer. The true source
locations are indicated by stars.


In our second example, the same parameters are used
except for SNR and N. We assume that N = 100 and the
performance is examined versus the SNR. The RMSE of the
ML estimator and the CRB are displayed in Fig. 2.

In our third example, we assume a ULA of M = 15
sensors and two equi-powered chirp sources with SNR =
0 dB and $\theta_1 = \theta_2$ = 30°. The following parameters are
used: N = 15, $\omega_{1,0} = \omega_{2,0}$ = 0.8000, $\omega_{1,1}$ = -0.1600,
and $\omega_{2,1}$ = 0.1600. Fig. 3 displays the 2D slice of the 3D
chirp beamforming function evaluated at $\xi$ = 0.8000. From
this figure, we observe that the chirp beamformer is able to
resolve closely spaced sources (and even sources having the
same DOA's and initial frequencies), based solely on the
difference of their chirp rates.


ABSTRACT

The  detection  of  near-stationary  targets  in 
mainlobe  clutter  is  a  problem  that  has  recently 
generated  a  great  deal  of  interest  within  the 
Department  of  Defense  community.  Some  examples 
of  these  types  of  targets  are  surface  vehicles,  missile 
launchers  and  loitering  (micro-)  Unmanned  Aerial 
Vehicles  (UAVs).  The  root  of  the  difficulty  lies  in 
the  fact  that  conventional  radar  processing  loses  the 
ability  to  use  the  Doppler  of  the  target  to 
discriminate  it  from  the  clutter.  Indeed,  the  target 
need  not  even  be  nearly  stationary  for  this  to  be  a 
problem  -  even  a  rapidly  moving  target  can  exhibit 
low  Doppler  if  its  velocity  vector  is  nearly 
perpendicular  to  the  velocity  vector  of  the 
observation  platform.  Raytheon  Systems  Company 
(Raytheon)  has  been  investigating  a  number  of 
advanced  algorithmic  solutions  to  this  problem 
within  the  context  of  providing  a  dual-mission 
capability  to  currently  fielded  RF  missile  systems. 
This  paper  describes  a  processing  architecture  that 
combines  preprocessing,  Time-Frequency 
Transforms  and  Best  Bases  algorithms  and  discusses 
some  preliminary  results. 

1.  INTRODUCTION 

We propose a novel method for the detection
of stationary targets in monostatic clutter [1, 2]. This
is a region where conventional Space-Time Adaptive
Processing (STAP) algorithms experience difficulty
since there is no longer any Doppler discriminant
available. While a number of hardware solutions
have  been  proposed,  for  example,  the  addition  of  an 
adjunct  infrared  sensor  and  associated  processing 
hardware,  an  RF-based  algorithmic  solution  remains 
a  very  attractive  option.  This  option  should  also 
provide  the  basis  for  a  dual-mission  RF  missile, 
thereby  extending  the  capability  of  currently  fielded 
hardware.  Raytheon  is  investigating  a  number  of 
algorithmic  approaches  to  this  problem;  in  this 
manuscript,  we  concentrate  on  the  use  of  Time- 
Frequency  Transforms  in  combination  with  various 
pre-filtering  and  post-processing  algorithms.  We 
have  achieved  the  best  performance  by  first 
preprocessing  the  data  using  whitening  or  Wiener 
filters,  then  mapping  the  1-D  time  series  data  onto  a 
2-D  Time-Frequency  image  using  a  Wigner-Ville 
Transform  (WVT)  to  enhance  features,  and  finally 
employing  a  Best  Bases  type  of  algorithm  for  feature 
extraction.  We  note  that  our  approach  to  this 
problem  is  similar  in  spirit  to  that  proposed  by 
Haykin  in  References  3-4. 
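
The mapping of a 1-D time series onto a 2-D time-frequency image can be sketched with a basic discrete Wigner-Ville distribution. The code below is a generic textbook-style illustration, not the processing chain described here: it omits the whitening or Wiener pre-filter and any smoothing, assumes a complex (analytic) input whose frequencies lie in [0, 0.5) cycles per sample, and uses hypothetical names and test signals.

```python
import numpy as np

def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a complex (analytic) signal.

    Returns a real (N, N) array W[n, k]; frequency bin k corresponds to
    k / (2N) cycles per sample because the lag step is two samples.
    """
    x = np.asarray(x, dtype=complex)
    n_samples = len(x)
    w = np.zeros((n_samples, n_samples))
    for n in range(n_samples):
        max_lag = min(n, n_samples - 1 - n)               # stay inside the record
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n_samples, dtype=complex)
        kernel[lags % n_samples] = x[n + lags] * np.conj(x[n - lags])
        w[n] = np.fft.fft(kernel).real                    # kernel is conjugate-symmetric in lag
    return w

# Hypothetical test signal: linear chirp sweeping from 0.05 to 0.25 cycles per sample
n_samples = 128
n = np.arange(n_samples)
chirp = np.exp(2j * np.pi * (0.05 * n + 0.5 * (0.2 / n_samples) * n**2))
image = wigner_ville(chirp)
ridge = np.argmax(image, axis=1) / (2 * n_samples)        # estimated instantaneous frequency
```

For a single linear chirp the energy concentrates along the instantaneous-frequency line; with several components the distribution also contains cross terms between them, which is one reason a feature-extraction stage follows the transform.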

The  proposed  target  detection  algorithm  is 
shown  schematically  in  Figure  1;  the  red  outlined 
area  indicates  the  nonstandard  processing  portion  of 
this  algorithm.  Notice  that  the  Time-Frequency 
Analysis  (TFA)-Best  Bases  processing  stream 
allows  the  natural  introduction  of  feature  fusion. 
This  is  an  important  characteristic,  since  a 
number  of  programs  at  Raytheon  have  had 
considerable  success  using  feature  fusion  for 
improving  target  classification. 


0-7803-5988-7/00/$ 10.00 © 2000 IEEE

Figure 1: Time-Frequency Detection Scheme

Figure 2 below shows the output of the Feature
Extraction block in Figure 1 for a 0 dBsm target.
At this point the data has been filtered and
passed through a WVT. The Best Bases
algorithm that we have used in this analysis is
the Local Discriminant Bases (LDB) [5]
algorithm of Coifman and Saito. LDB was
originally developed in 1994 as a technique for
analyzing object classification problems. Since
then, extensions have been developed for
regression, optimization and signal de-mixing
applications.

Figure 2: LDB Screen for Target Detection


2.  TIME-FREQUENCY  ANALYSIS  USING 
RANGE-DOPPLER  MAP  PHASES 

Raytheon  has  also  been  investigating  the  use 
of  phase  information  from  the  Range-Doppler  maps 
for  radar  signals.  In  conventional  monopulse  seekers, 
only  the  amplitudes  of  the  complex-valued 
range/Doppler-filter  outputs  are  used  for  target 
detection  and/or  identification  -  the  random-like 
phases  are  seldom  considered  helpful.  However,  it 
has  recently  been  proposed  [6]  that  it  is  essential  to 
utilize  the  whole  complex-valued  range-Doppler 
image  because  much  of  the  information  about  the 
target  is  contained  in  the  phase.  We  have  used  simple 
correlation  and  spectrum  estimation  techniques  to 
extract  the  phase  signals,  and  used  some  simple  de¬ 
trending  techniques  to  correct  the  clutter  leakage  of 
the  Doppler  FFT. 

Further improved results are expected when
more advanced spectrum estimation and FFT leakage
correction techniques are employed. For example,
the current Power Spectral Density (PSD) technique
can only detect targets at different range-gates. We
do not know the target Doppler, and cannot resolve
targets located at the same range-gate but with
different Doppler frequencies. Furthermore, we
cannot distinguish different targets that may have similar
PSD functions. We have begun to investigate
time-frequency analysis for this problem and believe that
it has promise here since it provides the ability to
measure the full set of frequency components at
different Doppler frequencies.

As  shown  in  the  PSD  plots  in  Figures  3,  4, 
and  5,  the  phase  signals  with  targets  have  much 
higher  low  frequency  components  than  the  phase 
signal  with  clutter  only.  Therefore,  we  can  detect 
targets  at  the  range-gate  of  the  signal  using  the  low 
frequency  components.  However,  we  do  not  know 
the  target  Doppler  and  cannot  discriminate  among 
different  targets. 
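
The low-frequency detection idea described above can be sketched as follows. This is a minimal illustration, not the processing chain used in this work: the function name `low_frequency_fraction`, the linear detrending step, the windowed periodogram, the `cutoff` value, and the 0.5 threshold are all assumptions made for the example, and the two phase signals are synthetic.

```python
import numpy as np

def low_frequency_fraction(phase, cutoff=0.1):
    """Fraction of phase-signal power at or below `cutoff` (cycles/sample)."""
    n = np.arange(phase.size)
    # Remove a straight-line trend so a slow drift is not mistaken for a target.
    detrended = phase - np.polyval(np.polyfit(n, phase, 1), n)
    # Windowed periodogram as a basic PSD estimate.
    psd = np.abs(np.fft.rfft(detrended * np.hanning(phase.size))) ** 2
    freqs = np.fft.rfftfreq(phase.size)          # 0 ... 0.5 cycles/sample
    return psd[freqs <= cutoff].sum() / psd.sum()

rng = np.random.default_rng(0)
n = np.arange(512)
# Synthetic stand-ins: random-like phase for clutter only, and a slowly
# varying phase plus weak noise for a range gate that contains a target.
clutter_only = rng.uniform(-np.pi, np.pi, n.size)
with_target = np.sin(2 * np.pi * 0.01 * n) + 0.1 * rng.standard_normal(n.size)

clutter_score = low_frequency_fraction(clutter_only)
target_score = low_frequency_fraction(with_target)
detected = target_score > 0.5                    # hypothetical threshold
```

As the text notes, a statistic of this kind flags the range gate only; it carries no Doppler information and cannot separate targets with similar PSDs.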

As shown in the two-dimensional Wigner-Ville
plot (cf. Figure 3), the phase signal with
clutter only has a wide frequency band across almost
the whole Doppler duration. There are also two weak
linear chirps appearing in the lower Doppler.
However, when a target T-60 is included in the RF
signal, the energy at the lower Doppler (where the
target Doppler is located) is much lower than that of the
signal with clutter only, as shown in Figure 4. For a
different target, T-120, we find a similar result,
as shown in Figure 5. It is interesting to note that this
target generates two linear chirps at the higher
Doppler; therefore, time-frequency analysis may help
discriminate among different targets.

Figure 3. Clutter Only Phase Signal

Figure 4. Phase Signal with Clutter, Receiver
Noise and Target T-60

Figure 5. Phase Signal with Clutter, Receiver
Noise and Target T-120


3.  CONTINUOUS  WAVELET 
TRANSFORMS 

Raytheon  has  also  been  investigating  the  use 
of  continuous  wavelet  transforms  (CWT)  for  target 
detection and feature extraction. The advantage of
this method is that, unlike the WVT, it generates no
artificial interference terms in the analysis, and there
is a larger number of possible “mother” wavelet
functions from which to obtain more nearly optimal
time-frequency analyses. The downside of this scheme
is its high computational complexity. We are currently
investigating means to improve the CWT, either by
implementing further improvements to the algorithm
itself or by performing the calculations using
fast analog signal processors [7], or both. Of course, the
WVT can be implemented on these analog devices as
well.
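
A continuous wavelet transform of the kind discussed here can be sketched with NumPy alone. The sketch assumes a Morlet mother wavelet with centre frequency `w0 = 6`, amplitude normalization, and evaluation through the FFT; the function name `morlet_cwt`, the scale grid, and the test tone are illustrative choices, not the implementation under investigation.

```python
import numpy as np

def morlet_cwt(x, scales, w0=6.0):
    """CWT with an analytic Morlet wavelet, evaluated in the frequency domain."""
    omega = 2.0 * np.pi * np.fft.fftfreq(x.size)      # rad/sample
    x_hat = np.fft.fft(x)
    coeffs = np.empty((len(scales), x.size), dtype=complex)
    for i, s in enumerate(scales):
        # Fourier transform of the scaled wavelet, positive frequencies only.
        psi_hat = np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        coeffs[i] = np.fft.ifft(x_hat * psi_hat)
    return coeffs

n = np.arange(500)
tone = np.cos(2 * np.pi * 0.05 * n)                   # synthetic test signal
scales = np.geomspace(2.0, 64.0, 61)
coeffs = morlet_cwt(tone, scales)

# The scale with the largest mean magnitude maps back to a frequency
# through f = w0 / (2 * pi * scale).
ridge_scale = scales[np.argmax(np.abs(coeffs).mean(axis=1))]
ridge_freq = 6.0 / (2 * np.pi * ridge_scale)
```

Each scale costs one FFT-sized multiply and inverse transform, which illustrates why the computational load grows quickly with the number of scales and the signal length.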

4.  CONCLUSION 

In this manuscript, we have presented some
preliminary results of using TFT, in combination with
pre-filtering and post-processing, to detect
near-stationary targets in main lobe clutter. Raytheon is
also investigating a number of other algorithms,
including polarization STAP, covariance matrix
conditioning, waveform diversity, non-decimated
wavelet transforms and higher order statistics. We
have also begun looking at some very interesting
work on optimized kernel TFTs [8, 9].

ABSTRACT

Two  applications  of  the  adaptive  joint  time-frequency 
(AJTF)  algorithm  for  ISAR  image  formation  are 
presented.  First,  AJTF  is  utilized  for  ISAR  motion 
estimation  and  compensation.  Focused  images  from 
measured  radar  data  are  presented  to  illustrate  the 
effectiveness  of  the  algorithm  when  applied  to  in-flight 
aircraft  data.  Second,  the  AJTF  algorithm  is  extended  to 
detect  the  presence  of  chaotic,  three-dimensional  motions 
in  an  articulating  target.  Preliminary  test  results  on 
measured  data  show  that  the  algorithm  can  correctly  detect 
those  imaging  intervals  where  significant  three- 
dimensional  motions  exist. 

1.  INTRODUCTION 

High-resolution  inverse  synthetic  aperture  radar  (ISAR) 
imaging  is  a  promising  tool  for  non-cooperative  target 
identification  (NCTI).  The  main  challenge  in  ISAR-based 
NCTI  is  to  form  a  well-focused  image  of  an  articulating 
target  with  unknown  motion.  In  this  paper,  we  first  review 
the  application  of  joint  time-frequency  methods  for  ISAR 
image  formation.  By  using  an  adaptive  joint  time- 
frequency  (AJTF)  algorithm  to  estimate  the  phase  of  the 
prominent  scatterers,  we  show  that  the  target  motion  can  be 
estimated  and  a  focused  image  of  the  target  can  be 
constructed.  Results  of  applying  the  algorithm  to 
measured  ISAR  data  are  presented  and  discussed. 
Secondly,  we  report  on  our  recent  work  to  extend  the 
AJTF  algorithm  to  address  the  more  challenging  situation 
when  the  motion  of  the  target  is  not  limited  to  a  two- 
dimensional  plane.  In  particular,  we  discuss  our  research 
to  detect  the  presence  of  three-dimensional  motion  using 
the  AJTF  algorithm. 

2.  ISAR  MOTION  COMPENSATION 
USING  JOINT  TIME-FREQUENCY 
ALGORITHM 


We first review the application of joint time-frequency
methods for ISAR image formation. To form a focused
image from raw radar data, it is customary to first carry out
a coarse alignment of the data in the range dimension,
followed by fine motion compensation in the cross range
dimension. Joint time-frequency techniques have been
shown to be a useful tool to carry out the fine motion
compensation [1,2]. We assume that after the coarse range
alignment, all the scatterers are located in their respective
range cells. The radar backscattered signal as a function of
dwell time t in a particular range cell can be written as

$$E(t) = \sum_{k=1}^{N} A_k \exp\left[-j\frac{4\pi f}{c}\bigl(R(t) + x_k\cos\theta(t) + y_k\sin\theta(t)\bigr)\right] \qquad (1)$$

where N is the number of point scatterers in that range cell,
and $A_k$, $x_k$, $y_k$ are respectively the scattering amplitude,
down range position and cross range position of the k-th
point scatterer. R(t) is the residual uncompensated
translation displacement and $\theta(t)$ is the rotational
displacement. Due to translation and rotational motion, the
Doppler frequency versus dwell time behavior of the point
scatterers within this range cell is not constant in the joint
time-frequency plane (see Fig. 1). An effective JTF
technique to extract the motion parameters is based on a
search and projection procedure to represent the phase
behavior of the signal E(t). This procedure is based on the
adaptive spectrogram proposed in [3], and is similar in
concept to a one-term matching pursuit algorithm [4]. We
shall term it the adaptive JTF (AJTF) algorithm. To find
the motion parameters, basis functions in the form of

$$h(t) = \exp\left[-j\left(a_1 t + a_2 t^2 + a_3 t^3\right)\right] \qquad (2)$$

are chosen. We search for the basis function over the
parameter space ($a_1$, $a_2$, $a_3$) that best represents the
time-frequency behavior of the signal by maximizing the
projection of the signal onto the basis:

$$\max_{a_1, a_2, a_3} \left| \int E(t)\, h^{*}(t)\, dt \right| \qquad (3)$$


After the time-varying phase for the strongest point
scatterer is found, we multiply the original signal by the
conjugate of this phase factor to compensate for the
translation motion. This algorithm can also be extended to
multiple range cells to correct for higher-order rotation
motion. After applying the JTF motion compensation, the
standard FFT processing in the dwell time domain brings
the signal into the cross range image domain. Fig. 2 shows
an example of applying the AJTF algorithm to measured
ISAR data of an in-flight aircraft. The shape of the aircraft
is clearly visible in the resulting image after the AJTF
motion compensation.

Fig. 1. Fine motion compensation is carried out by
the Doppler frequency versus dwell time behavior of
the strong point scatterer in the signal.

Fig. 2. ISAR image of an in-flight aircraft
obtained after AJTF motion compensation.
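
The search-and-projection procedure of (2) and (3), followed by the conjugate-phase compensation, can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: dwell time is normalized to [-0.5, 0.5), the search over a1 is done with a zero-padded inverse FFT while a2 and a3 are scanned on coarse grids, and the function name `ajtf_search`, the grids, and the synthetic two-scatterer signal are hypothetical.

```python
import numpy as np

def ajtf_search(signal, a2_grid, a3_grid, pad=8):
    """Grid search for (a1, a2, a3) maximizing |sum E(t) h*(t)|, with h as in (2)."""
    n = signal.size
    t = (np.arange(n) - n / 2) / n                    # normalized dwell time
    a1_grid = 2.0 * np.pi * n * np.fft.fftfreq(pad * n)
    best = (-np.inf, 0.0, 0.0, 0.0)
    for a2 in a2_grid:
        for a3 in a3_grid:
            # Remove the trial quadratic and cubic phase ...
            dechirped = signal * np.exp(1j * (a2 * t**2 + a3 * t**3))
            # ... then one zero-padded inverse FFT scans all a1 values at once.
            projection = np.abs(np.fft.ifft(dechirped, pad * n)) * pad * n
            k = int(np.argmax(projection))
            if projection[k] > best[0]:
                best = (projection[k], a1_grid[k], a2, a3)
    return best[1:], t

# Synthetic range cell: one strong and one weak scatterer sharing the same
# higher-order motion, plus a little noise (all values are made up).
rng = np.random.default_rng(1)
n_pulses = 256
tt = (np.arange(n_pulses) - n_pulses / 2) / n_pulses
true_a = (2 * np.pi * 10, 40.0, -80.0)
common = true_a[1] * tt**2 + true_a[2] * tt**3
strong = np.exp(-1j * (true_a[0] * tt + common))
weak = 0.2 * np.exp(-1j * (-2 * np.pi * 14 * tt + common))
noise = 0.02 * (rng.standard_normal(n_pulses) + 1j * rng.standard_normal(n_pulses))
E = strong + weak + noise

(a1, a2, a3), t = ajtf_search(E, np.linspace(-100, 100, 21), np.linspace(-200, 200, 11))
# Multiply by the conjugate phase factor h*(t), then FFT over dwell time.
compensated = E * np.exp(1j * (a1 * t + a2 * t**2 + a3 * t**3))
doppler = np.abs(np.fft.fft(compensated))
```

After compensation the strong scatterer collapses to a single Doppler bin, and the weak scatterer, which shares the same higher-order phase, is focused as well at its own Doppler offset.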

3.  THREE-DIMENSIONAL  MOTION 
DETECTION  USING  JOINT  TIME- 
FREQUENCY  ALGORITHM 

One  basic  assumption  of  standard  motion  compensation 
algorithms  is  that  the  target  only  undergoes  motion  in  a 
two-dimensional  plane  during  the  dwell  duration  needed  to 
form an image. Several recent independent examinations of
measured ISAR data sets have reported that the
presence of three-dimensional motion is quite detrimental
to focusing the image [5-7]. We shall report on our recent
work  to  extend  the  AJTF  algorithm  to  address  the  more 
challenging  situation  when  the  motion  of  the  target  is  not 
limited  to  a  two-dimensional  plane.  In  particular,  we 
discuss  our  research  to  detect  the  presence  of  three- 
dimensional  motion  using  the  AJTF  algorithm. 

Allowing  for  arbitrary  three-dimensional  motion  in  space, 
we  consider  the  following  model  as  a  generalization  of  the 
model  for  two-dimensional  motion  in  (1): 

$$E(t) = \sum_{k=1}^{N} A_k \exp\left[-j\frac{4\pi f}{c}\bigl(x_k + y_k\theta + z_k\phi\bigr)\right] \qquad (4)$$

where $\theta$ is the azimuth angle of the target with respect to
the radar, and $\phi$ is the elevation angle. In (4), it is assumed
that the translation motion has been removed and that the
standard small-angle, small bandwidth approximations
apply. This model reduces to the standard two-dimensional
motion model when $\theta$ and $\phi$ are linearly related.

In  general,  a  focused  image  cannot  be  obtained  from  the 
standard  two-dimensional  motion  compensation  algorithm 
when  three-dimensional  target  motion  is  present  due  to 
model  mismatch.  Therefore,  it  would  be  useful  to  detect 
the  presence  of  three-dimensional  motion  directly  from  the 
radar  data.  Our  approach  is  to  utilize  the  AJTF  algorithm 
to  extract  the  phase  behavior  of  the  radar  data  at  multiple 
range cells. We first parameterize the phase of the
prominent point scatterer in one range cell using AJTF.
Next we repeat the same procedure at another range cell.
It can be shown that when the target undergoes only
two-dimensional motion during the dwell duration, the ratio
between the parameters ($a_1$, $a_2$, $a_3$) extracted from one
range cell and the corresponding parameters in another
range cell should be constant. Therefore, by examining the
ratio of the parameters, we can distinguish two-dimensional
motion from three-dimensional motion. Fig. 3
illustrates the idea using simulated point scatterer data.
Figs. 3(a)-(d) show the two-dimensional motion scenario
and Figs. 3(e)-(h) show the three-dimensional scenario. It
can be seen from the results in Fig. 3(d) that the
determined ratios

$$c_i = \frac{a_i(\text{range cell 1})}{a_i(\text{range cell 2})} \qquad (5)$$

are nearly constant for all the terms in the case of
two-dimensional motion, as expected. For three-dimensional
motion, the ratios are not the same, as seen in Fig. 3(h).

Fig. 3. (a) Simulated 2D target motion. (b) Phase
behavior of the prominent point scatterer in range cell 1
extracted using AJTF. (c) Phase behavior of the prominent
point scatterer in range cell 2 extracted using AJTF. (d)
Ratios of the extracted phase parameters from the two
range cells. Note that they are nearly constant. (e)-(h)
Similar to (a)-(d), except that 3D motion is assumed. The
resulting ratios in (h) are no longer constant. Values
annotated in panel (h): c1 = 7.20, c2 = 1.60.
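
The ratio test of (5) can be illustrated numerically. The sketch below assumes, for illustration only, that the azimuth and elevation angles in (4) are cubic polynomials in dwell time, so that the phase of a scatterer at cross-range position y and height z has coefficients proportional to y times the azimuth coefficient plus z times the elevation coefficient. The function names, scatterer positions, radar frequency, and motion coefficients are all hypothetical.

```python
import numpy as np

def phase_coefficients(y, z, theta_coef, phi_coef, wavenumber):
    """Polynomial phase coefficients a_i of one scatterer under model (4)."""
    return wavenumber * (y * np.asarray(theta_coef) + z * np.asarray(phi_coef))

def ratio_spread(a_cell1, a_cell2):
    """Ratios c_i as in (5) and their relative spread (0 means constant)."""
    c = a_cell1 / a_cell2
    return c, (c.max() - c.min()) / abs(c.mean())

k = 4 * np.pi * 10e9 / 3e8                      # 4*pi*f/c for an assumed 10 GHz radar
theta = np.array([0.02, 0.005, -0.001])         # assumed azimuth coefficients
cell_1, cell_2 = (2.0, 1.0), (-1.0, 3.0)        # assumed (y, z) of the prominent scatterers

phi_2d = 0.5 * theta                            # elevation linearly tied to azimuth
phi_3d = np.array([0.01, -0.004, 0.003])        # independent elevation motion

c_2d, spread_2d = ratio_spread(phase_coefficients(*cell_1, theta, phi_2d, k),
                               phase_coefficients(*cell_2, theta, phi_2d, k))
c_3d, spread_3d = ratio_spread(phase_coefficients(*cell_1, theta, phi_3d, k),
                               phase_coefficients(*cell_2, theta, phi_3d, k))
```

With linearly related angles every ratio equals the same constant; with independent elevation motion the ratios differ from term to term, which is the signature the detection algorithm looks for.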

Fig.  4  shows  our  preliminary  results  of  applying  the  3D 
motion  detection  algorithm  to  real  radar  data.  Fig.  4(a) 
shows  the  degree  of  three-dimensional  motion  in  the  data 
for  20  different  image  frames,  detected  by  applying  our 
algorithm  to  the  raw  radar  data.  As  a  reference  for 
comparison,  Fig.  4(b)  shows  the  degree  of  three- 
dimensional  motion  for  the  same  20  frames  measured 
using  the  motion  data  derived  from  inertial  navigation 
instruments  carried  onboard  the  aircraft  during  data 
collection.  It  can  be  seen  that  our  algorithm  correctly 
detects  where  significant  three-dimensional  motions  exist. 
We  are  currently  fine  tuning  the  algorithm  to  achieve  faster 
and  more  robust  detection.  We  believe  this  detection 
algorithm could be quite useful for determining the “good”
imaging intervals from which focused images can be more
readily generated. For targets that exhibit very chaotic
motions, such as ships on the ocean, finding such intervals
of opportunity may be very critical for target recognition.

Fig. 4. Blind detection of three-dimensional motion
from real radar data. (a) Degree of three-dimensional
motion over 20 image frames detected using the
proposed algorithm. (b) Degree of three-dimensional
motion measured from on-board instrument data.

4.  SUMMARY 

In  this  paper,  we  presented  two  applications  of  the 
adaptive  joint  time-frequency  algorithm  for  ISAR  image 
formation.  In  the  first  application,  we  carry  out  fine 
motion  compensation  to  form  focused  ISAR  images  of 
articulating  targets.  The  AJTF  algorithm  is  used  to 
estimate  the  phase  of  the  prominent  point  scatterer  within  a 
range cell. The higher-order phase errors due to
uncompensated translational and rotational motion are then
removed prior to the image formation. Results show that
well-focused  images  can  be  obtained  from  measured  data 
of  an  in-flight  aircraft.  In  the  second  application,  we  try  to 
detect  the  presence  of  three-dimensional  target  motion,  for 
which  a  well-defined  imaging  plane  does  not  exist.  A 
three-dimensional  motion  model  is  utilized  and  the 
linearity  of  the  phase  functions  of  the  prominent  point 
scatterers  between  different  range  cells  is  used  to 
distinguish  two-dimensional  from  three-dimensional 
motion.  The  AJTF  engine  is  again  used  to  extract  the 
phase  function  of  the  prominent  scatterer  within  each  range 
cell. Preliminary test results using real radar data indicate
that the algorithm can be used to detect those imaging
intervals where the conventional two-dimensional motion
assumption would fail. We are working to devise
algorithms for forming focused images even in the
presence of these three-dimensional motions.

ABSTRACT

The  image  formation  process  associated  with  coherent 
imaging  sensors  is  particularly  sensitive  to  and  is  often 
corrupted  by  non-stationary  processes.  In  the  case  of 
synthetic  aperture  radar  (SAR),  non-stationary  processes 
result  from  motion  within  the  scene,  variable  radar  cross 
section,  multi-path,  topographic  variations,  sensor 
anomalies,  and  deficiencies  in  the  image  formation 
processing  chain.  This  paper  addresses  SAR  image 
formation  processing,  the  complex  response  function  for 
a  point  source,  and  SAR  JTF  image  formation 
implementations.  Each  of  these  topics  is  described 
within  the  context  of  applying  JTF  processing  to  all 
aspects  of  SAR  image  formation  and  analysis. 

1.  INTRODUCTION 

The  fundamental  attribute  of  a  synthetic  aperture  radar 
(SAR)  is  its  ability  to  directly  sample  the  complex 
Fourier  domain  of  the  spatial  reflectivity  map  from  an 
illuminated  ground  patch.  This  reflectivity  map,  or 
radar  image,  is  the  standard  product  from  a  SAR  sensor. 
Apart from issues of sensor motion compensation and
other corrections accounted for by the image formation
processor, the data collected by a SAR sensor are related
to the desired radar image through a 2-dimensional
Fourier transform. Key to the understanding and
interpretation  of  SAR  imagery  is  the  realization  that  the 
received  radar  pulses  are  sampling  the  Fourier  domain 
at  different  times  and,  when  taken  as  a  whole,  fill  a 
small  annular  region  of  the  Fourier  plane.  The  finite 
time  scale  of  a  SAR  coherent  data  period  (fraction  of  a 
minute)  and  the  limited  coverage  in  the  Fourier  plane 
(angular  span  of  a  few  degrees  and  radial  extent  in 
proportion  to  the  pulse  fractional  bandwidth)  combine 
to  form  the  root  cause  for  the  presence  of  and  sensitivity 
to non-stationary processes in SAR data. Although
non-stationary processes can degrade radar image quality
and introduce peculiar signature artifacts, the sensitivity
of SAR sensors to non-stationary processes provides an
outstanding exploitation opportunity that no incoherent
imaging system can offer. High-resolution SAR
sensors are the best data sources for non-stationary
signal exploitation since they span the longest coherent
data period and have the largest range bandwidth.
Moreover, effective analysis of non-stationary processes
can lead to their removal from the standard product,
yielding higher quality imagery.

The  analysis  of  non-stationary  processes  in  SAR  data  is 
necessarily  a  clutter,  not  a  noise,  dominated  problem. 
Imaged  scenes  generally  include  some  combination  of 
urban  infrastructure,  vegetative  ground  cover,  terrain 
features,  water,  and  moving  targets.  Although  these 
scene  content  categories  contribute  to  stationary  and 
non-stationary  signal  processes  in  SAR  data,  stationary 
processes  tend  to  dominate  most  scenes.  If  this  were 
not  the  case,  the  value  of  SAR  imagery  would  be  greatly 
diminished.  Stationary  processes  are  considered  clutter 
within  the  context  of  non-stationary  signal  analysis. 

The  inevitable  presence  of  non-stationary  processes  in 
SAR  data  spanning  any  real  scene  compels  some  form 
of JTF analysis. However, the bi-linear character of
traditional JTF analysis typically requires some form of
filtering to mitigate the effects of the confusing cross
terms, an overwhelming source of interference for a
filled aperture SAR sensor. Here, ‘filled aperture’ refers
to significant reflectivity over the entire imaging patch
as opposed to the unfilled apertures of Inverse SAR
(ISAR) imaging of ships and aircraft [1]. The
preponderance of these interference terms limits broad
utility  of  current  JTF  approaches  within  the  context  of 
SAR  signal  processing  for  single-phase  center  and 
single-frequency  systems.  Considerations  of  the 
underlying  assumptions  of  SAR  image  formation 
processing  together  with  the  rich  content  of  any  real 
scene  suggest  a  future  developmental  path  comprising 
data  driven  JTF  techniques  focussing  on  the  separability 
of  stationary  and  non-stationary  processes. 

The  exploitation  of  non-stationary  processes  in  SAR 
data  can  be  facilitated  through  joint  time-frequency 
(JTF)  signal  processing.  The  most  widely  used  JTF 
technique  is  the  short-time  Fourier  transform  (STFT). 
STFT  processing  in  the  parlance  of  SAR  analysis  is 
often  referred  to  as  sub-aperture  processing.  The  SAR 
aperture  that  is  synthesized  over  time  by  the  relative 
motion  between  the  sensor  platform  and  the  aim  point  is 
subdivided into smaller segments resulting in improved
presentation of non-stationary signatures, such as
moving targets, but at the expense of degraded
azimuthal resolution. A series of STFT sub-apertures
created in this way forms a three-dimensional data
volume.

Figure 1. The SAR image formation processing chain may be generalized into the computational elements indicated in this flow
diagram. The dashed arrows represent real signals and magnitude imagery, whereas the solid arrows represent complex signals
and imagery. The callouts, or data stream taps, indicate locations in the processing chain where we consider joint
time-frequency signal processing may be of benefit. See the text for a description of the taps.
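
Sub-aperture (STFT) processing along slow time can be sketched as follows. The example is a simplified illustration, assuming range-compressed data arranged as pulses by range bins; the function name `subaperture_volume`, the Hann taper, the window width and hop, and the two synthetic scatterers are assumptions for the example.

```python
import numpy as np

def subaperture_volume(phase_history, width, hop):
    """Split slow time into overlapping sub-apertures and FFT each one."""
    starts = np.arange(0, phase_history.shape[0] - width + 1, hop)
    taper = np.hanning(width)[:, None]           # sidelobe control per sub-aperture
    volume = np.stack([np.fft.fft(phase_history[s:s + width] * taper, axis=0)
                       for s in starts])
    return starts, volume                         # (sub-aperture, Doppler, range)

pulses = np.arange(256)
stationary = np.exp(2j * np.pi * 0.125 * pulses)                    # fixed Doppler
mover = np.exp(2j * np.pi * (0.1 * pulses + 0.0005 * pulses**2))    # drifting Doppler
history = np.stack([stationary, mover], axis=1)                     # two range bins

starts, volume = subaperture_volume(history, width=64, hop=32)
stationary_bins = np.abs(volume[:, :, 0]).argmax(axis=1)
mover_bins = np.abs(volume[:, :, 1]).argmax(axis=1)
```

The stationary scatterer stays in the same Doppler bin in every sub-aperture, while the mover walks across bins. Each sub-aperture uses only 64 of the 256 pulses, which is the loss of azimuthal resolution mentioned in the text.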

One  advanced  JTF  technique  relies  on  the  Wigner-Ville 
distribution  (WVD)  characterizing  both  stationary  and 
non-stationary  processes  without  any  degradation  in 
resolution.  The  stationary  and  non-stationary 
components  in  the  WVD  time-frequency  representation 
are  self-terms  of  the  underlying  bi-linear  distribution. 
The cross terms of the bi-linear distribution, however,
introduce artifacts so severe as to render the
JTF-WVD data volume unsuitable for many SAR
related exploitation purposes. Each scattering center
represented in the time-frequency plane mixes with
every other scattering center regardless of whether the
scattering center is stationary or not. If we take
signatures represented in the time-frequency plane in
pairs, the result of WVD is to introduce artificial
signatures at a location half way between each pair with
an amplitude that can exceed that of either signature taken
individually. The superset of bi-linear JTF distributions
is Cohen’s class of distributions [2]. Various filtering
mechanisms [3] have been developed in an attempt to
reduce the cross term effects of WVD and the many
other instances of Cohen’s class of distributions.
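
The cross-term behavior described above can be reproduced with a small discrete Wigner-Ville computation. This is a sketch under simplifying assumptions (a fixed lag window, two equal-amplitude complex tones placed on exact frequency bins); the function name `wigner_ville` and all signal parameters are illustrative.

```python
import numpy as np

def wigner_ville(x, half_width):
    """Discrete Wigner-Ville distribution with a fixed lag window."""
    L = half_width
    lags = np.arange(-L, L)
    times = np.arange(L, x.size - L)
    wvd = np.empty((times.size, 2 * L))
    for row, n in enumerate(times):
        # Bi-linear kernel x(n + m) x*(n - m), transformed over the lag m.
        kernel = x[n + lags] * np.conj(x[n - lags])
        wvd[row] = np.fft.fft(np.fft.ifftshift(kernel)).real
    freqs = np.arange(2 * L) / (4.0 * L)          # cycles/sample, 0 ... 0.5
    return times, freqs, wvd

t = np.arange(256)
f1, f2 = 0.125, 0.25
x = np.exp(2j * np.pi * f1 * t) + np.exp(2j * np.pi * f2 * t)
times, freqs, wvd = wigner_ville(x, half_width=64)

peak = np.abs(wvd).max(axis=0)                    # largest magnitude per frequency
auto_1 = peak[np.argmin(np.abs(freqs - f1))]
auto_2 = peak[np.argmin(np.abs(freqs - f2))]
cross = peak[np.argmin(np.abs(freqs - 0.5 * (f1 + f2)))]
```

For two equal-amplitude tones the cross term sits midway between them, oscillates in time, and peaks at twice the magnitude of either auto term. With many scatterers the number of such pairs grows quadratically, which is why clutter-filled scenes are so difficult for bi-linear distributions.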


With  the  exception  of  very  low  clutter  environments, 
JTF  analysis  of  SAR  data  should  begin  with  the 
segmentation  of  stationary  clutter  from  non-stationary 
signals.  In  support  of  eventual  automated  signal 
analysis,  considerable  importance  should  be  placed  on 
having  the  signal  data  drive  the  available  degrees  of 
freedom afforded by JTF algorithms. The principal
degree  of  freedom,  or  adjustable  parameter,  is  typically 
related  to  an  area  of  regard  such  as  the  window  width 
for  the  STFT. 

2.  SAR  IMAGE  FORMATION 

The  essential  stages  of  SAR  image  formation  processing 
are  presented  to  illustrate  the  data  stream  taps  where  we 
consider  JTF  processing  can  benefit  the  analysis  of  SAR 
data.  See  Figure  1.  Most  of  the  discussion  of  SAR 
image formation in this section centers on spotlight-mode
processing [4] [5]. Spotlight-mode SAR attains
the  best  resolution  of  the  possible  collection  modes  of  a 
SAR.  During  the  coherent  data  period,  the  antenna  is 
steered  to  remain  pointed  at  a  fixed  aim  point.  The 
cross-range  resolution  is  governed  principally  by  the 
angular  extent  of  the  dwell  on  the  aim  point. 
Conversely, the antenna of a stripmap-mode SAR
images broadside to the platform velocity vector,
resulting in a cross-range resolution equal to ½ the
effective antenna diameter. In either case, the
bandwidth of the pulse waveform is often chosen to
provide a range resolution comparable to that in the
cross-range direction.
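
The resolution relationships referred to above can be summarized with the standard textbook approximations below. The symbols are not defined in this paper and are introduced here only for illustration: c is the speed of light, B the pulse bandwidth, λ the centre wavelength, Δθ the angular extent of the dwell in radians, and D the effective antenna diameter.

```latex
\delta_{r} = \frac{c}{2B}, \qquad
\delta_{cr}^{\mathrm{spotlight}} \approx \frac{\lambda}{2\,\Delta\theta}, \qquad
\delta_{cr}^{\mathrm{stripmap}} \approx \frac{D}{2}
```

As a rough worked example with assumed values, a 3 cm wavelength and a 3 degree dwell (about 0.052 rad) give a spotlight cross-range resolution of about 0.29 m; matching that in range requires a bandwidth of roughly 520 MHz.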

2.1  Processing  Chain 

The  objectives  for  JTF  SAR  processing  fall  into  two 
categories.  One  is  value-added  exploitation  of  the 
targets  in  the  scene,  and  the  other  is  enhancing  or 
redefining  the  mechanics  of  the  image  formation 
process.  Of  the  data  stream  taps  indicated  in  Figure  1, 
tap  {4}  is  the  most  readily  available  from  commercial 
systems.  Gaining  access  to  the  other  taps  normally 
requires  direct  access  to  an  image  formation  processor, 
usually  tailored  to  a  specific  system,  so  that  the 
necessary  modifications  can  be  made.  The  JTF 
approaches  we  have  under  investigation  are  arranged  by 
tap  number. 

1.  This  is  the  first  useful  tap  in  the  SAR 
processing  chain  where  the  in-phase  and 
quadrature-phase  signals  are  formed  into  a 
complex  data  stream.  JTF  exploitation  can 
potentially  begin  here  on  a  pulse  by  pulse 
basis  for  problems  having  pulse  width 
timescales.  JTF  signal  processing  may 
also  benefit  the  phase  correction  stages 
leading  up  to  polar  formatting.  The  pulse 
data  are  represented  here  in  a  polar 
coordinate  system.  The  polar  formatting 
step  resamples  the  pulse  data  from  polar  to 
Cartesian  coordinate  systems  in 
preparation  for  2-D  Fast-Fourier 
transforms  (FFT). 

2.  A  2-D  complex  Fourier  domain  map  is 
rendered  in  the  Cartesian  coordinate 
system  without  any  refocus  applied.  This 
tap  is  one  starting  point  for  higher 
dimensional  JTF  processing  and 
exploitation.  Performing  JTF  processing 
on  each  range  line  and  then  on  each  cross¬ 
range  line  will  result  with  a  4-D  data 
volume.  Alternatively,  applying  a  FFT 
along  either  dimension  and  then  applying 
JTF  processing  to  the  other  dimension 
results  in  a  more  manageable  3-D  data 
volume.  We  refer  to  JTF  processing  along 
the  cross-range  direction  as  slow-time 
image  formation  processing  (ST-IFP) 
processing,  and  along  the  range  direction 
as  fast-time  image  formation  processing 
(FT-IFP)  processing.  JTF  signal 
processing  may  also  benefit  the  refocusing 
stages  prior  to  the  cross-range  FFT. 


3.  Applying  an  inverse  1-D  range  FFT  to  this 
tap  returns  us  to  tap  {2}  with  the  added 
benefit  of  improved  focus.  Moreover, 
since  range  compression  has  already 
occurred,  ST-JTF  processing  can  proceed 
directly  from  this  tap.  For  these  reasons 
we  consider  tap  {3}  to  be  of  greater 
practical  use  than  tap  {2}  for  most 
problems. 

4.  This  tap  provides  the  slant  plane  complex 
image.  The  slant  plane  is  the  plane  formed 
by  the  platform  velocity  vector  and  the 
range  line  to  the  aim  point.  From  this 
point,  JTF  processing  can  proceed  after  an 
inverse  1-D  range  and/or  1-D  cross-range 
FFT  is  performed.  For  general  JTF 
exploitation,  this  tap  is  the  most 
convenient.  Earlier  taps  are  required  if 
JTF  enhancements  to  the  image  formation 
process  are  to  be  explored. 

The  annulus  sampled  by  a  SAR  in  the  complex  Fourier 
domain  is  defined  by  the  angle  subtended  by  the  dwell 
of  the  sensor  on  the  aim  point  and  by  the  bandwidth  of 
the pulse waveform. Even if there were no motion in
the scene, non-stationary processes can still be expected.
Coherent response from structures comprising linear,
planar, dihedral, and trihedral elements results in
correlated phase in the complex image domain. As a
result, the computed reflectivity map will vary between
selected annuli in the Fourier domain. Equivalently, the
reflectivity of man-made structures is aspect dependent.
Conversely, a scene dominated by random scattering
processes will result in a reflectivity map that is
independent of the subset selected, apart from
differences in the speckle content.

Although  many  of  the  stages  depicted  in  the  processing 
flow  diagram  are  designed  to  correct  for  platform 
motion  and  sampling  artifacts,  the  range  de-skew  stage 
is  directly  related  to  the  formulation  of  the  SAR 
response  function.  Range  skew  is  a  phase  term  that 
represents  a  departure  of  the  SAR  response  function 
from  the  2-D  Fourier  transform  of  the  desired  radar 
reflectivity  map.  The  de-skew  correction  is  therefore 
accomplished  prior  to  the  1-D  FFT,  or  compression, 
stages.  The  occurrence  of  range  dependent  skew  in  the 
SAR  response  function  is  highlighted  in  the  next 
section. 

2.2  Normalized  Response  Function 

The normalized SAR response function for a point
source with complex reflectivity $g_\theta$ may be expressed as

fe{i)  =  ^Ae{p)  ge  e


2m\ 


W~2  1+ 


(1) 


where $A_\theta(\boldsymbol{\rho})$ encompasses complex antenna gain and
any propagative effects, $\boldsymbol{\rho}$ is the position vector of the
point source relative to the ground reference point, $Q$ is
the ratio of the center frequency to the chirp waveform
bandwidth, $\tau$ is the pulse time normalized by the pulse
duration, $\rho_\theta$ is the range offset to the point source
relative to the ground reference point and is normalized
by the center frequency wavelength, $\theta$ is the polar angle
described by the motion of the platform as seen by the
aim point, and $\varepsilon$ is the inverse of the mean number of
cycles transmitted during the duration of a radar pulse.
Although the frequency and bandwidth of a SAR define
the character and resolution of a radar image, the system
$Q$ is key to the dynamic range available to the sensor.
The smaller the $Q$, or the larger the fractional
bandwidth, the more robust is the sampling of the
complex phase domain. The derivation of (1) extends
from pioneering tomographic approaches to
spotlight-mode SAR processing [6].

The  first  phase  term  in  (1)  causes  a  range  dependent 
distortion,  or  image  skew.  This  term  is  removed  from 
the  SAR  signal  history  by  the  range  de-skew  stage 
illustrated in Figure 1. Although the magnitude of this
phase term varies as the square of the range offset from
the aim point, the coefficient $\varepsilon$ is sufficiently small to
allow this term to be neglected in many cases.

After  the  image  de-skew  corrections  are  applied  to  the 
signal  history,  the  SAR  response  function  is  seen  to 
reduce  to  the  Fourier  transform  of  a  point  source 
projected along a polar angle $\theta$. Pulse data collected
over a sufficiently large polar annulus can then be
resampled to a Cartesian grid and inverse Fourier
transformed to produce the radar image. This approach
to SAR image formation is accurate if there are only
stationary processes present in the signal history. An
examination of (1) shows many sources where
non-stationary processes can be introduced.
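
The Fourier relationship described above can be illustrated with a toy example. The sketch assumes the pulse data have already been de-skewed and resampled to a Cartesian grid (the polar-to-Cartesian interpolation is not shown), and it uses a quadratic cross-range phase as a hypothetical stand-in for a non-stationary process; the function name `point_source_history` and all values are illustrative.

```python
import numpy as np

def point_source_history(shape, position, defocus=0.0):
    """Cartesian Fourier-domain samples of a unit point source."""
    n_range, n_cross = shape
    k_range = np.fft.fftfreq(n_range)[:, None]    # cycles/sample along range
    k_cross = np.fft.fftfreq(n_cross)[None, :]    # cycles/sample along cross-range
    r0, x0 = position
    history = np.exp(-2j * np.pi * (k_range * r0 + k_cross * x0))
    # Optional quadratic phase error across the aperture (assumed model).
    return history * np.exp(1j * defocus * k_cross**2)

shape, position = (64, 128), (20, 45)
focused = np.abs(np.fft.ifft2(point_source_history(shape, position)))
smeared = np.abs(np.fft.ifft2(point_source_history(shape, position, defocus=200.0)))
peak_index = np.unravel_index(np.argmax(focused), shape)
```

With a purely stationary point source, the inverse 2-D FFT returns a single unit-amplitude pixel at the source position. The added phase error spreads the same energy along cross-range and lowers the peak, which is the kind of degradation the sources listed next can introduce.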

•  $A_\theta(\boldsymbol{\rho})$ - (a) The antenna beam pattern
and the image patch size are chosen to
minimize image quality degradation.
Beam de-shading performed after the
detection stage in Figure 1 will correct for
beam related intensity rolloffs. Apart from
sensor hardware instabilities that affect the
complex antenna gain, antenna properties
are not considered to be a significant
contributor to non-stationary processes. (b)
Although radar is generally considered a
day, night, and all weather sensor,
electrical storms can introduce propagative
anomalies that will affect image quality.
Spaceborne SARs may further be affected
by inhomogeneities and fluctuations in the
electron density of the ionosphere.

•  $g_\theta$ - The complex reflectivity of a point
source can vary over the polar angle
spanned by the coherent data period. This
is especially true for linear, planar, and
dihedral structures, whose principal
attribute for non-stationary processes is
that they have very narrow beam patterns.

•  $\rho_\theta$ - Time-dependent variations in the
range offset due to motion in the scene are
the issue most often addressed by
researchers exploring JTF applications for
imaging radars. Mover defocus resulting
from range acceleration, cross-range
velocity, and cross-range acceleration can
be reduced using JTF techniques, not
only for one mover, but
simultaneously for all movers in the scene.
Whereas JTF techniques may be effective
for the sparse scenes of inverse synthetic
aperture radar (ISAR), e.g., the imaging of
ships or planes, the stationary-clutter-dominated
scenes of SAR introduce an
overwhelming source of cross-terms in
bilinear JTF techniques that makes it difficult
to effectively analyze embedded
non-stationary processes. Techniques for
filtering out stationary clutter are needed
to exploit non-stationary signals beyond
the fidelity available from traditional
STFT approaches.

3.  CONCLUSIONS 

The  apparent  utility  of  JTF  techniques  for  SAR  data 
analysis  is  significantly  affected  by  cross-terms 
associated  with  the  bi-linear  distributions  commonly 
employed  in  the  field  of  JTF  signal  processing.  SAR 
data  are  typically  dominated  by  the  clutter  of  stationary 
processes,  e.g.,  urban  infrastructure,  vegetation,  and 
natural terrain. Mixed within that clutter are
non-stationary signals. Cross-terms in the time-frequency
domain therefore arise from clutter-to-clutter mixing
and from mixing of clutter with non-stationary signals.
Despite the many techniques developed to mitigate the
effects of the cross-terms, the dominance of stationary
clutter in most SAR data is overwhelming. Research into
the separability of non-stationary signals from stationary
clutter, coupled with signal-based, or adaptive, JTF
techniques, appears to be the most promising approach for
extending JTF signal processing for SAR exploitation
beyond STFT techniques.
ABSTRACT

We give a simple formula for the calculation of the
moments of a propagating pulse in a dispersive medium.
Both the spatial and time moments are considered.
Explicit formulas are derived for the spatial spreading of a
propagating pulse and also for the duration of the pulse
at a fixed position in space. In addition, we give
formulas for the calculation of the instantaneous frequency
of a pulse at a given position. A number of simple
examples are used to illustrate the formulas derived.

1.  INTRODUCTION 

Linear partial differential equations whose solutions give
wave-like behavior come in many varieties, but
fortunately, the solution to all of them can be written in
a simple form [4,5]. We call the solution to such an
equation $u(x,t)$, where $x$ and $t$ are the spatial and time
variables, respectively. A general method of solution is
to substitute

$$ e^{ikx - i\omega t} \tag{1} $$

into the wave equation, with the result that such a
particular solution can only exist if there is a relationship
between $k$ and $\omega$. The relationship will be of the form

$$ D(\omega, k) = 0 \tag{2} $$

This is called the dispersion relation. One can now solve
for $k$ in terms of $\omega$ or the other way around. These
relations are written here as

$$ k = K(\omega) \; ; \quad \omega = W(k) \tag{3} $$

Generally there will be more than one solution, and each
solution is called a mode. Furthermore, depending on
whether we have complex or real solutions, we will have
damping or not. Here, we consider the case where we
have no damping, that is, both $K(\omega)$ and $W(k)$ are real.

Work supported by the Office of Naval Research, the NASA

The general solution for $u(x,t)$ is then expressed
in terms of Fourier integrals taking into account the
dispersion relation. This is described in Sections 2 and
3. However, there are two distinct physical situations
depending on the initial conditions. The two types of
initial conditions are

$$ \text{Given: } u(x,0) \quad \text{(Case One)} \tag{4} $$

$$ \text{Given: } u(0,t) \quad \text{(Case Two)} \tag{5} $$

The  first  case  is  when  we  have  the  spatial  wave  at  a 
given  time  and  the  second  when  we  have  the  wave  at  a 
given  position  for  all  time.  An  example  of  the  first  is  if 
we  pluck  a  string  and  let  go  at  time  zero.  An  example 
of  the  second  is  if  we  are  at  a  fixed  position  and  create  a 
pulse,  for  example,  a  radar,  sonar,  or  fiber  optic  pulse. 

Group  Velocity  and  Its  Extension 

A central idea in the study of pulse propagation is
the group velocity, $v_g(k)$, which is given by

$$ v_g(k) = W'(k) \tag{6} $$

There are many plausible arguments that have been
given in the literature for calling this quantity the group
velocity. In Sec. 2 we will give a new relation for a
propagating pulse that we think gives a very clear picture
of why $v_g(k)$ should be called a group velocity and how
it is related to the propagation of the center of mass of
the pulse.

In Sec. 3 we will study the time properties of a
pulse at a fixed position. We will see that the natural
quantity that appears is

$$ z_g(\omega) = K'(\omega) \tag{7} $$

We note that it has the units of inverse velocity. We will
see that it is related to the amount of time delay per
unit distance. We shall call it the group time delay.¹

¹This quantity should not be confused with "group delay",
which is the derivative of the spectral phase [3].

Instantaneous Frequency

A pulse can always be written in terms of its
amplitude and phase

$$ u(x,t) = |u(x,t)|\, e^{i\varphi(x,t)} \tag{8} $$

The instantaneous frequency at a fixed position is given
by the partial derivative of the phase with respect to
time²

$$ \omega_i(x,t) = \pm\frac{\partial}{\partial t}\varphi(x,t) \tag{9} $$

2.  CASE  ONE 

General Solution. We now consider the situation where
the initial condition is given by Eq. (4). The general
solution is [4,5]

$$ u(x,t) = \frac{1}{\sqrt{2\pi}} \int S(k)\, e^{ikx - iW(k)t}\, dk \tag{10} $$

where $S(k)$ is the initial spatial spectrum

$$ S(k) = \frac{1}{\sqrt{2\pi}} \int u(x,0)\, e^{-ikx}\, dx \tag{11} $$

We define the time dependent spectrum by

$$ S(k,t) = S(k,0)\, e^{-iW(k)t} \tag{12} $$

where

$$ S(k,0) = S(k) \tag{13} $$

Therefore

$$ u(x,t) = \frac{1}{\sqrt{2\pi}} \int S(k,t)\, e^{ikx}\, dk \tag{14} $$

$$ S(k,t) = \frac{1}{\sqrt{2\pi}} \int u(x,t)\, e^{-ikx}\, dx \tag{15} $$

and hence $u(x,t)$ and $S(k,t)$ form Fourier transform
pairs for all time. Since $u(x,t)$ and $S(k,t)$ form Fourier
transform pairs we can use the operator method to
calculate moments [1,2,3]

$$ \langle x^n \rangle_t = \int x^n\, |u(x,t)|^2\, dx \tag{16} $$

$$ = \int S^*(k,t)\, X^n\, S(k,t)\, dk \tag{17} $$

where $X$ is the position operator in the $k$ representation

$$ X = i\frac{\partial}{\partial k} \tag{18} $$

²Whether one takes ± in Eq. (9) depends on the form taken
for Eq. (1). In particular, ± should be chosen to be the same as
the sign in front of $\omega t$ in Eq. (1). For the choice taken here the
negative sign should be used in Eq. (9).

Moments. Defining the group velocity, $v_g(k)$, by

$$ v_g(k) = W'(k) \tag{19} $$

the exact first two moments and standard deviation are
worked out to be [1,2]

$$ \langle x \rangle_t = \langle x \rangle_0 + Vt \tag{20} $$

$$ \langle x^2 \rangle_t = \langle x^2 \rangle_0 + t^2 \langle v_g^2 \rangle_0 + t\, \langle v_g X + X v_g \rangle_0 \tag{21} $$

$$ \sigma^2_{x|t} = \sigma^2_{x|0} + 2t\, \mathrm{Cov}_{x v_g} + t^2 \sigma^2_{v_g} \tag{22} $$

where

$$ V = \int v_g(k)\, |S(k,0)|^2\, dk \tag{23} $$

$$ \sigma^2_{v_g} = \int (v_g(k) - V)^2\, |S(k,0)|^2\, dk \tag{24} $$

$$ \mathrm{Cov}_{x v_g} = \tfrac{1}{2} \langle v_g X + X v_g \rangle_0 - \langle v_g \rangle_0 \langle x \rangle_0 \tag{25} $$

An alternative way to calculate the covariance is to
first write $S(k,0)$ in terms of its amplitude and phase

$$ S(k,0) = |S(k,0)|\, e^{i\psi(k,0)} \tag{26} $$

It can be shown that [3]

$$ \tfrac{1}{2} \langle v_g X + X v_g \rangle_0 = -\int v_g(k)\, \psi'(k,0)\, |S(k,0)|^2\, dk \tag{27} $$
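
The moment formulas of Eqs. (20) and (22) can be checked numerically. The following is only a sketch under stated assumptions: the dispersion relation W(k) = 0.5 k^2, the Gaussian initial spectrum, the grids, and the function names `predicted_moments` and `direct_moments` are illustrative choices that do not come from the paper. The first function evaluates the formulas from the initial spectrum alone; the second builds u(x,t) from Eq. (10) by direct quadrature and measures its mean and variance.

```python
import numpy as np

def predicted_moments(k, S0, W, t):
    """Mean and variance of position at time t from Eqs. (20) and (22)."""
    dk = k[1] - k[0]
    rho = np.abs(S0) ** 2
    rho = rho / (rho.sum() * dk)                      # normalized |S(k,0)|^2
    vg = np.gradient(W(k), k)                         # group velocity W'(k), Eq. (19)
    psi_p = np.gradient(np.unwrap(np.angle(S0)), k)   # derivative of spectral phase
    x0 = -(psi_p * rho).sum() * dk                    # <x>_0
    dS = np.gradient(S0, k)
    x2_0 = (np.abs(dS) ** 2).sum() / (np.abs(S0) ** 2).sum()  # <x^2>_0 = <X^2>
    V = (vg * rho).sum() * dk                         # Eq. (23)
    var_v = ((vg - V) ** 2 * rho).sum() * dk          # Eq. (24)
    cov = -(vg * psi_p * rho).sum() * dk - V * x0     # Eqs. (25) and (27)
    mean_t = x0 + V * t                               # Eq. (20)
    var_t = (x2_0 - x0 ** 2) + 2 * t * cov + t ** 2 * var_v   # Eq. (22)
    return mean_t, var_t

def direct_moments(k, S0, W, t, x):
    """Mean and variance of |u(x,t)|^2, with u built from Eq. (10) by quadrature."""
    dk = k[1] - k[0]
    phase = np.outer(x, k) - W(k) * t
    u = (S0 * np.exp(1j * phase)).sum(axis=1) * dk / np.sqrt(2 * np.pi)
    p = np.abs(u) ** 2
    p = p / p.sum()
    mean = (x * p).sum()
    var = ((x - mean) ** 2 * p).sum()
    return mean, var

# Illustrative parameters (assumptions, not values from the paper)
k = np.linspace(-6.0, 10.0, 1601)
S0 = np.exp(-(k - 2.0) ** 2 / 2.0)
W = lambda kk: 0.5 * kk ** 2
x = np.linspace(-20.0, 32.0, 1041)
print(predicted_moments(k, S0, W, 3.0))
print(direct_moments(k, S0, W, 3.0, x))
```

For a real initial spectrum the spectral phase derivative vanishes, so the covariance term drops out and the variance grows purely quadratically in t.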


Asymptotic Solution. The standard method to study
Eq. (14) is the asymptotic solution, which is obtained
by the method of stationary phase [5]. The basic idea
is to find the value of $k$ where the contribution of the
integrand is largest. The value of $k$ is obtained by
solving the equation [5]

$$ W'(k) = x/t \tag{28} $$

for $k$. Then,

$$ u_a(x,t) \sim S(k)\sqrt{\frac{2\pi}{tW''(k)}}\; e^{ikx - iW(k)t - i\pi\,\mathrm{sgn}\,W''/4} \tag{29} $$

The amplitude and phase are

$$ |u_a(x,t)| = |S(k)|\sqrt{\frac{2\pi}{tW''(k)}} \tag{30} $$

$$ \varphi_a(x,t) = \psi(k) + kx - W(k)t - \pi\,\mathrm{sgn}\,W''/4 \tag{31} $$


Instantaneous Frequency. Differentiating the phase,
$\varphi_a(x,t)$, as given by Eq. (31) we have

$$ \omega_i(x,t) = -\left[\frac{d\psi}{dk} + x - t\frac{dW(k)}{dk}\right]\frac{\partial k}{\partial t} + W(k) \tag{32} $$
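But by Eq. (28)

$$ x - tW'(k) = 0 \tag{33} $$

and therefore

$$ \omega_i(x,t) = -\frac{d\psi}{dk}\frac{\partial k}{\partial t} + W(k) \tag{34} $$

Also, from Eq. (28) we have

$$ W''(k)\frac{\partial k}{\partial t} = -\frac{x}{t^2} \tag{35} $$

giving

$$ \frac{\partial k}{\partial t} = -\frac{x}{t^2W''(k)} = -\frac{W'(k)}{tW''(k)} = -\frac{W'^2(k)}{xW''(k)} \tag{36} $$

Any one of these can be substituted into Eq. (34) to
obtain

$$ \omega_i(x,t) = \frac{x}{t^2W''(k)}\frac{d\psi}{dk} + W(k) \tag{37} $$

$$ = \frac{W'(k)}{tW''(k)}\frac{d\psi}{dk} + W(k) \tag{38} $$

$$ = \frac{W'^2(k)}{xW''(k)}\frac{d\psi}{dk} + W(k) \tag{39} $$

3. CASE TWO

General Solution. The general solution is

$$ u(x,t) = \frac{1}{\sqrt{2\pi}} \int F(\omega)\, e^{iK(\omega)x - i\omega t}\, d\omega \tag{40} $$

where $F(\omega)$ is the initial time spectrum at $x = 0$,

$$ F(\omega) = \frac{1}{\sqrt{2\pi}} \int u(0,t)\, e^{i\omega t}\, dt \tag{41} $$

We point out that $u(0,t)$ is what is usually called a
"signal". We define the space dependent spectrum by

$$ F(\omega,x) = F(\omega,0)\, e^{iK(\omega)x} \tag{42} $$

where

$$ F(\omega,0) = F(\omega) \tag{43} $$

Hence,

$$ u(x,t) = \frac{1}{\sqrt{2\pi}} \int F(\omega,x)\, e^{-i\omega t}\, d\omega \tag{44} $$

$$ F(\omega,x) = \frac{1}{\sqrt{2\pi}} \int u(x,t)\, e^{i\omega t}\, dt \tag{45} $$

which shows that $u(x,t)$ and $F(\omega,x)$ form Fourier
transform pairs for any $x$.

Moments. As before, we can write the time moments
as

$$ \langle t^n \rangle_x = \int t^n\, |u(x,t)|^2\, dt \tag{46} $$

$$ = \int F^*(\omega,x)\, (-T)^n\, F(\omega,x)\, d\omega \tag{47} $$

where $T$ is the time operator in the frequency domain³,

$$ T = i\frac{\partial}{\partial \omega} \tag{48} $$

³The reason for the negative sign in $(-T)^n$ is because of the
way the Fourier transform was defined in Eq. (41).

Defining the group time delay, $z_g(\omega)$, by

$$ z_g(\omega) = K'(\omega) \tag{49} $$

the exact first two moments and standard deviation are

$$ \langle t \rangle_x = \langle t \rangle_0 + Zx \tag{50} $$

$$ \langle t^2 \rangle_x = \langle t^2 \rangle_0 + x^2 \langle z_g^2 \rangle_0 + x\, \langle z_g T + T z_g \rangle_0 \tag{51} $$

$$ \sigma^2_{t|x} = \sigma^2_{t|0} + 2x\, \mathrm{Cov}_{t z_g} + x^2 \sigma^2_{z_g} \tag{52} $$

where

$$ Z = \int z_g(\omega)\, |F(\omega,0)|^2\, d\omega \tag{53} $$

$$ \sigma^2_{z_g} = \int (z_g(\omega) - Z)^2\, |F(\omega,0)|^2\, d\omega \tag{54} $$

$$ \mathrm{Cov}_{t z_g} = \tfrac{1}{2} \langle z_g T + T z_g \rangle_0 - \langle z_g \rangle_0 \langle t \rangle_0 \tag{55} $$

Also, if we write

$$ F(\omega,0) = |F(\omega,0)|\, e^{i\eta(\omega,0)} \tag{56} $$

then

$$ \tfrac{1}{2} \langle z_g T + T z_g \rangle_0 = -\int z_g(\omega)\, \eta'(\omega,0)\, |F(\omega,0)|^2\, d\omega \tag{57} $$

We point out that $\sigma_{t|x}$ is what is commonly called
the duration of a signal. In this case it is the duration
of the signal at position $x$. We see that for $x \to \infty$
the duration must go to infinity no matter what the
duration is at the point where it is generated.

Asymptotic solution. One obtains $\omega$ from

$$ K'(\omega) = t/x \tag{58} $$

and the asymptotic approximation is then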


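$$ u_a(x,t) \sim F(\omega)\sqrt{\frac{2\pi}{xK''(\omega)}}\; e^{iK(\omega)x - i\omega t - i\pi\,\mathrm{sgn}\,K''/4} \tag{59} $$

The amplitude and phase are

$$ |u_a(x,t)| = |F(\omega)|\sqrt{\frac{2\pi}{xK''(\omega)}} \tag{60} $$

$$ \varphi_a(x,t) = \eta(\omega) + K(\omega)x - \omega t - \pi\,\mathrm{sgn}\,K''/4 \tag{61} $$

Instantaneous Frequency. Differentiating the phase we
have

$$ \omega_i(x,t) = -\frac{\partial}{\partial t}\varphi_a(x,t) = -\left[\frac{d\eta}{d\omega} + x\frac{dK(\omega)}{d\omega} - t\right]\frac{\partial \omega}{\partial t} + \omega \tag{62} $$

and using Eq. (58) we have that

$$ \omega_i(x,t) = -\frac{\partial \omega}{\partial t}\frac{d\eta}{d\omega} + \omega \tag{63} $$

Also,

$$ K''(\omega)\frac{\partial \omega}{\partial t} = 1/x \tag{64} $$

giving

$$ \frac{\partial \omega}{\partial t} = \frac{1}{xK''(\omega)} = \frac{K'(\omega)}{tK''(\omega)} \tag{65} $$

Hence,

$$ \omega_i(x,t) = -\frac{1}{xK''(\omega)}\frac{d\eta}{d\omega} + \omega \tag{66} $$

$$ = -\frac{K'(\omega)}{tK''(\omega)}\frac{d\eta}{d\omega} + \omega \tag{67} $$

Real Spectrum. If the initial spectrum is real then $\eta = 0$
and we have that

$$ \omega_i(x,t) = \omega \; ; \quad K'(\omega) = t/x \tag{68} $$

4. EXACTLY SOLVABLE EXAMPLES

In all the examples we consider the case where the
dispersion relation is given by

$$ K(\omega) = \gamma\omega^2 \tag{69} $$

Example 1. We generate a pulse at $x = 0$ which is a
pure sinusoid

$$ u(0,t) = e^{-j\omega_0 t} \tag{70} $$

Its spectrum is

$$ F(\omega) = \sqrt{2\pi}\, \delta(\omega - \omega_0) \tag{71} $$

Putting this into Eq. (40) we obtain

$$ u(x,t) = e^{j\gamma\omega_0^2 x - j\omega_0 t} \tag{72} $$

and we see that the phase is given by

$$ \varphi(x,t) = \gamma\omega_0^2 x - \omega_0 t \tag{73} $$

which gives

$$ \omega_i = \omega_0 \tag{74} $$

for the instantaneous frequency. It is independent of
the dispersion or position.

Example 2. Suppose we take an impulse at $x = 0$

$$ u(0,t) = \delta(t - t_0) \tag{75} $$

which gives

$$ F(\omega) = \frac{1}{\sqrt{2\pi}}\, e^{j\omega t_0} \tag{76} $$

Using Eq. (40) we obtain

$$ u(x,t) = \frac{1}{2\pi}\sqrt{\frac{j\pi}{\gamma x}}\, \exp\left[-\frac{j(t - t_0)^2}{4\gamma x}\right] \tag{77} $$

The instantaneous frequency is

$$ \omega_i(x,t) = \frac{t - t_0}{2\gamma x} \tag{78} $$

which is a chirp.

Example 3. Consider the signal

$$ u(0,t) = (\alpha/\pi)^{1/4}\, e^{-\alpha t^2/2 - j\omega_0 t} \tag{79} $$

whose spectrum is

$$ F(\omega) = \frac{(\alpha/\pi)^{1/4}}{\sqrt{\alpha}}\, \exp\left[-\frac{(\omega - \omega_0)^2}{2\alpha}\right] \tag{80} $$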

Working  out  the  solution  we  obtain 
(a/n)1/4  I  1 


u(x,  t)  = 


V2 a 


-  ^7X 


l  ,  (11+1) 

2q  4(^  ~  *7*) 


(80) 


(81) 


(82) 
(90)


The  phase  and  amplitude  are  given  by 

,  -~{xt2  -wot/a2  +  u0‘tx/a2  ,  i _ 27* 

«*’ 0  ' - 4[(^  +  7V] - +  5  arC‘“  a 


K'(  u>)  =  27a;  =  - 
x 


we  have 


-TxaV-x.,t  +  Wx  +  1  arctan  ?£ 
1  +  4a272x2  a 


a  Therefore, 


(a/it)1^  (  1  \2 

|U(X,0I=  Vte  \a/4  +  72x2/  X 


1  f*2-4 
["  2“  V  T 


-  4a>o7x(f  -  a>o7x)  \ ' 
1  4-  4q272x2  / 


For the exact instantaneous frequency we have

$$ \omega_i(x,t) = \frac{\omega_0 + 2\alpha^2\gamma x t}{1 + 4\alpha^2\gamma^2 x^2} \tag{85} $$

This is a chirp even though a pure sine wave is being
generated at $x = 0$. In fact, even for $\omega_0 = 0$ we have a
chirp.
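
The chirp of Eq. (85) can be reproduced numerically. The sketch below is illustrative only: it assumes the Gaussian signal of Example 3 with the dispersion relation $K(\omega) = \gamma\omega^2$, evaluates Eq. (40) by direct quadrature, and differentiates the unwrapped phase with the negative sign of Eq. (9). The function names, the parameter values, and the grids are assumptions and do not come from the paper.

```python
import numpy as np

def pulse(x, t, alpha, omega0, gamma, n=4001):
    """u(x,t) from Eq. (40) by direct quadrature, for K(w) = gamma * w**2."""
    half_width = 10.0 * np.sqrt(alpha)
    w = np.linspace(omega0 - half_width, omega0 + half_width, n)
    dw = w[1] - w[0]
    # Spectrum of the Gaussian signal of Example 3
    F = (alpha / np.pi) ** 0.25 / np.sqrt(alpha) * np.exp(-(w - omega0) ** 2 / (2 * alpha))
    phase = gamma * w ** 2 * x - np.outer(t, w)       # K(w) x - w t, shape (len(t), n)
    return (F * np.exp(1j * phase)).sum(axis=1) * dw / np.sqrt(2 * np.pi)

def instantaneous_frequency(u, t):
    """Negative time derivative of the unwrapped phase, as in Eq. (9)."""
    return -np.gradient(np.unwrap(np.angle(u)), t)

def chirp_formula(x, t, alpha, omega0, gamma):
    """Closed-form instantaneous frequency of Eq. (85)."""
    return (omega0 + 2 * alpha ** 2 * gamma * x * t) / (1 + 4 * alpha ** 2 * gamma ** 2 * x ** 2)

# Illustrative parameters (assumptions)
t = np.linspace(1.0, 7.0, 601)
u = pulse(2.0, t, alpha=1.0, omega0=5.0, gamma=0.2)
numeric = instantaneous_frequency(u, t)
closed = chirp_formula(2.0, t, alpha=1.0, omega0=5.0, gamma=0.2)
print(np.max(np.abs(numeric[1:-1] - closed[1:-1])))
```

The end points are excluded from the comparison because `np.gradient` falls back to one-sided differences there.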

Example 4. Consider

u(0,i)  =  e-^a/2-M>t  (86) 


whose  spectrum  is 


=\/5  “pb 


(w  -  Up)2 
20 


The  solution  is 


uM  =  ]lr^c  x 


t  x  (q/tt)1/4  r  (t  —  2'yuox)2 

v  27xy  L  872x2q: 

Using  Eq.  (59)  we  have  that 

(a/7r)1/4  nr 

u(x,t)  =  /riL —  x 

v  '  V2a  \  ix 


(t  -  2'yuox)2  .  t2  . 

exp  --  0-5-3——  -  *-3 - i7rsgn7/4 

87s  x2  a  47X 


This  gives  an  instantaneous  frequency  given  by 

(94) 

The instantaneous frequency could be obtained
directly from Eq. (66). For this case we have that $\eta = 0$
and hence

$$ \omega_i = \omega = \frac{t}{2\gamma x} \tag{95} $$


5.  CONCLUSION 

We  have  given  simple  formulas  for  the  moments,  spread, 
and  instantaneous  frequency,  of  a  propagating  pulse  in 
dispersive  media. 


Therefore,  we  still  have  a  chirp  but  the  chirp  rate  changes 
with  distance. 


Example  5.  Now  consider  the  asymptotic  solution  for 
Example  4.  Solving  for  w  from 
ABSTRACT

We propose a novel set of wavelet-based stochastic
models for self-similar network traffic with non-Gaussian
behaviors. We show that these models are sufficiently
accurate and parsimonious, and have very low
computational complexity in analysis and synthesis.

1. INTRODUCTION

Recent studies of high-quality, high-resolution network
traffic measurements have revealed that packet traffic
appears to be both self-similar (or long-range
dependent) and non-Gaussian distributed [1]. In order to
realize the desirable properties of communication
networks, such as ubiquity, convenience, affordability,
reliability, and security, it is crucial to develop accurate
and efficient traffic models that are capable of yielding
acceptably precise performance predictions in a
reasonable amount of time.

A real-valued stochastic process $X(t)$ is said to be
statistically self-similar with parameter $H$ if for any
$a > 0$,

$$ X(t) = a^{-H} X(at) \tag{1} $$

where the equality holds in a statistical sense (e.g., in
all finite-dimensional joint distributions or in
second-order statistics) and $H$ is the so-called Hurst
parameter, which satisfies $0 < H < 1$ and captures the
degree of self-similarity. In the context of network traffic,
which is typically modeled as a non-negative process,
the Hurst parameter can be used as a measure of
burstiness [2]. The time-averaged spectrum of $X(t)$, denoted
by $S_X(\omega)$, exhibits a $1/f$ behavior:

$$ \frac{C_l}{|\omega|^{2H-1}} \le S_X(\omega) \le \frac{C_u}{|\omega|^{2H-1}} \tag{2} $$

where $C_l$ and $C_u$ are constants satisfying
$0 < C_l \le C_u < \infty$.

The multiscale property of wavelets [3], [4] makes
wavelet representations natural and powerful
analysis and synthesis tools for self-similar network traffic.
A wavelet-domain independent Gaussian model is
proposed in [5]. A wavelet-based multi-fractal model is
proposed in [6].

In this paper, we propose a novel set of wavelet-based
stochastic models for the emerging complex
high-speed packet network traffic with self-similar and
non-Gaussian behaviors. We show that these models are
sufficiently accurate and parsimonious, and have very
low computational complexity in analysis and
synthesis.

The following convention of notation is used in the
paper:

$$ \sum_k \triangleq \sum_{k=-\infty}^{\infty} \tag{3} $$

This work was supported by Defense Advanced Research
Project Agency under grant F30602-00-2-0501.

0-7803-5988-7/00/$10.00 © 2000 IEEE.

2. WAVELET-BASED SYNTHESIS OF
NON-NEGATIVE SELF-SIMILAR
PROCESSES

2.1. Wavelet-Based Models

Let $\psi$ and $\phi$ be the synthesis wavelet and scaling
function of a two-channel, compactly supported, real-valued,
biorthogonal wavelet system [3], respectively. We
construct a random process $X(t)$ by means of a biased
wavelet series expansion:

$$ X(t) = \sum_i \sum_k W_{i,k}\, \psi_{i,k}(t) + \mu \tag{4} $$

where we have used the short-hand notation

$$ \psi_{i,k}(t) = 2^{i/2}\, \psi(2^i t - k) \tag{5} $$

for the dilated and translated versions of $\psi(t)$, and
hereafter we shall apply the notation to $\phi$ similarly.
The wavelet coefficients $\{W_{i,k} : k \in \mathbb{Z}\}$ at scale $2^i$
are independent, identically distributed (i.i.d.) random
variables with zero mean and variance

$$ E[W_{i,k}^2] = \sigma_i^2 = 2^{-i(2H-1)}\sigma^2 \tag{6} $$

where $\sigma^2$ is a reference variance. The constant $\mu$ is used
to represent the desired mean of the process. According
to [4, Theorem 3.4], $X(t)$ is a $1/f$ process.

However, it is impractical to synthesize the random
process $X(t)$ using (4) because infinitely many scales
are required. Therefore, the method suggested by the
theorem is not useful in practice.

We propose to synthesize a process $X_I(t)$ using a
finite number of scales in the wavelet series expansion:

$$ X_I(t) = \sum_k S_{I,k}\, \phi_{I,k}(t) + \mu \tag{7} $$

$$ = \sum_k S_{i_0,k}\, \phi_{i_0,k}(t) + \sum_{i=i_0}^{I-1} \sum_k W_{i,k}\, \psi_{i,k}(t) + \mu \tag{8} $$

$$ = \sum_{i=-\infty}^{I-1} \sum_k W_{i,k}\, \psi_{i,k}(t) + \mu \tag{9} $$

where $\{S_{i,k} : k \in \mathbb{Z}\}$ are the unbiased scaling
coefficients at scale $2^i$. In order to obtain non-negative
processes, we choose to use biorthogonal B-spline wavelets
[3] whose synthesis scaling functions are non-negative.
By comparing (4) and (9), we obtain

$$ \lim_{I \to \infty} X_I(t) = X(t) \tag{10} $$

which implies that $X_I(t)$ is an asymptotically $1/f$
process.

We use an iterative procedure to synthesize $X_I(t)$
according to (8):

Step 1: set $i := i_0$;

Step 2: synthesize $\{S_i[k] : k \in \mathbb{Z}\}$, the scaling
coefficients at scale $2^i$;

Step 3: synthesize $\{W_i[k] : k \in \mathbb{Z}\}$, the wavelet
coefficients at scale $2^i$, from $\{S_i[k] : k \in \mathbb{Z}\}$;

Step 4: synthesize $\{S_{i+1}[k] : k \in \mathbb{Z}\}$, the scaling
coefficients at scale $2^{i+1}$, from $\{S_i[k] : k \in \mathbb{Z}\}$ and
$\{W_i[k] : k \in \mathbb{Z}\}$ by means of the Mallat synthesis
algorithm [7]:

$$ S_{i+1,k} = \sum_l \left( h[k - 2l]\, S_{i,l} + g[k - 2l]\, W_{i,l} \right) \tag{11} $$

where $h[k]$ and $g[k]$ are the FIR synthesis filters
of the wavelet system;

Step 5: if $i < I - 1$, then set $i := i + 1$ and go to Step
2; otherwise stop.
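
The iteration in Steps 1 to 5 can be sketched for the Haar filters, for which Eq. (11) reduces to a sum and a difference of each coefficient pair. This is only a minimal illustration under stated assumptions: the wavelet coefficients are drawn here as independent Gaussians with the variance of Eq. (6), which is a stand-in and not the non-Gaussian model developed later in the paper, and the function names `haar_synthesis_step` and `synthesize` are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_synthesis_step(s, w):
    """One application of Eq. (11) with Haar filters h = [1, 1]/sqrt(2), g = [1, -1]/sqrt(2)."""
    s = np.asarray(s, dtype=float)
    w = np.asarray(w, dtype=float)
    nxt = np.empty(2 * s.size)
    nxt[0::2] = (s + w) / SQRT2   # k = 2l:     h[0] S + g[0] W
    nxt[1::2] = (s - w) / SQRT2   # k = 2l + 1: h[1] S + g[1] W
    return nxt

def synthesize(s_coarse, i0, I, H, sigma2, rng):
    """Refine scaling coefficients from scale 2**i0 up to scale 2**I."""
    s = np.asarray(s_coarse, dtype=float)                    # Steps 1 and 2
    for i in range(i0, I):                                   # Step 5 loop, i = i0 .. I-1
        std = np.sqrt(sigma2 * 2.0 ** (-i * (2 * H - 1)))    # variance from Eq. (6)
        w = rng.normal(0.0, std, size=s.shape)               # Step 3 (Gaussian stand-in)
        s = haar_synthesis_step(s, w)                        # Step 4
    return s

rng = np.random.default_rng(0)
coeffs = synthesize(rng.normal(size=8), i0=0, I=6, H=0.8, sigma2=1.0, rng=rng)
print(coeffs.shape)
```

Each pass doubles the number of coefficients, so the whole synthesis costs a number of operations proportional to the length of the final sequence.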

Since  it  is  possible  to  choose  from  various  wavelet 
systems  for  synthesis  and  various  densities  for  the  scal¬ 
ing  coefficients  at  the  coarsest  scale,  we  obtain  a  rich 
set  of  wavelet-based  stochastic  models. 

2.2.  Synthesis  of  Scaling  Coefficients 

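In order to synthesize the scaling coefficients, we derive
the second-order statistics of the scaling coefficients
at any scale.

According to (27), we infer that

$$ S_{i,k} = \sum_{l_1}\sum_{l_2} h[k - 2l_1]\, h[l_1 - 2l_2]\, S_{i-2,l_2} + \sum_{l_1}\sum_{l_2} h[k - 2l_1]\, g[l_1 - 2l_2]\, W_{i-2,l_2} + \sum_{l_1} g[k - 2l_1]\, W_{i-1,l_1} \tag{12} $$

$$ = \sum_{l_1}\sum_{l_2}\sum_{l_3} h[k - 2l_1]\, h[l_1 - 2l_2]\, h[l_2 - 2l_3]\, S_{i-3,l_3} + \sum_{l_1}\sum_{l_2}\sum_{l_3} h[k - 2l_1]\, h[l_1 - 2l_2]\, g[l_2 - 2l_3]\, W_{i-3,l_3} + \sum_{l_1}\sum_{l_2} h[k - 2l_1]\, g[l_1 - 2l_2]\, W_{i-2,l_2} + \sum_{l_1} g[k - 2l_1]\, W_{i-1,l_1} \tag{13} $$

$$ = \sum_{n=2}^{\infty} \sum_{l_1}\sum_{l_2}\cdots\sum_{l_n} h[k - 2l_1] \left( \prod_{m=2}^{n-1} h[l_{m-1} - 2l_m] \right) g[l_{n-1} - 2l_n]\, W_{i-n,l_n} + \sum_{l_1} g[k - 2l_1]\, W_{i-1,l_1}. \tag{14} $$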

Thus,  it  follows  that 

OO 

E[Si,,Sitk}  = 

n= 2  / 1  fa  ln 

xh[k  -  2/i]  f  Yl  h[lm- 1  -  2lm] ) 

\m=2  / 

1  2/n] 

(15) 

fa 

In our models, we choose to use biorthogonal B-spline
wavelets whose synthesis filters are half-point
symmetric and hence satisfy

=  Y,tf[2i+i}=1-,  (i6) 

l  l  1 

I>2[2/]  =  Z>2[  2l  +  1\  =  l-  (17) 

i  i  z 

Therefore,  it  follows  that 

OO 

=  E  2l~U°ln  E  ^  ~  2llMk  ~  2/l] 

n—2  lx 

+^-iE»[/-2|i]#-2'i]  (18) 


2  iH 


-3 


1  _  22ff-2Z  G 

x  E  h[l  —  2li]h[k  —  2li] 
h 

+22//-i  .  2-i(27?-l)CT2 

x  EffP-2*iM*-2Ii]. 


Since 


(19) 

(20) 


E[Si+2m,lSi+2m,k]  = 
for any integer $m$, the process $\{S_{i,k} : k \in \mathbb{Z}\}$ is
wide-sense cyclostationary with period 2.

Example. If the Haar wavelet is used, i.e.,

$$ h[n] = \frac{1}{\sqrt{2}} (\delta_n + \delta_{n-1}) \tag{21} $$

$$ g[n] = \frac{1}{\sqrt{2}} (\delta_n - \delta_{n-1}), \tag{22} $$

then the process $\{S_{i,k} : k \in \mathbb{Z}\}$ becomes wide-sense
stationary because

'  ^v2 if/  =  fc 


£[Si,/S<lfc]  =  7 


l_22H-2 
c\AH  —  3  r\“2H  —  2  - / n  rr  \  0 

2___^_2-l(2«-l)(72  if|/_jfc|  =  1 

0  otherwise. 

(23) 


2.3.  Synthesis  of  Wavelet  Coefficients 

The synthesized process $X_I(t)$ can be expressed as

$$ X_I(t) = \sum_k S'_{I,k}\, \phi_{I,k}(t) \tag{24} $$

$$ = \sum_k S'_{i_0,k}\, \phi_{i_0,k}(t) + \sum_{i=i_0}^{I-1} \sum_k W_{i,k}\, \psi_{i,k}(t) \tag{25} $$

where the biased scaling coefficients are given by

$$ S'_{i+1,k} = S_{i+1,k} + 2^{-(i+1)/2} \mu \tag{26} $$

$$ = \sum_l \left( h[k - 2l]\, S'_{i,l} + g[k - 2l]\, W_{i,l} \right) \tag{27} $$

To synthesize a non-negative process $X_I(t)$, we need to
maintain the non-negativity of the biased scaling
coefficients $\{S'_{i,k} : i_0 \le i \le I, k \in \mathbb{Z}\}$. To achieve this goal,
we first choose $S'_{i_0,k}$ to be non-negatively distributed.
The probability density function (PDF) of $S'_{i_0,k}$ can be
log-normal, Rayleigh, Maxwell, gamma, etc. Secondly,
we use a multiplicative model for synthesizing wavelet
coefficients as

$$ W_{i,k} = A_{i,k}\, S'_{i,k} \tag{28} $$

where $\{A_{i,k} : k \in \mathbb{Z}\}$ are zero-mean, i.i.d. random
variables for a fixed $i$, and $A_{i,k}$ is independent of $S'_{i,k}$
for any $i$ and $k$.

We assume that the synthesis filters satisfy

•  the support of $g[n]$ is a subset of the support of
$h[n]$;

•  $h[n]$ is non-negative, i.e., $h[n] \ge 0, \forall n$.

The biorthogonal B-spline wavelets (including the Haar
wavelet), whose two synthesis filters have the same even
length, satisfy both conditions. Define


$$ C_{h,g} = \min_{n:\, g[n] \neq 0} \frac{h[n]}{|g[n]|} \tag{29} $$

In our models, the wavelet coefficients and the scaling
coefficients satisfy

$$ |A_{i,k}| \le C_{h,g} \quad \forall i, k. \tag{30} $$

Using (27), we infer that

$$ S'_{i+1,k} \ge \sum_l \left( h[k - 2l]\, S'_{i,l} - |g[k - 2l]\, W_{i,l}| \right) \tag{31} $$

$$ \ge \sum_l \left( h[k - 2l]\, S'_{i,l} - |g[k - 2l]|\, C_{h,g}\, S'_{i,l} \right) \tag{32} $$

$$ \ge 0. \tag{33} $$
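
The non-negativity argument of (31) to (33) can be illustrated for the Haar filters, for which $C_{h,g} = 1$. The sketch below makes several assumptions that are not taken from the paper: the multipliers are drawn from a two-component mixture of symmetric beta densities on $[-C_{h,g}, C_{h,g}]$ (the family the paper introduces for $A_{i,k}$), the same mixture parameters are used at every scale, the coarsest coefficients are Rayleigh distributed, and the function names are hypothetical.

```python
import numpy as np

def sample_multiplier(rng, size, lam, q1, q2, c=1.0):
    """Draw multipliers A from a mixture of two symmetric beta densities on [-c, c]."""
    use_first = rng.random(size) < lam          # component 1 with probability lam
    q = np.where(use_first, q1, q2)             # shape factor chosen per sample
    return c * (2.0 * rng.beta(q, q) - 1.0)     # map Beta(q, q) on [0, 1] to [-c, c]

def nonnegative_haar_synthesis(s_coarse, steps, lam, q1, q2, rng):
    """Refine non-negative biased scaling coefficients with W = A * S' (Eq. (28))."""
    s = np.asarray(s_coarse, dtype=float)
    for _ in range(steps):
        a = sample_multiplier(rng, s.size, lam, q1, q2)   # |A| <= C_{h,g} = 1 for Haar
        w = a * s                                         # multiplicative model
        nxt = np.empty(2 * s.size)
        nxt[0::2] = (s + w) / np.sqrt(2.0)                # S'(1 + A)/sqrt(2) >= 0
        nxt[1::2] = (s - w) / np.sqrt(2.0)                # S'(1 - A)/sqrt(2) >= 0
        s = nxt
    return s

rng = np.random.default_rng(1)
trace = nonnegative_haar_synthesis(rng.rayleigh(1.0, size=64), 5, 0.3, 0.5, 4.0, rng)
print(trace.min() >= 0.0)
```

Because each refined coefficient equals a non-negative parent times $(1 \pm A)/\sqrt{2}$ with $|A| \le 1$, non-negativity is preserved at every scale regardless of the mixture parameters.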

The variance of the wavelet coefficient $W_{i,k}$ is given
by

$$ E[W_{i,k}^2] = E[A_{i,k}^2]\, E[(S'_{i,k})^2]. \tag{34} $$

In our models, the PDF of $A_{i,k}$ is chosen to be a
mixture of two symmetric beta PDFs:

$$ p_{A_{i,k}}(a) = \lambda_i\, p(a; q_{i,1}) + (1 - \lambda_i)\, p(a; q_{i,2}) \tag{35} $$

where $0 \le \lambda_i \le 1$ and $p(a; q)$ denotes the symmetric
beta PDF with a shape factor $q > 0$, i.e.,

$$ p(a; q) = \begin{cases} \dfrac{1}{C_q} \left( C_{h,g}^2 - a^2 \right)^{q-1} & \text{if } |a| \le C_{h,g} \\ 0 & \text{otherwise} \end{cases} \tag{36} $$

with the constant

$$ C_q = 2^{2q-1} \int_0^{C_{h,g}} [x (C_{h,g} - x)]^{q-1}\, dx. \tag{37} $$

Since the variance associated with the PDF $p(a; q)$ is
$C_{h,g}^2/(2q + 1)$, the variance of $A_{i,k}$ is given by

$$ E[A_{i,k}^2] = C_{h,g}^2 \left[ \frac{\lambda_i}{2q_{i,1} + 1} + \frac{1 - \lambda_i}{2q_{i,2} + 1} \right] \tag{38} $$


3.  FITTING  THE  MODELS  TO  DATA 

For a given training data set, we determine the
parameters of the proposed models from the empirical wavelet
coefficients $\{W_{i,k} : i_0 \le i < I, k \in \mathbb{Z}\}$ and scaling
coefficients $\{S_{i_0,k} : k \in \mathbb{Z}\}$.

3.1. Least-Squares Estimation of $H$ and $\{\sigma_i^2\}$

We first compute $\sigma_i^2$, the variance of the empirical wavelet
coefficients at scale $2^i$ for $i_0 \le i < I$. Then, we use
the least-squares criterion to estimate $H$ and $\sigma^2$ from
$\{\sigma_i^2 : i_0 \le i < I\}$ according to (6).
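
One way to carry out this fit is to take base-2 logarithms of (6), which gives a straight line in the scale index $i$ with slope $-(2H - 1)$ and intercept $\log_2 \sigma^2$. The sketch below assumes that the least-squares criterion is applied in this log domain; the paper does not state the domain, and the function name `fit_hurst` is hypothetical.

```python
import numpy as np

def fit_hurst(scales, variances):
    """Least-squares fit of log2(sigma_i^2) = -(2H - 1) * i + log2(sigma^2)."""
    i = np.asarray(scales, dtype=float)
    y = np.log2(np.asarray(variances, dtype=float))
    slope, intercept = np.polyfit(i, y, 1)   # straight-line fit in the log domain
    H = (1.0 - slope) / 2.0                  # slope = -(2H - 1)
    sigma2 = 2.0 ** intercept                # reference variance
    return H, sigma2

# Illustrative check with variances generated exactly from Eq. (6)
scales = np.arange(3, 10)
variances = 2.5 * 2.0 ** (-scales * (2 * 0.8 - 1))
print(fit_hurst(scales, variances))
```

With empirical variances the points scatter around the line, and the fitted slope gives the estimate of $H$.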


3.2.  Maximum  Likelihood  Estimation  of  the 
PDFs  {p^,fc(a)} 

In order to synthesize the wavelet coefficients at a fixed
scale $2^i$, we need to determine the PDF $p_{A_{i,k}}(a)$. Due
to (38), the mixing parameter $\lambda_i$ can be expressed in
terms of the desired variances of $W_{i,k}$ and $S'_{i,k}$:

$$ \lambda_i = \frac{(2q_{i,1} + 1)(2q_{i,2} + 1)\, \dfrac{E[W_{i,k}^2]}{E[(S'_{i,k})^2]\, C_{h,g}^2} - (2q_{i,1} + 1)}{2 (q_{i,2} - q_{i,1})} \tag{39} $$
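
Equation (39) is the inversion of (38) for $\lambda_i$ once (34) is used to express $E[A_{i,k}^2]$ through the two second moments. A short sketch of that inversion follows; the function names are hypothetical, and the range check on $\lambda_i$ is an added safeguard, not something the paper specifies.

```python
def multiplier_variance(lam, q1, q2, c=1.0):
    """Variance of the beta-mixture multiplier, Eq. (38)."""
    return c ** 2 * (lam / (2 * q1 + 1) + (1 - lam) / (2 * q2 + 1))

def mixing_weight(second_moment_w, second_moment_s, q1, q2, c=1.0):
    """Mixing parameter lambda_i from Eq. (39)."""
    v = second_moment_w / (second_moment_s * c ** 2)   # E[A^2] / C^2 by Eq. (34)
    a, b = 2 * q1 + 1, 2 * q2 + 1
    lam = (a * b * v - a) / (2 * (q2 - q1))
    if not 0.0 <= lam <= 1.0:
        raise ValueError("requested variance is not reachable with these shape factors")
    return lam

# Round trip: a chosen lambda is recovered from the variance it produces
var_a = multiplier_variance(0.3, 0.5, 4.0)
print(mixing_weight(var_a * 2.0, 2.0, 0.5, 4.0))
```

A valid $\lambda_i$ exists only when the target variance lies between the variances of the two beta components, which is why the sketch rejects values outside $[0, 1]$.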

We use a maximum likelihood criterion to estimate
the shape parameters $q_{i,1}$ and $q_{i,2}$ from the empirical
wavelet coefficients and scaling coefficients at scale $2^i$:


max 


III*,. 


(40) 


4.  SIMULATIONS 

In our simulations, we use a measured traffic trace from
the Bellcore ftp site [1]. We choose the Haar wavelet
and model the biased scaling coefficients at scale $2^{i_0}$
using the Rayleigh density.

Figure  1(a)  and  1(b)  depict  a  segment  of  the  mea¬ 
sured  traffic  data  and  a  segment  of  the  synthesized  traf¬ 
fic  data,  respectively.  Figure  1(c)  and  1(d)  illustrate 
the  histograms  of  the  measured  trace  and  the  synthe¬ 
sized  trace,  respectively.  Figure  2(a)  and  2(b)  plot  the 
autocovariance  functions  of  the  measured  trace  and  the 
synthesized  trace,  respectively.  Figure  2(c)  and  2(d) 
plot  the  power  spectra  of  the  measured  trace  and  the 


synthesized  trace,  respectively.  These  figures  demon¬ 
strate  that  the  statistics  of  the  measured  traffic  and 
the  synthesized  traffic  are  very  close. 

Figure  2(e)  plots  the  probabilities  of  buffer  overflow 
versus  buffer  size  for  a  single-server  queue  fed  with  the 
measured  trace  and  the  synthesized  trace.  The  figure 
shows  that  the  queuing  behaviors  for  the  two  traces  are 
very  similar. 


5.  CONCLUSION 

We have presented a set of wavelet-based stochastic
models for $1/f$ network traffic. Besides the accuracy
shown in our simulations, our models possess the
following features:

•  parsimony: the model parameters include the
parameters of the PDF of the scaling coefficients at
scale $2^{i_0}$, $H$, $\sigma^2$, and $\{q_{i,1}, q_{i,2} : i_0 \le i < I\}$;

•  computational  efficiency:  wavelet  analysis  and 
synthesis  have  low  computational  complexity. 

Therefore,  the  proposed  models  are  very  promising  in 
network  traffic  engineering. 
Figure 1: (a) A segment of the measured trace; (b) a
segment  of  the  synthesized  trace;  (c)  histogram  of  the 
measured  trace;  (d)  histogram  of  the  synthesized  trace. 


Figure  2:  (a)  Autocovariance  function  of  the  measured 
trace;  (b)  autocovariance  function  of  the  synthesized 
trace;  (c)  power  spectrum  of  the  measured  trace;  (d) 
power  spectrum  of  the  synthesized  trace;  (e)  queuing 
behaviors  of  the  two  traces. 


THE  EXTENDED  ON/OFF  PROCESS  FOR  MODELING  TRAFFIC  IN 
HIGH-SPEED  COMMUNICATION  NETWORKS 

Xueshi  Yang,  Athina  P.  Petropulu  and  Vaughn  Adams 

Electrical  and  Computer  Engineering  Department, 

Drexel  University,  Philadelphia,  PA  19104,  USA 
Tel.  (215)  895-2358  Fax.  (215)  895-1695 


ABSTRACT 

High-speed network traffic is impulsive and exhibits
long-range dependence. While the latter characteristic has been
studied extensively, the former has received much less
attention. The On/Off model is a well-known model for capturing
the long-range dependence of network traffic. In this paper
we propose an extension to the On/Off model, which allows
the model to also capture the traffic impulsiveness. We
provide queuing analysis of the proposed model, which along
with numerical results suggests that the traffic marginal
distribution may have significant impact on network
engineering.


1. INTRODUCTION

Extensive studies indicate that traffic in high-speed
communication networks has self-similar [14], [4] and long-tailed
characteristics [7], [6], [10]. There are many studies
dealing with the self-similarity characteristic, the best known
of which is the On/Off model [14]. In data communication
networks, the packets are communicated in a "packet
train" fashion; once a "packet train" is triggered, the
probability that another packet will follow the current one is
very large. The On/Off model is based on that packet train
idea. A single source/destination active pair alternates
between two states: the On, during which there is data flow
between source and destination, along either way, and the
Off, which is the quiet duration. Both the On and Off
durations follow a heavy-tail distribution. For heavy-tail
phenomena the probability of large values decays
hyperbolically instead of exponentially. The self-similar
characteristics of the AFRP have been attributed to the
heavy-tail properties of the On/Off state durations. However,
in the seminal paper of [13] it was shown that the
cumulative superposition of infinite AFRPs is fractional Brownian
motion, which is Gaussian. This fact renders the
superposition of AFRPs inconsistent with real traffic data, which
is clearly non-Gaussian. In fact, the marginal distribution
of a traffic flow can have a profound impact on network
engineering; for example, it can significantly change
queuing performance and buffer overflow probability [5]. In [5],
it is shown that under different marginal distributions of
the traffic streams, the packet loss rates differed by several
orders of magnitude.

In this paper we propose the Extended On/Off
process as a way to overcome the limitation of the traditional
On/Off model. Each user transmits or stays idle, with
durations that are heavy-tail distributed, but, unlike the AFRP
model, the bandwidth requirement during the transmission
state is heavy-tailed. We provide proofs for the long-range
dependence and heavy-tail properties of the proposed model
for single user traffic and also for aggregated traffic. We
provide analytical results on the queuing behavior of the
proposed model, which indicate that the heavy-tailed
reward process may affect queuing performance as much as
the self-similar characteristics of the traffic flow. We also
provide results based on real traffic to demonstrate the
validity of our theoretical claims.

This work was supported by National Science Foundation
under grant MIP-9553227

2. MATHEMATICAL PRELIMINARIES

A random variable $X$ is called regularly varying with index $\alpha$,
to be denoted by $X \in \mathcal{R}_{-\alpha}$, if for all $k > 0$,

$$\lim_{t\to\infty} \bar{F}_X(kt)/\bar{F}_X(t) = k^{-\alpha}. \qquad (1)$$

A random variable with a regularly varying distribution function
is also referred to as heavy-tail distributed. The Pareto
distribution is the simplest example of a heavy-tailed
distribution. Its survival function is given as:

$$\bar{F}_X(x) = \begin{cases} (k/x)^{\alpha}, & x \ge k, \\ 1, & x < k, \end{cases} \qquad (2)$$

where $k$ is a positive constant.
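
As an illustration of the Pareto survival function in (2), the following minimal sketch draws samples by inverting that survival function and compares the empirical tail with the formula. The function names and the parameter values are assumptions chosen for this example only; they do not come from the paper.

```python
import random


def pareto_survival(x, k, alpha):
    """Survival function P[X > x] of a Pareto variable with scale k and tail index alpha."""
    return 1.0 if x < k else (k / x) ** alpha


def sample_pareto(k, alpha, rng):
    """Draw one Pareto sample by inverting the survival function (k/x)^alpha = u."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so the division below is safe
    return k / u ** (1.0 / alpha)


def empirical_survival(samples, x):
    """Fraction of samples strictly larger than x."""
    return sum(s > x for s in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [sample_pareto(1.0, 1.5, rng) for _ in range(20000)]
    for x in (2.0, 4.0, 8.0):
        print(x, empirical_survival(data, x), pareto_survival(x, 1.0, 1.5))
```

On a log-log scale the empirical tail of such samples falls on a straight line with slope close to the negative of the tail index, which is the hyperbolic decay referred to above.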

A random process $x(t)$ with finite second-order statistics
is called a stationary process with long memory [1], or
long-range dependence in the autocovariance sense, if its
autocovariance function decays hyperbolically as the lag $k$
increases.

For processes which might not have second-order statistics,
a structure measure different from the autocorrelation
is needed. We will use the quantity defined in [12], i.e.,

$$I(\rho_1, \rho_2; \tau) = -\ln E\{e^{j(\rho_1 x(t+\tau) + \rho_2 x(t))}\} + \ln E\{e^{j\rho_1 x(t+\tau)}\} + \ln E\{e^{j\rho_2 x(t)}\}, \qquad (3)$$

and will refer to this quantity as the generalized
codifference.

We will say that the stationary process $x(t)$ is a long-memory
process in a generalized sense if

$$\lim_{\tau\to\infty} -I(1, -1; \tau)/\tau^{\beta-1} = c, \qquad (4)$$

where $c$ is some positive constant and $\beta < 1$. This definition
of long-range dependence was first used in
[11] to study the joint statistics of the power-law shot noise
process.


where $\stackrel{d}{=}$ represents equality in distribution, and

$$Q_e \stackrel{d}{=} \lim_{n\to\infty} Q(t_n), \qquad (9)$$


2.1.  The  AFRP 

The Alternating Fractal Renewal Process (AFRP), proposed
in [14] for modeling of network traffic, is a process that
alternates between two states, 0 and 1. The time $X_n$ spent
in state 1 is a random variable with density function $f_1(t)$,
and the time $Y_n$ spent in state 0 is a random variable
with density function $f_0(t)$, where $f_1(t)$, $f_0(t)$ obey
heavy-tailed distributions, i.e.,

$$f_i(t) \sim t^{-(\alpha_i + 1)} \text{ as } t \to \infty, \quad \text{where } i = 0, 1, \quad \alpha_i \in (1, 2). \qquad (5)$$

Generally speaking, $f_0(t) = f_1(t) = 0$ for $t < 0$, and the
associated mean dwell times $\mu_1 := E[X_n]$ and $\mu_0 := E[Y_n]$
are finite. The expected value of the AFRP $X(t)$ is
$\mu_1/(\mu_0 + \mu_1)$. The power spectral density of the AFRP equals [9]

$$S(\omega) = E^2\{X(t)\}\,\delta(\omega/2\pi) + \frac{2\omega^{-2}}{\mu_0 + \mu_1}\,\mathrm{Re}\left\{\frac{[1 - Q_0(-j\omega)][1 - Q_1(-j\omega)]}{1 - Q_0(-j\omega)Q_1(-j\omega)}\right\}, \qquad (6)$$

where $Q_0(-j\omega)$ and $Q_1(-j\omega)$ are the Fourier transforms of
$f_0(t)$ and $f_1(t)$, respectively.


where $\{t_n = D_1 + U_1 + \cdots + D_n + U_n\}$, $n \ge 1$. The following
theorem links the steady state buffer content distribution
at continuous time $t$, $t \ge 0$, to that at time points $t_n$, $n \ge 1$
[8]:

Theorem 2 For traffic intensity $\rho := E[R_1(D_1)]/(rE[U_1]) < 1$,
the steady state buffer content distribution satisfies:

$$P[Q(\infty) > q] = \frac{u}{u+d}\,\rho\,P[Q_e + R_1(D_1)^* > q] + \frac{d}{u+d}\,P[Q_e + R_1(D_1^e) > q], \qquad (10)$$

where $u = E[U_1]$ and $d = E[D_1]$, with

$$P[R_1(D_1)^* > x] = \frac{1}{E[R_1(D_1)]}\int_x^{\infty} P[R_1(D_1) > y]\,dy, \qquad (11)$$

$$P[R_1(D_1^e) > q] = \frac{1}{d}\,E\int_0^{D_1} 1_{[R_1(s) > q]}\,ds, \qquad (12)$$

where $1_{[\cdot]}$ is the indicator function. Both $R_1(D_1)^*$ and
$R_1(D_1^e)$ are independent of $Q_e$.


2.2.  Related  queuing  results 

We  next  summarize  some  queuing  analysis  results  needed 
in  the  analysis  of  the  proposed  model. 

Consider a GI/G/1 queue, and let $w_n$ represent the actual
waiting time of the $n$th arriving customer, which has
a service time of $\tau_n$. Let $W(\cdot)$ and $B(\cdot)$ be the distribution
functions of $w_n$ and $\tau_n$, respectively. Also, let $\sigma_{n+1}$ be the
interarrival time between the $n$th and $(n+1)$th customers,
and $A(\cdot)$ be its distribution. In [3] it was shown that the
distribution function of the stationary actual waiting time
has a regularly varying tail if and only if the tail of the service
time distribution varies regularly at infinity. In the form
of a theorem, it was shown that [3]:

Theorem 1 For a GI/G/1 queuing system with traffic intensity
$\rho = b/a < 1$, as $t \to \infty$ it holds:

$$\bar{B}(t) = k\,(b/t)^{k+1} L(t) \iff \bar{W}(t) = \rho(1 - \rho)^{-1} (b/t)^{k} L(t), \qquad (7)$$

where $\bar{B}(t) = 1 - B(t)$; $\bar{W}(t) = 1 - W(t)$; $a$, $b$ denote the
mean of $\tau_n$ and $\sigma_n$ respectively; $k > 0$; and $L(t)$ is a slowly
varying function.

In [8], Kella et al. studied a storage model with a two-state
random environment that alternates between down
and up states. During down times $\{D_k : k \ge 1\}$, there is
net flow into the buffer according to a stochastic process
$\{\{R_k(t) : t \ge 0\} : k \ge 1\}$, and during up times $\{U_k : k \ge 1\}$,
there is a flow out of the buffer at rate $r$. Let $Q(t)$, $t \ge 0$,
denote the buffer content process at time $t$, and let

$$Q(\infty) \stackrel{d}{=} \lim_{t\to\infty} Q(t), \qquad (8)$$


3.  THE  EXTENDED  AFRP  (EAFRP) 

In  [13],  it  was  shown  that  the  aggregated  cumulative  version 
of  many  homogeneous  or  heterogeneous  AFRP  processes, 
is  fractional  Brownian  motion,  the  only  Gaussian  process 
with  stationary  increments  that  is  self-similar.  However, 
the  fact  that  the  aggregated  sum  of  AFRP  is  Gaussian  is 
not  consistent  with  the  heavy-tail  properties  of  high-speed 
network  traffic,  the  tail  index  a  of  which  deviates  far  away 
from  2. 

As  a  simple  way  to  introduce  impulsiveness  in  the  over¬ 
all  traffic  model,  we  here  propose  to  treat  the  single-user 
bit  rate  as  a  random  variable  with  heavy-tailed  character¬ 
istics.  Let  us  define  the  extended  AFRP  (EAFRP)  process 
as  follows: 

(i) The On-periods $\{X_n\}$ and the Off-periods $\{Y_n\}$ are
i.i.d., independent of each other, with distributions $F_1$ and
$F_0$ and finite means $\mu_1$ and $\mu_0$, respectively;

(ii) The transmitting rates $\{A_n\}$ during different On-periods
are i.i.d. random variables with distribution function $F_A$,
independent of $\{X_n\}$ and $\{Y_n\}$, and have finite mean $\mu_A$;

(iii) $F_1$, $F_0$ and $F_A$ are Pareto distributed, with
tail indices $1 < \alpha_1, \alpha_0, \alpha_A < 2$ and parameters
$k_1, k_0, k_A > 0$, respectively.
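
The construction in (i) to (iii) can be sketched in a few lines. This is only an illustrative generator under stated assumptions: the names `eafrp_cycles` and `eafrp_value`, the convention that the path starts with an On period at time zero, and the parameter values in the usage lines are all choices made for this example, not part of the paper.

```python
import random


def pareto(k, alpha, rng):
    """Pareto sample with scale k and tail index alpha (inverse-transform method)."""
    return k / (1.0 - rng.random()) ** (1.0 / alpha)


def eafrp_cycles(n_cycles, on, off, rate, seed=0):
    """Generate n_cycles tuples (X_n, Y_n, A_n).

    on, off and rate are (k, alpha) pairs for the On duration, the Off
    duration and the transmitting rate, drawn independently as in (i)-(iii).
    """
    rng = random.Random(seed)
    return [
        (pareto(*on, rng), pareto(*off, rng), pareto(*rate, rng))
        for _ in range(n_cycles)
    ]


def eafrp_value(cycles, t):
    """Value E(t) of the sample path, assuming it starts with an On period at time 0."""
    start = 0.0
    for x, y, a in cycles:
        if t < start + x:
            return a        # inside the On period: transmit at rate A_n
        if t < start + x + y:
            return 0.0      # inside the Off period: silent
        start += x + y
    raise ValueError("t lies beyond the generated horizon")


if __name__ == "__main__":
    cycles = eafrp_cycles(5, on=(1.0, 1.5), off=(1.0, 1.3), rate=(10.0, 1.3))
    for x, y, a in cycles:
        print(f"On {x:8.3f}  Off {y:8.3f}  rate {a:8.3f}")
```

The only difference from a plain AFRP generator is the third component of each tuple: an AFRP would use a constant rate of 1 during every On period.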

For a single AFRP it was shown in [9] that in the intermediate
frequency range the power spectrum follows a
power-law function. We here examine the single AFRP in
the frequency range around the origin, and show a result
similar to that of [9], i.e.,

Proposition 1 An AFRP with On and Off periods Pareto
distributed with tail indices $\alpha_1$ and $\alpha_0$, respectively, is long
memory in the autocovariance sense.




The  proof  can  be  found  in  [16]. 


Let  us  study  the  buffer  content  at  time  points: 


As the EAFRP is constructed based on the AFRP, it
should also exhibit some long-range dependence. However,
because the reward is heavy-tailed, the second-order
statistics are infinite. Thus, the long-range dependence of
the EAFRP will be studied in the generalized sense of (3).

Proposition 2 Let $E(t)$ be an EAFRP as defined above.

a) For a fixed $t$, $E(t)$ is a heavy-tail random variable
with tail index $\alpha_A$.

b) $E(t)$ exhibits long-range dependence in the generalized
sense, i.e.,

$$-I(1, -1; \tau) \sim c\,\tau^{1 - \min\{\alpha_1, \alpha_0\}}, \quad \tau \to \infty. \qquad (13)$$


$$\left\{t_n := \sum_{i=1}^{n} (X_i + Y_i),\ n = 1, 2, \ldots\right\} \qquad (16)$$

At those points $Q(t)$ satisfies the recursive equation:

$$Q_{n+1} = [Q_n + (A_n - r)X_n - rY_n]^+, \quad n = 1, 2, \ldots, \qquad (17)$$

where $Q_n := Q(t_n)$ and $[\cdot]^+ := \max[0, \cdot]$. We will assume that
the fluid source begins with an On period, with empty buffer
content at time zero, i.e., $Q_0 = 0$.

The net input $B_n$ during an On session is

$$B_n = (A_n - r)X_n, \quad n = 1, 2, \ldots \qquad (18)$$
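
The recursion (17) is a Lindley-type recursion and is easy to iterate numerically. The sketch below assumes each cycle is given as a tuple (X_n, Y_n, A_n); the function name and the small hand-picked input are illustrative assumptions only.

```python
def queue_at_cycle_ends(cycles, r, q0=0.0):
    """Iterate Q_{n+1} = max(0, Q_n + (A_n - r) X_n - r Y_n) over all cycles.

    cycles: iterable of (X_n, Y_n, A_n) tuples (On duration, Off duration, rate).
    r:      constant service (drain) rate.
    Returns the buffer content at the end of each On/Off cycle.
    """
    q = q0
    contents = []
    for x, y, a in cycles:
        net_input = (a - r) * x          # B_n in (18): fluid gained during the On period
        q = max(0.0, q + net_input - r * y)
        contents.append(q)
    return contents


if __name__ == "__main__":
    demo = [(2.0, 0.5, 3.0), (1.0, 1.0, 2.5), (1.0, 0.5, 5.0)]
    print(queue_at_cycle_ends(demo, r=2.0))
```

Because the buffer is only observed at the cycle ends $t_n$, this gives the embedded sequence $Q_n$, not the continuous-time content $Q(t)$.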


Proof:  see  Appendix  A. 

The  overall  network  traffic  consists  of  the  superposition 
of  many  single  source/destination  pairs.  Thus,  the  pro¬ 
posed  model  for  the  overall  traffic  is  the  superposition  of 
EAFRP’s. 

Proposition 3 Let $SE(t)$ be the superposition of $M$ independent
EAFRPs $E_m(t)$, $m = 1, \ldots, M$, with parameters
denoted by a subscript, e.g., $\alpha_{1i}, k_{1i}, \alpha_{0i}, k_{0i}, \alpha_{Ai}, k_{Ai}$.

a) For some fixed $t$, $SE(t)$ is a heavy-tail random variable
with tail index $\min\{\alpha_{A1}, \alpha_{A2}, \ldots, \alpha_{AM}\}$.

b) $SE(t)$ has long-range dependence in the generalized
sense.


Given that $A_n$ and $X_n$ are heavy-tail distributed, and for
$r < k_A$, $B_n$ can be shown [16] to be heavy-tail distributed
with tail index $\min\{\alpha_A, \alpha_1\}$.

The  queue  length  satisfies  the  same  recursive  equation 
as  the  successive  waiting  times  in  a  GI/G/1  queue  with 
service  times  {(A„  -  r)X„}  and  inter-arrival  times  {rYn}, 
n  =  1,  2, ....  Thus,  applying  (7)  with  a  =  c*a  A  «i,  N{t)  = 
Ci,  fi  =  rp0,  a  =  {p,A-  r)m,  as  q  -1  oo  we  get: 


Cirpo 


P[Qe  >  6]  r(l0[aA  Aqi-  1)[(j/a  -  r)pi  -  rpo] 


1-1) 
(19) 


Linking the On and Off periods to the down and up
states of [8], we can apply Theorem 2 to get the steady
state queue length distribution as:


Proof:  Can  be  found  in  [16]. 

4.  QUEUING  ANALYSIS  OF  THE  EAFRP 
MODEL 

Let us consider an EAFRP process feeding a stable queue.
During the On state fluid enters the queue, while during
both the On and Off states fluid leaves the queue at constant
rate $r$. For a stable queue, $r$ is larger than the mean in-flow
rate, i.e.,

$$r > \frac{\mu_A \mu_1}{\mu_1 + \mu_0}. \qquad (14)$$

The buffer content, or queue length $Q(t)$, is a continuous-time
stationary stochastic process. The case where $A_n =$
constant $> r$ and $Y_n$ is exponentially distributed has been
extensively treated [2]. For the case of heavy-tailed $A_n$ we
propose the following result.

Proposition 4 The steady state queue length of an EAFRP
queue is heavy-tail distributed with tail index $1 - \alpha_A \wedge \alpha_1$,
i.e.,

$$P[Q(\infty) > q] \sim C q^{1 - (\alpha_A \wedge \alpha_1)}, \quad \text{as } q \to \infty, \qquad (15)$$

where $C$ is some constant independent of $q$.

Proof:  In  our  queuing  analysis  we  will  employ  the  tra¬ 
ditional  methodology,  where  the  distribution  of  the  buffer 
content,  or  the  queue  length,  is  first  computed  at  discrete 
time  points  and  then  the  stationary  distribution  is  derived. 


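$$P[Q(\infty) > q] = \frac{\mu_1}{\mu_1 + \mu_0}\,P[Q_e + R_1(X_1^e) > q] + \frac{\mu_0}{\mu_1 + \mu_0}\,\rho\,P[Q_e + R_1(X_1)^* > q]$$

with traffic intensity $\rho = (\mu_A - r)\mu_1/(r\mu_0)$, and

$$P[R_1(X_1^e) > x] = \frac{1}{\mu_1}\,E\int_0^{X_1} 1_{[(A_1 - r)t > x]}\,dt \qquad (20)$$

and

$$P[R_1(X_1)^* > x] = \frac{1}{E[(A_1 - r)X_1]}\int_x^{\infty} P[(A_1 - r)X_1 > u]\,du, \qquad (21)$$

where $R_1(X_1)^*$ and $R_1(X_1^e)$ are independent of $Q_e$.

It can be shown that $P[R_1(X_1)^* > x] \sim x^{1 - (\alpha_A \wedge \alpha_1)}$
and $P[R_1(X_1^e) > x] \sim x^{1 - \alpha_1}$ as $x \to \infty$. Combining these with
(19) yields that the stationary queue length distribution is
heavy-tail distributed with tail index $1 - \alpha_1 \wedge \alpha_A$. $\Box$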

Thus, for a queue driven by a single EAFRP source, when the
marginal distribution of the transmitting rates has a heavier
tail than the On periods, the buffer content distribution
changes significantly: its asymptotic tail index is set by the
rates. Furthermore, actual high-speed LAN traffic measurements
show that in most cases the tails of the transmitting rates
are much heavier than those of the On periods. This implies that,
in such cases, the asymptotic queue length behavior is determined
by the marginal distribution of the rates rather than by that of
the On periods.
5. EXPERIMENTS

In  this  section,  we  first  validate  our  claim  that  in  actual 
single  user  traffic  the  transmitting  rates  are  heavy-tailed 
distributed.  We  then  demonstrate  that  the  EAFRP  indeed 
can  model  traffic  in  high-speed  communication  networks. 
Finally  a  numerical  queuing  simulation  is  performed  to  val¬ 
idate  our  theoretical  finding  according  to  which  the  traffic 
marginal  distribution  can  affect  the  queuing  performance 
as  much  as  the  traffic  long-range  dependence. 

Our real traffic data was obtained from the 100-Mbps
high-speed Ethernet network at the Electrical and Computer
Engineering Department, Drexel University. It contains
all packets transmitted to and from a Unix server over
3 consecutive days. Out of the total traffic we separated
the flow between a single user (hsm.ece.drexel.edu) and the
server (cbis.ece.drexel.edu), which occurred on May 19th
2000 from 8:20 AM to 9:00 PM. The EAFRP was formed
by first choosing an appropriate threshold value. If no
packets  were  transmitted  during  a  time  period  longer  than 
the  threshold  value,  that  period  was  considered  to  be  an 
Off  state.  The  Off  state  was  followed  by  an  On  state,  which 
started  as  soon  as  packet  activity  resumed.  The  transmit¬ 
ting  rate  during  each  On  period  was  calculated  by  averaging 
the  total  bytes  transmitted  during  that  period  over  the  pe¬ 
riod  duration. 
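
The thresholding procedure described above can be sketched as follows. The input format (a time-sorted list of `(timestamp_seconds, size_bytes)` pairs), the function name, and the handling of On periods that contain a single packet (zero duration, rate reported as `None`) are assumptions made for this illustration; the paper does not specify them.

```python
def extract_on_periods(packets, threshold):
    """Split a packet trace into On periods separated by idle gaps.

    packets:   list of (timestamp_seconds, size_bytes), sorted by timestamp.
    threshold: an idle gap strictly longer than this is treated as an Off state.
    Returns (on_periods, off_durations), where each On period is a tuple
    (start_time, duration, average_rate_bytes_per_second).
    """
    on_periods, off_durations = [], []
    if not packets:
        return on_periods, off_durations

    def close(start, last, total):
        duration = last - start
        # A single-packet On period has zero duration; its rate is left undefined.
        rate = total / duration if duration > 0 else None
        on_periods.append((start, duration, rate))

    start = last = packets[0][0]
    total = packets[0][1]
    for t, size in packets[1:]:
        if t - last > threshold:
            close(start, last, total)       # end the current On period
            off_durations.append(t - last)  # the idle gap is an Off period
            start, total = t, 0             # a new On period starts at this packet
        total += size
        last = t
    close(start, last, total)
    return on_periods, off_durations


if __name__ == "__main__":
    trace = [(0.0, 100), (0.25, 100), (0.5, 200), (2.0, 50), (2.25, 150)]
    print(extract_on_periods(trace, threshold=0.5))
```

The resulting On durations, Off durations and rates are the three samples whose tail indices are then estimated.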

To determine the presence or absence of heavy-tail
effects, the most commonly used methods are the log-log
complementary distribution (LLCD) graph and the Hill
estimator [14]. For a threshold of $t_{th} = 0.1$ s, Fig. 1 depicts the Hill
estimate plot of the transmitting rates during different On
periods. The heavy-tailed nature of the transmitting rates is
revealed by the stable Hill estimator plot. The same plot also
shows the tail index to be close to 1.
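
For readers unfamiliar with the Hill estimator, a minimal sketch is given below. It uses the standard form based on the $k$ largest order statistics; the function name, the synthetic Pareto test data and the choice of $k$ are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def hill_estimate(samples, k):
    """Hill estimate of the tail index from the k largest samples.

    With order statistics X_(1) >= X_(2) >= ... >= X_(n), the estimate is
    the reciprocal of (1/k) * sum_{i=1..k} ln(X_(i) / X_(k+1)).
    """
    xs = sorted(samples, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < number of samples")
    mean_log_excess = sum(math.log(xs[i] / xs[k]) for i in range(k)) / k
    return 1.0 / mean_log_excess


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic Pareto data with scale 10 and tail index 1.3 (illustrative values).
    data = [10.0 / (1.0 - rng.random()) ** (1.0 / 1.3) for _ in range(20000)]
    for k in (200, 1000, 2000):
        print(k, hill_estimate(data, k))
```

A Hill plot shows this estimate as a function of $k$; a roughly flat region indicates a heavy tail and gives the tail index.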

Next, we use the EAFRP model to synthesize
the traffic. The tail indices of the EAFRP, $\alpha_1$, $\alpha_0$ and $\alpha_A$,
were obtained through the Hill estimator applied to the
real data. The cutoffs, i.e., $k_1$, $k_0$ and $k_A$, were set to the
minimum values of the On/Off durations and transmitting
rates, respectively. These parameters were found to be:
$\alpha_1 = 1.5$, $k_1 = 1$, $\alpha_0 = 1.2$, $k_0 = 10$, $\alpha_A = 1.0$, $k_A = 30$.
Figures 2(a) and (b) illustrate the actual traffic and the
synthesized EAFRP. We observe that the two traces look
very much alike, i.e., both are very impulsive. To
confirm this "visual check", we plot the LLCD of both
traces in (c) and (d), respectively. The linearity in both plots
indicates that both traces are indeed heavy-tail distributed.
A further check of the similarity of the two traces is
done by estimating their generalized codifferences, which
are shown in (e) and (f), respectively. Both
data traces exhibit the same kind of long-range dependence
in the generalized sense.

Next, we performed a simple numerical simulation
of a stable queue fed by an EAFRP process, for which
the tail indices of the On state and the transmitting rate were
1.5 and 1.3, respectively. The other parameters were taken to
be $\alpha_0 = 1.3$, $k_1 = k_0 = 1$, $k_A = 10$. The service rate
was set to 46, corresponding to a traffic intensity of 38%. Based
on 20 Monte Carlo simulations of time length 10® seconds
we estimated the complementary mean queue length, which
is shown in Fig. 3. The slope of a line fitted
to the queue length in the least-squares sense was found
to be 0.2603, which is very close to the theoretical value of
0.3. As expected from our theoretical results, the marginal
distribution becomes the dominant factor in determining
the queuing performance, which can have a profound effect
on self-similar traffic engineering.
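
The quoted traffic intensity can be checked from the stated parameters. The short calculation below assumes that "traffic intensity" here means the mean in-flow rate $\mu_A\mu_1/(\mu_1+\mu_0)$ divided by the service rate, and uses the Pareto mean $\alpha k/(\alpha - 1)$ that follows from (2); both are interpretations made for this check.

```python
def pareto_mean(k, alpha):
    """Mean of a Pareto variable with survival function (k/x)^alpha, valid for alpha > 1."""
    if alpha <= 1.0:
        raise ValueError("the mean is infinite for alpha <= 1")
    return alpha * k / (alpha - 1.0)


mu_on = pareto_mean(1.0, 1.5)      # mean On duration
mu_off = pareto_mean(1.0, 1.3)     # mean Off duration
mu_rate = pareto_mean(10.0, 1.3)   # mean transmitting rate

service_rate = 46.0
mean_inflow = mu_rate * mu_on / (mu_on + mu_off)
load = mean_inflow / service_rate

# Exponent predicted by (15): 1 - min(alpha_A, alpha_1).
predicted_exponent = 1.0 - min(1.3, 1.5)

if __name__ == "__main__":
    print(f"mean in-flow rate = {mean_inflow:.3f}")
    print(f"traffic intensity = {load:.3f}")
    print(f"predicted tail exponent = {predicted_exponent:.1f}")
```

Under these assumptions the load comes out at roughly 0.385, in line with the 38% quoted, and the predicted exponent has magnitude 0.3, the value the fitted slope is compared against.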

6.  CONCLUSIONS 


In  this  paper,  we  proposed  the  EAFRP  model  for  modeling 
single user traffic in high-speed data networks. Both theoretical
and simulation results indicate that the EAFRP model is
able  to  capture  the  impulsiveness  as  well  as  the  long-range 
dependence  of  traffic.  Our  model  can  be  easily  configured. 
It  has  only  6  parameters,  which  can  be  used  to  produce 
versatile  desired  traffic  flow  traces.  In  a  scaled  network 
environment  the  total  traffic  at  any  load  can  be  synthe¬ 
sized  as  the  superposition  of  EAFRPs,  where  the  number  of 
EAFRPs  corresponds  to  the  active  source/destination  pairs 
in  the  whole  network.  The  EAFRP  model  revealed  an  in¬ 
triguing  result  in  traffic  engineering.  Contrary  to  what  has 
been  assumed  so  far,  the  Hurst  parameter  is  not  the  only 
factor in determining buffer dimensioning and loss-rate
estimation. Both our analytical queuing results and experiments
indicated that the traffic marginal distribution is
as important as the self-similarity, which in turn suggests
that the marginal distribution should be taken into
account in network infrastructure design.
Figure 1: The Hill estimator of the transmitting rates of actual
single user traffic.
Figure 3: Complementary queue length distribution for
EAFRP input, with Pareto distributed transmitting rates
of tail index 1.3. On and Off periods are Pareto distributed
with tail indices 1.5 and 1.3, respectively.


Figure 2: The actual single user network traffic (1st row),
the synthesized traffic (2nd row) and their corresponding
LLCD plots and codifference estimates.


A.  APPENDIX  A 

Let $E(t) = A(t)V(t)$, where $V(t)$ is an AFRP and $A(t)$ represents
the random transmission rate. The density function
of $E(t)$ equals:

$$f_E(e) = P[V(t) = 0]\,\delta(e) + P[V(t) = 1]\,f_A(e), \qquad (22)$$

where $\delta(e)$ is the Dirac function, taking value of 1 at 0.
Thus, $f_E(e)$ is a scaled version of $f_A(e)$, which is a power-law
function. Hence, $E(t)$ is a heavy-tail random variable
with tail index $\alpha_A$ for fixed $t$.

To show generalized long-range dependence we proceed
as follows. It holds that:

$$E\{e^{s_1 E(t+\tau) + s_2 E(t)}\} = E\{e^{s_1 A(t+\tau)V(t+\tau) + s_2 A(t)V(t)}\}. \qquad (23)$$

For notational convenience, let $V_1 = V(t+\tau)$, $V_2 = V(t)$,
$A_1 = A(t+\tau)$, $A_2 = A(t)$.

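$$\Phi_V(s_1, s_2) = E\{e^{s_1 V_1 + s_2 V_2}\} = 1 + s_1 E\{V_1\} + s_2 E\{V_2\} + \frac{1}{2}\left(s_1^2 E\{V_1^2\} + s_2^2 E\{V_2^2\} + 2 s_1 s_2 E\{V_1 V_2\}\right) + \cdots \qquad (24)$$

Taking into account that $E\{V_1^n V_2^m\} = E\{V_1 V_2\}$ for $n, m \ge 1$
(since $V(t)$ takes only the values 0 and 1), and also
the stationarity of $V(t)$, we find that:

$$\Phi_V(s_1, s_2) = 1 + (e^{s_1} + e^{s_2} - 2)\eta + (e^{s_1} - 1)(e^{s_2} - 1)E\{V_1 V_2\}, \qquad (25)$$

where $\eta = E\{V(t)\}$.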

Now plugging (25) and (23) into the generalized codifference
expression (see eq. (3)) will yield terms like $\Phi_A(s_1)$, $\Phi_A(s_2)$,
and also the term $E\{e^{s_1 A_1 + s_2 A_2}\}$. The latter term depends
on whether $t + \tau$ and $t$ are in the same or different states.

Let us denote the residual life of the state at time $t$ by $T$.
Then,

$$E\{e^{s_1 A_1 + s_2 A_2}\} = E\{e^{s_1 A_1 + s_2 A_2} \mid T < \tau\}\,P\{T < \tau\} + E\{e^{s_1 A_1 + s_2 A_2} \mid T > \tau\}\,P\{T > \tau\}$$

$$= \Phi_A(s_1)\Phi_A(s_2)\,P\{T < \tau\} + \Phi_A(s_1 + s_2)\,P\{T > \tau\}.$$

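From basic renewal theory, we have

$$P\{T > \tau\} = P\{T > \tau \mid t \in \text{On state}\}\,P\{t \in \text{On state}\} + P\{T > \tau \mid t \in \text{Off state}\}\,P\{t \in \text{Off state}\}$$

$$= \frac{k_1^{\alpha_1}\tau^{1-\alpha_1}}{(\mu_1 + \mu_0)(\alpha_1 - 1)} + \frac{k_0^{\alpha_0}\tau^{1-\alpha_0}}{(\mu_1 + \mu_0)(\alpha_0 - 1)}.$$

Also, from Proposition 1, we have

$$E\{V_1 V_2\} \sim \eta^2 + C\tau^{1-\alpha_l}. \qquad (26)$$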

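Considering the approximation $\log(1 + x) \sim x$, $|x| < 1$,
and for $\tau \to \infty$:

$$\begin{aligned}
I(s_1, s_2; \tau) &= -\ln\big[((\Phi_A(s_1) - 1)\eta + 1)((\Phi_A(s_2) - 1)\eta + 1) \\
&\qquad + (\Phi_A(s_1) - 1)(\Phi_A(s_2) - 1)\,C\tau^{1-\alpha_l} + O(\tau^{2(1-\alpha_l)})\big] \\
&\quad + \ln[1 + \eta(\Phi_A(s_1) - 1)] + \ln[1 + \eta(\Phi_A(s_2) - 1)] \\
&\sim -\frac{(\Phi_A(s_1) - 1)(\Phi_A(s_2) - 1)\,C}{((\Phi_A(s_1) - 1)\eta + 1)((\Phi_A(s_2) - 1)\eta + 1)}\,\tau^{1-\alpha_l} + O(\tau^{2(1-\alpha_l)}) \\
&\sim c_2\,\tau^{1-\alpha_l}, \qquad (27)
\end{aligned}$$

where $c_2$ is some constant and $\alpha_l = \min\{\alpha_1, \alpha_0\}$. Setting
$s_1 = j$, $s_2 = -j$, it is easy to find that the discriminant
of the denominator equals $-4E\{\sin^2(A)\} < 0$. Thus, the
denominator is always positive, and as a result, $c_2$ is always
negative, which completes the proof.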
ABSTRACT

Recent  research  has  shown  that  traffic  in  Ethernet  and  other 
networks  tends  to  exhibit  properties  of  self-similarity  such 
as  long-range  dependence  and  a  high  degree  of  correlation 
between  arrivals.  This  paper  investigates  the  impact  of  the 
switching  network  on  the  self-similar  properties  of  the  traf¬ 
fic.  This  simulation  study  reveals  that  switching  networks 
tend  to  reduce  the  self-similarity  of  highly  self-similar  traf¬ 
fic.  This  is  because  of  the  truncation  of  long  bursts  due  to 
packet  discards,  and  also  because  of  aggregation  of  flows 
through  concatenated  rather  than  superposed  bursts.  On  the 
other  hand,  switching  systems  have  the  opposite  effect  of 
increasing  the  self-similarity  of  input  traffic  that  has  no  self¬ 
similar  properties  such  as  traffic  with  Poisson  or  uniformly 
random  distributions.  This  paper  also  presents  simulation- 
based  evidence  of  the  causes  behind  these  phenomena. 

1.  INTRODUCTION 

Recent  work  by  many  researchers  has  shown  that  traffic  in 
Ethernet  and  other  networks  tends  to  be  bursty  at  many  or 
all  time-scales  [2,6],  and  that  this  phenomenon  can  be  math¬ 
ematically  described  using  the  notion  of  self-similarity.  Ex¬ 
tensive  research  has  been  done  on  the  impact  of  the  self¬ 
similar  properties  of  traffic  on  network  design  issues,  such 
as queueing performance [10], switch performance [3], congestion
control [5] and scheduling algorithms [4]. While it
is clear that traffic characteristics have an impact on network
design issues, it is also true that the properties of a
network have an impact on the characteristics of the traffic
as it progresses through the network. Very few studies, however,
have addressed this issue of changes in traffic characteristics
caused by the network [1, 8, 12, 14]. These studies
have focused only on the impact of individual components
of a network such as the traffic shaper [12], the packet
scheduler [1, 14] or a single-server queue [8], as opposed
to the impact of the entire network as a whole. In addition,
studies such as [8] have obtained insightful theoretical results
which, however, cannot be readily applied to realistic
network environments to solve problems in network engineering.
Further, studies such as [12, 14] only consider
short-range burstiness, which does not capture all of the features
of self-similar traffic, especially long-range burstiness
as observed in [2, 6].

This  paper  presents  a  simulation  study  of  the  impact  of  a 
switching  network  on  the  self-similar  properties  of  the  traf¬ 
fic,  and  investigates  the  causes  underlying  the  observed  phe¬ 
nomena.  We  use  self-similar  traffic  generated  using  the  frac¬ 
tional  ARIMA  model  [7],  and  a  baseline  Banyan  topology 
for  the  switching  network.  Section  2  discusses  the  network 
and  the  traffic  model  in  greater  detail. 

Our  simulation  study  reveals  that  switching  networks 
tend  to  reduce  the  self-similarity  of  highly  self-similar  traf¬ 
fic.  This  is  because  of  the  truncation  of  long  bursts  due 
to  packet  discards,  and  also  because  of  the  aggregation  of 
flows  through  concatenated  rather  than  superposed  bursts. 
On  the  other  hand,  switching  systems  also  increase  the  self¬ 
similarity  of  input  traffic  that  has  no  self-similar  properties 
such  as  traffic  with  Poisson  or  uniformly  random  distribu¬ 
tions.  Section  3  presents  these  simulation  results  and  the  re¬ 
lated  analysis  with  simulation-based  evidence  of  the  causes 
behind  the  phenomena  that  yield  these  results.  This  section 
also  explains  our  results  in  relation  to  those  obtained  in  [8] 
and [11]. Section 4 concludes the paper.

2.  NETWORK  AND  TRAFFIC  MODEL 
2.1.  Network  Model 

This study uses an $N \times N$ baseline Banyan multistage network,
with $N$ source nodes and $N$ destination nodes. The
switching network consists of $\log_m N$ stages of $m \times m$
switching elements. In Banyan topologies, the path between
a  source  end-point  and  a  destination  end-point  is  unique. 
This  property  of  Banyan  networks  helps  our  study  of  the 
impact of switching systems, since it eliminates secondary
effects such as those due to the choice of a routing algorithm.
The popularity of Banyan topologies in real implementations
is an additional motivation behind our use of this
network  model.  Figure  1  shows  a  baseline  Banyan  network 
topology  with  N  =  16  and  m  =  4. 

Each  source  node  consists  of  N  traffic  generators,  each 
of  which  generates  traffic  intended  for  a  distinct  destina¬ 
tion  node.  Thus,  the  system  consists  of  a  total  of  N2  traffic 
generators.  In  our  simulations,  traffic  generators  are  all  in¬ 
dependent,  and  generate  no  more  than  one  packet  per  cycle. 
We  assume  that  packet  lengths  are  constant,  and  that  exactly 
one  packet  can  be  transmitted  during  each  cycle  across  any 
port. If more than one packet is created in a source node
during  the  same  cycle,  only  one  of  these  is  allowed  to  be 
transmitted  while  all  the  others  are  buffered  in  a  queue.  We 
assume  that  the  queue  sizes  at  the  source  nodes  are  large 
enough  that  no  packet  is  ever  dropped  before  it  enters  the 
network.  Destination  nodes  drain  packets  from  the  output 
ports  of  the  last  stage  of  switching  elements,  at  the  maxi¬ 
mum  rate  of  one  packet  per  cycle. 

In  the  switching  elements,  each  input  port  is  associated 
with  an  input  buffer  of  a  fixed  small  capacity  of  4  packet 
lengths.  Each  output  port  contains  a  dedicated  output  buffer. 
In  addition,  our  simulations  also  use  a  shared  output  buffer 
of  capacity  equivalent  to  4  packets  per  output  port  for  addi¬ 
tional  space  for  the  output  queues.  Under  most  traffic  condi¬ 
tions,  the  shared  buffer  improves  performance  through  bet¬ 
ter  buffer  utilization.  During  each  cycle  in  our  simulations, 
switching  elements  can  accept  no  more  than  one  packet  at 
each  input  port  into  the  input  queue.  Each  non-empty  out¬ 
put  queue  transmits  exactly  one  packet  to  the  output  port  in 
each  cycle.  We  use  the  round-robin  scheduling  algorithm  to 
transfer  packets  to  and  from  the  shared  queue.  A  packet  ar¬ 
riving  at  an  input  port  first  enters  the  associated  input  buffer, 
then  the  shared  output  buffer,  and  finally  the  output  buffer 
corresponding  to  the  destination  port.  Our  model  ensures 
that  the  maximum  bandwidth  with  which  the  shared  buffer 


can  be  written  into  or  read  from,  is  equal  to  the  maximum 
aggregate  input  or  output  bandwidth  of  the  switch.  Pack¬ 
ets  arriving  at  a  full  input  buffer  are  dropped.  No  packets, 
however,  are  dropped  at  any  other  point  within  the  switch¬ 
ing  element,  i.e.,  packets  are  forwarded  to  the  shared  buffer, 
or  to  an  output  buffer  only  if  there  is  room  available. 


2.2.  Traffic  Model 

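We use the fractional autoregressive integrated moving average
(FARIMA) model [7] to synthesize self-similar traffic.
FARIMA($p$, $d$, $q$) is defined as

$$\Phi(B)X_n = \Theta(B)\Delta^{-d}\epsilon_n, \qquad (1)$$

where $B$ is the backward operator, i.e., $Bx_n = x_{n-1}$. The
definition above can be also expressed as

$$X_i = \Delta^{-d}\epsilon_i - \theta_1\Delta^{-d}\epsilon_{i-1} - \cdots - \theta_q\Delta^{-d}\epsilon_{i-q} + \phi_1 X_{i-1} + \cdots + \phi_p X_{i-p}, \quad i = 1, 2, \ldots$$

In equation (1), $\Delta^{-d}$ is defined as $\Delta^{-d} = \sum_{i=0}^{\infty} b_i(-d)B^i$,
where $b_0(-d) = 1$ and

$$b_i(-d) = \frac{\Gamma(i + d)}{\Gamma(d)\,\Gamma(i + 1)}, \quad i = 1, 2, \ldots$$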
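
The fractional-differencing weights $b_i(-d) = \Gamma(i+d)/(\Gamma(d)\Gamma(i+1))$ need not be computed from Gamma functions directly, because consecutive weights satisfy $b_i = b_{i-1}(i - 1 + d)/i$. The sketch below implements both forms as a cross-check; the function names and the value of $d$ are illustrative assumptions, and the Gamma-based version assumes $d > 0$.

```python
import math


def frac_diff_coeffs(d, n):
    """First n weights b_0, ..., b_{n-1} of the expansion of Delta^{-d}.

    Uses the recursion b_0 = 1, b_i = b_{i-1} * (i - 1 + d) / i, which follows
    from Gamma(i + d) = (i - 1 + d) Gamma(i - 1 + d) and Gamma(i + 1) = i Gamma(i).
    """
    coeffs = [1.0]
    for i in range(1, n):
        coeffs.append(coeffs[-1] * (i - 1 + d) / i)
    return coeffs


def frac_diff_coeff_gamma(d, i):
    """Same weight computed from the closed form, via log-Gamma for stability (d > 0)."""
    return math.exp(math.lgamma(i + d) - math.lgamma(d) - math.lgamma(i + 1))


if __name__ == "__main__":
    d = 0.3
    for i, b in enumerate(frac_diff_coeffs(d, 6)):
        print(i, b, frac_diff_coeff_gamma(d, i))
```

The weights decay slowly, roughly like $i^{d-1}$, which is what gives the filtered series its long memory.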

When the innovation $\epsilon_i$ is a stable process with index $\alpha$,
i.e., $\epsilon_i \sim S_\alpha(\sigma, \beta, \mu)$, the Hurst parameter, $H$, and the
quantities $\alpha$ and $d$ are related by $d = H - 1/\alpha$. In this
paper, we use the Hurst parameter as the measure of the degree
of self-similarity. The Hurst parameter has a range of
$0.5 < H < 1$, and a larger value of $H$ implies a higher
degree of self-similarity. Throughout our work, we use
$\alpha = 1.2$, $\sigma = 1$, $\beta = 0$, $\mu = 0$, $p = 50$, and $q = 400$. $\Theta(B)$
is generated by selecting $\theta_i$ in [0, 0.05] randomly and
independently. Unlike $\Theta(B)$, $\Phi(B)$ is generated by selecting
$p/2$ complex roots and their conjugates, since $X_i$ converges
only if all roots of $\Phi(B)$ are in the unit circle. The real and
imaginary components of each root are uniform in [0, 0.05].
Finally, we normalize $X_i$ to a series of 1's or 0's indicating
whether or not a packet is generated during a given cycle.

The variance-time plot [9] is used to estimate the Hurst
parameter of observed network traffic. For a self-similar
time series $X(k)$, the aggregated series $X_m(k)$ is defined as

$$X_m(k) = \frac{1}{m}\sum_{i=km}^{(k+1)m-1} X(i),$$

and its variance satisfies

$$\mathrm{var}\,X_m = \mathrm{var}\,X / m^{\beta},$$

where $H = 1 - \beta/2$. Taking the logarithm of the equation
above, we get

$$\log \mathrm{var}\,X_m = -\beta \log m + \log \mathrm{var}\,X.$$

From the above equation, the Hurst parameter is determined
by the slope of the plot of $\log \mathrm{var}\,X_m$ vs. $\log m$.
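
The variance-time method can be sketched as follows: aggregate the series over non-overlapping blocks of size $m$, compute the variance of the block means, and fit a least-squares line to the log-log points. The function names, the block sizes and the i.i.d. test series are assumptions made for this illustration; for an uncorrelated series the estimate should come out near $H = 0.5$.

```python
import math
import random


def aggregated_variance(x, m):
    """Variance of the means of non-overlapping blocks of size m."""
    blocks = [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]
    mean = sum(blocks) / len(blocks)
    return sum((b - mean) ** 2 for b in blocks) / len(blocks)


def hurst_variance_time(x, block_sizes):
    """Estimate H from the slope of log var(X_m) versus log m.

    The fitted slope equals -beta, and H = 1 - beta / 2 = 1 + slope / 2.
    """
    pts = [(math.log(m), math.log(aggregated_variance(x, m))) for m in block_sizes]
    n = len(pts)
    mean_x = sum(p[0] for p in pts) / n
    mean_y = sum(p[1] for p in pts) / n
    slope = (sum((px - mean_x) * (py - mean_y) for px, py in pts)
             / sum((px - mean_x) ** 2 for px, _ in pts))
    return 1.0 + slope / 2.0


if __name__ == "__main__":
    rng = random.Random(2)
    # One packet per cycle with probability 0.3, independently in every cycle.
    series = [1 if rng.random() < 0.3 else 0 for _ in range(50000)]
    print(hurst_variance_time(series, [1, 2, 4, 8, 16, 32, 64]))
```

Block sizes should stay small relative to the series length, since the variance estimate for large $m$ rests on few blocks and becomes noisy.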
Figure 2: Per-hop changes in self-similarity.


3.  SIMULATION  RESULTS  AND  ANALYSIS 

In  a  switching  element  with  buffers,  a  flow  typically  con¬ 
sumes  more  space  during  a  bursty  period.  Under  such 
conditions,  depending  on  the  buffer  sharing  policy  and  the 
buffer  sizes,  either  other  flows  suffer  from  less  empty  space, 
or  the  bursty  flow  suffers  a  higher  packet  loss  rate.  In  either 
of  these  cases,  the  traffic  characteristics  change  due  to  de¬ 
lays  or  losses  or  both.  If  two  or  more  flows  are  bursty  at 
the  same  time,  these  effects  are  further  magnified.  This  sec¬ 
tion  presents  our  study  of  these  effects  on  the  self-similar 
properties  of  traffic. 

Our  study  includes  two  kinds  of  traffic  sources,  self¬ 
similar  and  uniform  random  traffic.  A  uniform  random  traf¬ 
fic  source  generates  a  packet  during  each  cycle  with  a  cer¬ 
tain  probability  p,  with  uniformly  distributed  packet  desti¬ 
nations.  Like  Poisson  traffic,  uniform  random  traffic  has  a 
Hurst  parameter  of  0.5,  indicating  that  it  has  no  self-similar 
properties. 
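A uniform random source of this kind can be sketched as follows. This is a minimal illustration rather than the simulator used in the study; the function name `uniform_random_source`, the number of output ports, and the use of NumPy are assumptions.

```python
import numpy as np

def uniform_random_source(n_cycles, p, n_outputs, rng):
    """Per cycle: emit a packet with probability p, destination uniform over outputs."""
    has_packet = rng.random(n_cycles) < p                    # independent across cycles
    destinations = rng.integers(0, n_outputs, size=n_cycles)  # uniform over output ports
    return has_packet, destinations

rng = np.random.default_rng(0)
has_packet, destinations = uniform_random_source(100_000, 0.4, 16, rng)
load = has_packet.mean()  # empirical offered load, close to p
```

Because each cycle is drawn independently of the previous ones, the resulting packet series has no long-range correlation, which is why its Hurst parameter is 0.5.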

Our  simulation  study  shows  that  the  impact  of  a  switch¬ 
ing  system  on  the  self-similarity  of  the  traffic  depends  on 
the  self-similarity  of  the  input  traffic  itself.  A  switching  sys¬ 
tem  reduces  the  self-similarity  of  highly  self-similar  traffic, 
while it increases that of non-self-similar traffic such as uniform
random traffic. Figure 2 illustrates these opposite effects.
When the traffic is
uniformly random, the Hurst parameter increases from 0.5
to 0.64 after the first stage and stays around 0.65 thereafter.
When the input traffic has a high level of self-similarity,
the Hurst parameter drops from 0.86 to 0.78 after the first
stage, and further to 0.76 after the second stage. This interesting
phenomenon shows that switching networks have
the effect of shaping the traffic characteristics to a moderate
level of self-similarity. In the following, we investigate




Figure  3:  Distribution  of  short  burst  length,  (a)  input  traffic 
and  (b)  output  traffic. 


and  present  simulation-based  evidence  of  the  causes  of  this 
phenomenon. 

In  the  case  of  uniform  random  traffic,  the  probability 
of  packet  arrivals  during  each  cycle  is  independent  of  the 
packet  arrival  pattern  during  the  previous  cycles.  As  the 
traffic  progresses  through  a  switching  network  with  buffers 
in  the  switching  elements,  this  independence  assumption 
progressively  becomes  less  valid.  Packets  for  the  same  out¬ 
put  port  that  independently  arrive  at  different  times,  due  to 
congestion,  end  up  waiting  in  the  buffers  for  transmission, 
and  get  transmitted  in  a  burst  at  the  output  port.  This  phe¬ 
nomenon  adds  burstiness  to  the  traffic  at  each  new  hop  in  the 
path  of  the  traffic,  changing  the  output  traffic  characteristics 
to  something  other  than  random  uniform  traffic.  Packet  ar¬ 
rivals  at  subsequent  hops  of  the  network  are  now  correlated, 
as  reflected  in  the  increased  Hurst  parameter  of  the  traffic. 

Number of Nodes   H (Input Traffic)   H (Output Traffic)   Percentage Decrease
2 x 2             0.860               0.864                ~ 0
4 x 4             0.859               0.843                2%
8 x 8             0.863               0.812                6%
16 x 16           0.861               0.763                11%

Table 1: Self-similarity of traffic vs. number of nodes.

The distribution of burst lengths in highly self-similar
traffic is heavy-tailed, i.e., the probability distribution is
given by $P[X > x] \sim x^{-\alpha}$. Such a distribution decays
more slowly than exponentially, causing a high likelihood
of long bursts in self-similar traffic. However, because of
congestion and limited buffering capacities in the switching
elements, long bursts do not easily survive in switching
networks. In fact, long bursts self-destruct by causing
congestion, which triggers discarding of packets and thus breaks
the long burst into smaller ones. For example, in our
simulations, the longest burst observed at the traffic source
had more than 6,000 consecutive packets, while the output
traffic had no bursts longer than 900 packets. This is
also illustrated in Figure 3, which shows the distribution
of short burst lengths of the input and the output traffic
of the network. The output traffic has a larger percentage
of shorter bursts and a significantly smaller average
burst length. Note that the relative increase in shorter bursts
increases the value of the index $\alpha$ in a heavy-tailed distribution
given by $P[X > x] \sim x^{-\alpha}$. This, in turn, has the effect
of reducing the Hurst parameter since, as shown in [13],
$H = (3 - \alpha)/2$.
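The burst-length bookkeeping described above can be illustrated with a short Python sketch. It assumes the traffic is available as a 0/1 series (one entry per cycle) and treats a burst as a maximal run of consecutive 1's; the helper names `burst_lengths` and `hurst_from_tail_index` are hypothetical and are not taken from the study.

```python
import numpy as np

def burst_lengths(packets):
    """Lengths of maximal runs of consecutive 1's in a 0/1 packet series."""
    p = np.asarray(packets, dtype=int)
    padded = np.concatenate(([0], p, [0]))  # guarantees every run has a start and an end
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)      # 0 -> 1 transitions
    ends = np.flatnonzero(diff == -1)       # 1 -> 0 transitions
    return ends - starts

def hurst_from_tail_index(alpha):
    """H = (3 - alpha) / 2, the relation quoted in the text from [13]."""
    return (3.0 - alpha) / 2.0

lengths = burst_lengths([1, 1, 0, 1, 0, 0, 1, 1, 1])  # runs of length 2, 1 and 3
h_example = hurst_from_tail_index(1.4)                # a larger alpha gives a smaller H
```

Comparing the output of `burst_lengths` for input and output traces gives the kind of histogram shown in Figure 3; a shift toward shorter bursts corresponds to a larger tail index and hence a smaller Hurst parameter.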

In  addition  to  the  reduction  in  burst  lengths,  the  other 
reason  for  this  phenomenon  is  that  aggregation  of  flows  in 
networks  typically  has  the  effect  of  reducing  variation  over 
larger scales. It should be understood that this phenomenon
is quite different from that shown in [11]. Willinger et al.
show that a superposition of many ON/OFF traffic sources
exhibits  properties  of  self-similarity  when  the  lengths  of  the 
ON  and  OFF  periods  are  independent  and  follow  a  heavy¬ 
tailed  distribution.  Because  of  the  limited  bandwidth  of  out¬ 
put  links,  a  true  superposition  is  never  possible  in  switching 
networks. Bursts are actually concatenated rather than superposed
on top of each other on the output links. A superposition
increases variation across scales, but a concatenation
actually has the effect of spreading out the peaks and
thus smoothing out the variations.

The phenomenon discussed above can be verified through
simulation using a single m x m switching element and
varying m. In this model, each output link is fed by m self-similar
traffic sources at the inputs. Table 1 shows the impact
on the self-similarity of the output traffic for different
values of m. As can be observed from Table 1, the self-similarity
of traffic decreases as the level of aggregation increases.
This reduction in the self-similarity of traffic with
aggregation is also the reason that highly self-similar traffic


Figure  4:  Per-hop  changes  for  different  switch  sizes. 


reduces in self-similarity as it progresses through each hop
in the switching network. This is easily understood by
noting that output links more hops away from the traffic
sources carry more aggregated traffic than the ones
closer to the sources.

The  same  phenomenon  is  apparent  in  the  impact  of  the 
size  of  switching  elements  used  in  the  topology  of  a  switch¬ 
ing  network.  A  network  designed  using  2x2  switching  el¬ 
ements,  as  compared  to  4x4  switching  elements,  will  con¬ 
tribute  to  a  smaller  decrease  in  the  observed  Hurst  parameter 
after  the  first  hop.  A  network  with  4x4  switching  elements 
achieves  the  same  level  of  aggregation  in  fewer  hops  than 
one  using  2x2  switching  elements.  Figure  4  shows  the  per- 
hop  changes  in  the  self-similarity  of  traffic  for  switching 
networks  with  different  sizes  of  switching  elements. 

It  is  worthwhile  to  discuss  our  results  in  relation  to  those 
obtained  by  Song  et  al.  [8]  in  their  study  of  self-similarity 
of  output  traffic  at  a  single  server  with  an  infinite  buffer.  It 
is  proved  in  [8]  that  if  the  queue  length  has  finite  variance, 
the  self-similar  properties  of  input  and  output  traffic  remain 
the  same.  In  fact,  it  is  shown  that  both  the  input  and  out¬ 
put  traffic  have  the  same  Hurst  parameter.  Noting  that  it  is 
unrealistic  to  assume  an  infinite  buffer,  the  authors  in  [8]  ar¬ 
gue  that  in  real  switches,  the  condition  that  queue  length  has 
finite  variance  is  always  satisfied.  However,  another  impor¬ 
tant  impact  of  finite  buffers  should  be  considered — traffic 
can  be  accepted  into  the  buffer  only  if  there  is  available 
space. In the absence of a feedback mechanism, packet discarding
becomes inevitable, which changes the traffic characteristics;
in the presence of a feedback mechanism such
as credit-based flow control, the characteristics of the arriving
traffic itself change. A second important reason for the
apparent discrepancy between our results and those in [8] is
that our results use the self-similar properties of aggregate
traffic at each output link, while the results in [8] compare
the properties of individual flows before and after service
by the server. Our approach of analyzing only aggregate traffic
is motivated by the fact that most switches and routers
do not maintain per-flow queueing, and therefore, only the
characteristics of the aggregate traffic at each input or output
link are important to the performance and related issues
in the design of switches and routers.

All  of  the  results  presented  in  this  paper  were  obtained  at 
moderate  or  heavy  traffic  loads.  As  one  might  expect,  the 
impact  of  the  switching  network  on  the  traffic  characteristics 
is  minimal  when  the  traffic  load  is  small. 

4.  CONCLUSION 

In  this  paper,  we  have  presented  simulation  studies  that 
show  that  highly  self-similar  input  traffic  reduces  in  its  self¬ 
similarity  as  it  progresses  through  a  switching  network.  Our 
analysis  indicates  that  this  phenomenon  is  caused  by  the 
truncation  of  long  bursts  due  to  packet  discarding,  and  by 
the  aggregation  of  flows  through  concatenation  of  bursts. 
On  the  other  hand,  during  periods  of  congestion  in  networks 
with  buffers,  input  traffic  with  no  self-similar  properties  in¬ 
creases  in  self-similarity  as  it  progresses  through  the  net¬ 
work. 

These  results  have  important  implications  relevant  to  the 
design  of  routers  and  switches.  For  example,  our  results 
suggest  that  core  Internet  routers  receive  traffic  that  is  much 
less  self-similar  than  traffic  that  emerges  out  of  border 
routers  directly  connected  to  Ethernet  LANs. 
ABSTRACT

Traffic measurements in many network environments demonstrate
the coexistence of both long- and short-range dependence
in traffic traces. In this paper, we use the fractionally
integrated autoregressive moving average (FARIMA)
processes with non-Gaussian innovations to describe
packet arrival rate in a unit time. Specifically, we investigate
cepstrum-based approaches for parameter estimation
in FARIMA processes. We examine the fractional differencing
parameter estimation procedure based on the smoothed
periodogram and the log spectrum. The simulation results
demonstrate that the proposed cepstrum approach gives
better estimation accuracy than the conventional least-squares
spectrum fit. Usefulness of the results presented is demonstrated
on the real network traffic traces by considering
spectral fitting metrics.

1.  INTRODUCTION 

The objective of traffic characterization is to transform complicated
intrinsic processes in the network into a nearly
equivalent traffic model which is credible, analytically tractable
and computationally efficient. Traffic modeling is important
in many areas of network engineering such as design,
control and performance evaluation. Conventional
models  for  the  network  traffic  include:  pure  Poisson  and 
Markov-Modulated  Poisson  processes;  packet-train;  fluid 
flow  and  autoregressive-moving  average  (ARMA)  models. 
However,  it  has  been  argued  that  these  models  do  not  rep¬ 
resent  completely  the  long-range  dependence  (LRD)  prop¬ 
erty  of  the  network  traffic  discovered  recently  by  researchers 
from  Bellcore  [1]. 

Long  memory  processes  are  able  to  capture  a  slowly 
decaying  auto-correlation  structure  of  the  underlying  time 
series  [2].  Suitable  approaches  for  the  generation  and  repre¬ 
sentation  of  long  memory  processes  include  fractional  Gaus¬ 
sian  noise,  fractionally  integrated  ARMA  processes  and  cha¬ 
otic  maps.  The  fractional  Gaussian  noise  model  has  been 
used  to  account  for  LRD  in  the  Ethernet  traffic  data  in  most 
of the publications. However, as we will show in this paper,
network traffic is usually not Gaussian distributed and its
autocorrelation cannot be represented accurately by a
single parameter (the Hurst parameter) as in the fractional
Gaussian noise model. The FARIMA model generalizes


the broad class of ARMA processes, and includes the fractional
differencing (strictly self-similar) model as a special
case. As such, FARIMA modeling encompasses and enriches
currently used models [3].

There  are  two  methods  one  can  adopt  for  model  build¬ 
ing:  the  first  relies  on  a  theoretical  model  formulation  from 
basic  events  in  the  network;  the  second  employs  experimen¬ 
tal  data  fitting.  In  this  paper,  we  take  the  second  approach. 
The model validation is achieved by employing statistical
tests of goodness-of-fit for the dependence in time series [4].
Since the traffic data is usually non-Gaussian, we propose
to use the polyspectra approach [5] to estimate the parameters
of an FARIMA model. This approach does not
make  any  assumption  on  the  marginal  distribution  of  data 
except  that  the  second  and  third  order  moments  are  finite. 
In  addition,  it  does  not  require  a  priori  knowledge  of  the 
order  of  the  ARMA  part  and  can  identify  non-minimum 
phase  systems. 

The data sets used in the experimental section of this
paper are part of a large number of high-resolution Ethernet
measurements recorded by Bellcore, Morristown. We
analyze traces obtained from the URL site [6] which are pre-processed
to give traffic workload (i.e., a number of bytes
in a unit time).


2.  PRELIMINARIES  OF  FARIMA  PROCESSES 

An FARIMA process $Y_t$ with parameters $(n, d, m)$ is defined
through the following difference equation [2]:

$$\Phi(z^{-1})\,(1 - z^{-1})^{d}\, Y_t = \Theta(z^{-1})\,\epsilon_t, \qquad (1)$$

where $t$ indicates discrete time; $Y_t$ is the observed time sequence;
$\epsilon_t$ is an i.i.d. non-Gaussian sequence with finite
mean and variance; $z^{-1}$ is the back-shift operator; $\Phi(z^{-1})$
and $\Theta(z^{-1})$ are the autoregressive (AR) and moving average
(MA) polynomials of order $m$ and $n$, respectively. The
fractional differencing is defined by the binomial series expansion [2]:

$$(1 - z^{-1})^{-d} = \sum_{j=0}^{\infty} a_j z^{-j}, \qquad (2)$$
where the coefficients $a_j$ are given through the recursive
formula:

$$a_0 = 1, \quad a_j = a_{j-1}\,\frac{j - 1 + d}{j}. \qquad (3)$$
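As an illustration, the binomial-series coefficients of $(1 - z^{-1})^{-d}$ can be generated recursively and checked against their closed form $\Gamma(j+d)/(\Gamma(d)\Gamma(j+1))$. The sketch below is a minimal example, assuming the standard recursion $a_j = a_{j-1}(j-1+d)/j$; the function names `frac_diff_coeffs` and `frac_integrate` and the truncation length are choices made here for illustration only.

```python
import math
import numpy as np

def frac_diff_coeffs(d, n_terms):
    """First n_terms coefficients a_j of (1 - z^-1)^(-d) = sum_j a_j z^-j."""
    a = np.empty(n_terms)
    a[0] = 1.0
    for j in range(1, n_terms):
        a[j] = a[j - 1] * (j - 1 + d) / j
    return a

def frac_integrate(eps, d, n_terms=1000):
    """Pass eps through a truncated version of the filter (1 - z^-1)^(-d)."""
    a = frac_diff_coeffs(d, n_terms)
    return np.convolve(eps, a)[:len(eps)]  # causal part of the convolution

coeffs = frac_diff_coeffs(0.3, 5)
closed_form = [math.gamma(j + 0.3) / (math.gamma(0.3) * math.gamma(j + 1))
               for j in range(5)]
```

Truncating the series is an approximation: for $0 < d < 0.5$ the coefficients decay only as a power law, so a long filter is needed to reproduce the low-frequency behavior.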

The FARIMA process $Y_t$, defined in (1), can be interpreted
as the output of the system (Fig. 1) driven by $\epsilon_t$ with the
transfer function $H(z^{-1}) = (1 - z^{-1})^{-d} H_{ARMA}(z^{-1})$, where
$H_{ARMA}(z^{-1}) = \Theta(z^{-1})/\Phi(z^{-1}) = \sum_i h_i z^{-i}$ is the ARMA part of
the system, and $\{h_i\}$ is its impulse response.


[Block diagram: non-Gaussian input $\epsilon_t$ -> fractional differencing $(1 - z^{-1})^{-d}$ -> ARMA shaping $H_{ARMA}(z^{-1}) = \Theta(z^{-1})/\Phi(z^{-1})$ -> fractionally differenced ARIMA process $Y_t$]

Figure 1: Model of a fractionally differenced ARIMA (FARIMA)
process.


3.  IDENTIFICATION  OF  AN  FARIMA  MODEL 

In this section, we describe a two-step parameter estimation
procedure for FARIMA processes based on the available
observations $\{Y_t, t = 1, \ldots, N\}$. First, we obtain
an estimate $\hat{d}$ of the parameter $d$ based on the log of the
power spectrum. This is carried out independently of the
ARMA part of the model. Second, we estimate the impulse
response of the ARMA part of the system using the polycepstral
approach [5]. In the ARMA estimation, we operate
on data $\{X_t\}$ obtained by passing $\{Y_t\}$ through $(1 - z^{-1})^{\hat{d}}$.
The two-step estimation scheme described in this section is
illustrated in Fig. 2.


Figure 2: Proposed algorithm for FARIMA processes parameter
estimation.


3.1. Estimation of the Fractional Differencing Parameter

To estimate the fractional differencing parameter $d$, we adopt
from [7] a technique based on the log of the power spectrum
or, equivalently, the power cepstrum. With the "spectrum" of
$\{Y_t\}$ given as:

$$f_Y(\omega) = |1 - e^{-j\omega}|^{-2d} f_{ARMA}(\omega), \qquad (4)$$

where $f_{ARMA}(\omega)$ is the spectrum shaping from the ARMA
filtering, the log of the power spectrum can be expressed in
terms of the cepstral coefficients $\{c_k\}$ as follows [7]:

$$\log f_Y(\omega) = \sum_{k=1}^{\infty} c_k \cos(k\omega), \qquad (5)$$

where

$$c_k = \frac{1}{\pi} \int_0^{\pi} \log[f_Y(\omega)] \cos(k\omega)\, d\omega. \qquad (6)$$

Using the weight function $W(\omega) = -0.5 \log[2(1 - \cos\omega)]$,
we define the weighted power cepstrum index $S$ as:

$$S = \frac{1}{\pi} \int_0^{\pi} W(\theta) \log f_Y(\theta)\, d\theta. \qquad (7)$$

With  this,  it  can  be  shown  that  [7]: 


where $a_k$ are the power cepstrum coefficients of the ARMA part
of the spectrum.

Because $a_k$ decays exponentially as $k$ increases [5], we
will assume that above a certain threshold value $M$, $a_k = 0$
for $k \ge M$. Then, by estimating the weighted power
cepstrum index $S$ and the coefficients $c_k$ for $k < M$, we
obtain $\hat{d}$, based on (8), in the following way:

$$\hat{d} = \frac{1}{\frac{\pi^2}{6} - \sum_{k=1}^{M-1} \frac{1}{k^2}} \left( \hat{S} - \sum_{k=1}^{M-1} \frac{\hat{c}_k}{k} \right), \qquad (9)$$

where $\hat{c}_k$ and $\hat{S}$ are estimates of $c_k$ and $S$, respectively. To
obtain $\hat{c}_k$ and $\hat{S}$, we use the periodogram $I_N(\omega)$ evaluated
based on $N$ data points. With the Simpson rule for calculating
integrals, $c_k$ is estimated from (6) as:

$$\hat{c}_k = \frac{1}{[N/2]} \sum_{p=0}^{[N/2]} \log I_N(\omega_p) \cos(k\omega_p). \qquad (10)$$

The estimate $\hat{S}$ is obtained based on (7), in a similar way
as $\hat{c}_k$ in (10).

An  alternative  approach  to  calculate  the  d  parameter 
using  the  least-squares  fit  of  the  FD  model  to  the  peri¬ 
odogram  has  been  presented  in  [8]. 
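A possible numerical realization of the cepstrum-based estimate of $d$ is sketched below. It is only an illustration under several assumptions that are not taken from the paper: the integrals in (6) and (7) are replaced by plain Riemann sums over the Fourier frequencies $\omega_p = 2\pi p/N$, $p = 1, \ldots, [N/2]$ (the zero frequency is skipped), instead of the Simpson rule; the estimate is formed as $(\hat{S} - \sum_{k<M} \hat{c}_k/k) / (\pi^2/6 - \sum_{k<M} 1/k^2)$, which follows from (4), (6) and (7) when the ARMA cepstrum is negligible for $k \ge M$; and the function name `estimate_d_cepstrum` is hypothetical.

```python
import numpy as np

def estimate_d_cepstrum(y, M=15):
    """Log-periodogram (power cepstrum) estimate of d; Riemann-sum sketch."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    half = n // 2
    omega = 2.0 * np.pi * np.arange(1, half + 1) / n        # Fourier frequencies, omega = 0 skipped
    spectrum = np.fft.rfft(y)[1:half + 1]
    log_I = np.log(np.abs(spectrum) ** 2 / (2.0 * np.pi * n))  # log-periodogram
    weight = -0.5 * np.log(2.0 * (1.0 - np.cos(omega)))        # W(omega)
    scale = 2.0 / n                                             # approximates (1/pi) * integral over [0, pi]
    S_hat = scale * np.sum(weight * log_I)
    k = np.arange(1, M)
    c_hat = np.array([scale * np.sum(log_I * np.cos(kk * omega)) for kk in k])
    return (S_hat - np.sum(c_hat / k)) / (np.pi ** 2 / 6 - np.sum(1.0 / k ** 2))

# Synthetic check: fractionally differenced exponential noise with d = 0.3,
# generated with a truncated binomial-series filter.
rng = np.random.default_rng(1)
n, n_terms, d_true = 16384, 2000, 0.3
a = np.ones(n_terms)
for j in range(1, n_terms):
    a[j] = a[j - 1] * (j - 1 + d_true) / j
eps = rng.exponential(size=n + n_terms) - 1.0
y = np.convolve(eps, a)[n_terms:n_terms + n]
d_hat = estimate_d_cepstrum(y, M=5)
```

A larger $M$ removes more of the ARMA contribution but shrinks the denominator, which increases the variance of the estimate; the value of $M$ is therefore a bias-variance trade-off.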


3.2.  Estimation  of  ARMA  parameters 

Instead of finding the AR and MA polynomials, $\Phi(z^{-1})$ and
$\Theta(z^{-1})$, we employ the estimation procedure for the impulse
response $\{h_i\}$ of the ARMA filter $H_{ARMA}(z^{-1})$ based on
the new observable data $\{X_t, t = 1, \ldots, N\}$, as shown in
Fig. 2. We assume that the transfer function $H_{ARMA}(z^{-1})$
admits the factorization:

$$H_{ARMA}(z^{-1}) = A \cdot I(z^{-1}) \cdot O(z), \qquad (11)$$

where $A$ is a constant gain; $I(z^{-1})$ and $O(z)$ are minimum
phase and maximum phase polynomials, respectively. The impulse
response $\{h_k\}$ is obtained in the following way: first we
calculate the unknown impulse responses $\{i_k\}$ and $\{o_k\}$ of
the minimum phase and maximum phase characteristics of
the system, and then we obtain $\{h_k\}$ as a convolution of
$\{i_k\}$ and $\{o_k\}$ ($h = i * o$).

The estimation procedure employed uses the minimum and
maximum phase differential cepstrum coefficients which, for
non-symmetric data $X_t$, can be calculated from two slices
of the bicepstrum as described in [5].

4.  MODEL  VALIDATION 

4.1.  Non-Gaussianity  due  to  skewness  and  kurtosis 

When  estimating  the  ARMA  part  of  the  FARIMA  model  in 
the  previous  section,  we  made  an  assumption  that  the  ob¬ 
served  process  {Yt}  was  (i)  non-Gaussian  and  (ii)  asymmet¬ 
ric.  One  can  single  out  two  distinct  deviations  of  histogram 
from  the  Gaussian  distribution: 

•  one  of  the  tails  of  the  distribution  is  lengthened,  and 
the  distribution  becomes  skewed  (asymmetric); 

•  the  maximum  of  the  histogram  (pdf)  lies  higher  or 
lower  than  that  of  the  normal  distribution. 

The shape statistics that characterize these deviations are the
standardized third and fourth order moments, which for a
Gaussian population are $b = 0$ and $a = 3$, respectively. Departures
from these values are indications of skewness and
kurtosis, respectively. To reject, due to skewness and at the
95% significance level, the hypothesis that data of length
2000 are Gaussian, it is sufficient to show that $b > 0.09$ or
$b < -0.09$. The same data fail the test for Gaussianity due
to kurtosis if $a > 3.18$ or $a < 2.83$.
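The two shape statistics and the thresholds quoted above can be computed as in the following sketch. The function names are hypothetical, and the default limits are simply the values stated in the text for blocks of length 2000; they should not be reused for other block lengths.

```python
import numpy as np

def shape_statistics(x):
    """Standardized third and fourth order sample moments (b, a)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3), np.mean(z ** 4)

def looks_non_gaussian(x, skew_limit=0.09, kurt_low=2.83, kurt_high=3.18):
    """Apply the skewness and kurtosis limits quoted in the text (length-2000 data)."""
    b, a = shape_statistics(x)
    return abs(b) > skew_limit or a > kurt_high or a < kurt_low

# Exponentially distributed data are strongly skewed, so the check should flag them.
rng = np.random.default_rng(0)
sample = rng.exponential(size=2000)
b_exp, a_exp = shape_statistics(sample)
flagged = looks_non_gaussian(sample)
```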

4.2. Goodness-of-fit test for the dependence structure

In this paper, we analyze a model according to the modified
version of the portmanteau lack of fit test [4]. First, we
compute the following test statistic:

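$$T = \frac{N}{4\pi}\, \frac{\sum_{k=0}^{N-1} \left( I_N(\omega_k) / f_\theta(\omega_k) \right)^2}{\left( \sum_{k=0}^{N-1} I_N(\omega_k) / f_\theta(\omega_k) \right)^2}, \qquad (12)$$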

where $f_\theta(\cdot)$ is the spectral density of the model parameterized
by $\theta$. The test statistic, as defined in (12), measures
departures of the modeled spectral density from the periodogram
in the whole range of frequencies $(-\pi, \pi]$. In the
time domain, $T$ is the sum of the squares of all estimable
correlations of the residual process obtained by fitting the
chosen model, and as such is especially useful for long memory
processes [2]. If the parametric model is correct, then
the residual process is uncorrelated, and $T$ should be close
to 0. The test statistic $T$ is asymptotically normal, and
it can be shown that $\Pr(T < c) \approx \Phi(\sqrt{N}(c\pi - 1)/\sqrt{2})$,
where $\Phi(\cdot)$ is the cumulative distribution function (cdf) of
the standard normal RV. In this paper, we use as a discriminating
measure between different models the P-value
of the test statistic in (12), defined as $\Pr(T < T^*)$, where
$T^*$ is the outcome of the test statistic for a given data set.
A large P-value indicates that we have a correct model,
while a small value supports the hypothesis that the model
is inaccurate.


5.  SIMULATION  AND  EXPERIMENTAL 
RESULTS 

In  this  section,  we  first  examine  the  performance  of  the  esti¬ 
mation  procedure  proposed  using  simulated  data,  and  then 
we  demonstrate  the  effectiveness  of  the  FARIMA  model  for 
network  traffic. 

5.1.  Simulated  Data 

Table 1 presents estimation results using the simulated FARIMA
data. Three different types of the ARMA part are considered:
(i) AR with $H_{ARMA}(z) = 1/(1 - 0.5z^{-1})$; (ii) MA with
$H_{ARMA}(z) = 1 - 0.5z^{-1}$; and (iii) ARMA with
$H_{ARMA}(z) = (1 - 0.5z^{-1})/(1 - 0.2z^{-1})$. We give the average and
standard deviation values (in parentheses) of Monte-Carlo simulation
results based on processing 50 independent blocks of data, each
with $2^{12}$ samples. The system driving noise was zero-mean,
white, non-Gaussian (exponentially distributed). While estimating
$d$, we used the value of $M = 15$ beyond which we
assumed that the power cepstral coefficients from the ARMA
part are not significant. It can be observed that the
variance of the $d$ estimator increases as $d$ goes from 0.1 to
0.4. The $d$ estimator is biased [7], but in general, a good fit
to the model was observed when the parameters $n$ and $m$
were small ($n, m < 5$).

The  estimation  method  for  the  d  parameter  used  in  this  pa¬ 
per  gives  much  better  results  than  the  least-squares  method 
presented  in  [8]. 

In Fig. 3, we present the P-values for FAR (FARIMA(0,d,m)),
FMA (FARIMA(n,d,0)) and AR models as a function of the
AR or MA approximation orders. The results are for data
which were generated by passing one-sided exponential
noise through the filter $(1 - z^{-1})^{-0.4} \cdot (1 - 0.5z^{-1})$. Each
point in this figure is the average of P-values calculated for
20 blocks of data of length $2^{12}$. As we see, the P-value can
indicate the correct ($n = 1$) order of the FMA representation
and shows that no better fit is obtained by using higher
orders of an MA part.

5.2.  Ethernet  Data 

In this section, we apply the FARIMA model to the traffic
traces which were obtained from the URL site [6]: BC-pOct89
and BC-Oct89Ext. The first trace represents internal
traffic on the Bellcore LAN, while the second trace represents
external traffic from Bellcore to the outside Internet
world.

Workload  of  the  Internal  Traffic 

The trace BC-pOct89 contains traffic for about 30 min ($10^6$
Ethernet packets in 1,759 sec). This data set was first
pre-processed into a time series giving the number of bytes
in 10 millisecond intervals. In our analysis, we considered
only 20 blocks, each of $2^{12}$ samples, which amounts to
13.2 minutes of traffic. In such a time interval, we can assume
that the internal traffic environment is stationary [?].
For each block of data, we fitted four models: (i) FAR;
(ii) FMA; (iii) AR of order 10 through least-squares; and
(iv) fractional differencing (FD) model. For the particular
Table 1: Statistical behavior of the proposed parameter estimation method for FARIMA processes.

AR part: H_ARMA(z) = 1/(1 - 0.5 z^-1), phi_1 = -0.5

d      d (estimated)       -phi_1 (estimated)
0.1    .111 (.6e-2)        0.51 (1.0e-2)
0.2    .217 (7.1e-2)       0.52 (1.2e-2)
0.3    .327 (1.03e-1)      0.511 (6.2e-1)
0.4    .455 (1.01e-1)      .541 (6.2e-2)

MA part: H_ARMA(z) = 1 - 0.5 z^-1, theta_1 = -0.5

d      d (estimated)       -theta_1 (estimated)
0.1    .109 (1.9e-2)       0.497 (1.4e-2)
0.2    .219 (8.7e-2)       0.508 (4.1e-2)
0.3    .319 (1.21e-1)      0.538 (2.6e-1)
0.4    .413 (1.21e-1)      .44 (2.6e-2)

ARMA part: H_ARMA(z) = (1 - 0.5 z^-1)/(1 - 0.2 z^-1), h_1 = -0.3, h_2 = -0.06, h_3 = -0.012

d      d (estimated)       -h_1 (estimated)    -h_2 (estimated)    -h_3 (estimated)
0.1    .12 (1.9e-2)        .3 (1.4e-3)         .05 (1.8e-3)        .03 (9.0e-3)
0.2    .195 (8.2e-2)       .348 (4.8e-3)       .087 (1.9e-3)       .009 (4.2e-3)
0.3    .326 (1.23e-1)      .228 (4.6e-3)       .0567 (3.15e-3)     .03 (4.2e-3)
0.4    .428 (1.23e-1)      .4 (7.6e-3)         .0967 (5.15e-3)     .014 (2.2e-3)

trace,  the  averaged  indexes  of  skewness  and  kurtosis  from 
20  blocks  of  data  of  length  2000  were  as  follows:  b  =  0.4, 
a  =  2.8.  This  indicates  that  these  data  are  non-Gaussian 
and  non-symmetric. 

To  assess  goodness-of-fit  of  each  model  into  the  depen¬ 
dence  structure  of  the  underlying  time  series,  we  first  exam¬ 
ine  visually  the  fit  of  the  models  (their  transfer  functions)  to 
the  power  spectral  density  (PSD)  of  the  trace.  The  PSD  is 
estimated  by  ensemble  averaging  of  periodograms  evaluated 
in  each  block.  The  fits  of  the  estimated  models  to  the  aver¬ 
aged  periodogram  are  shown  in  Fig.  4.  We  present  results 
using  the  log-log  and  dB  scale.  The  log-log  plot  emphasizes 
the  low  frequency  region,  while  dB  plot  gives  an  idea  about 
the  overall  fit.  It  is  evident  that  the  FMA  model  with  just  3 
and  4  coefficients  in  the  MA  part  offers  the  best  fit.  The  es¬ 
timated  fractional  differencing  parameter  is  0.3395.  To  cap¬ 
ture  the  short-range  dependence  of  the  trace  (or  to  obtain  a 
better  fit  in  the  high  frequency  region),  we  used  a  fourth  or¬ 
der  MA  representation  (h  =  {1,  -0.40,  -0.17, 0.04, 0.14}). 

The  P-values  of  the  estimated  models  are  shown  in  Fig.  5. 
This  test  confirms  our  intuitive  observations  based  on  the 
periodogram analysis. Because the P-value measures the
overall fit of a model to the periodogram, the performance
of the FD model is worse than that of the AR model of order
greater than 5. It is evident that FMA and FAR models
give the most parsimonious representations.


Workload of the External Traffic

The  trace  BC-Oct89Ext  represents  around  34  hours  of  ex¬ 
ternal  traffic.  We  pre-processed  the  data  to  get  the  number 
of  bytes  in  1  second  intervals.  We  apply  the  same  analysis 
to BC-Oct89Ext as for the internal traffic. The averaged indexes
of skewness and kurtosis are 2.3 and 4.2, respectively.
This shows that the external traffic is also non-Gaussian
and non-symmetric. The PSD based on the periodogram
and three fitted models and the P-values of the modified
portmanteau test statistics are shown in Figs. 6 (a) and
(b). Apparently, the external traffic is fitted well by the FD
model. There is no significant improvement by using the
FARIMA approach. The fractional differencing parameter
in this case is $d = 0.3981$, which indicates heavy burstiness
of the trace.


Figure 3: P-value of the modified portmanteau test statistics
for three types of fitted models to the simulated data with
$H(z) = [(1 - z^{-1})^{-0.4}](1 - 0.5z^{-1})$: (i) FMA; (ii) FAR; and
(iii) AR.

6.  CONCLUSION 

In  this  paper,  we  extended  the  long  memory  modeling  to  in¬ 
clude  the  short-term  dependence  in  the  data  by  using  FARI¬ 
MA  processes.  Because  of  non-Gaussianity  of  network  traf¬ 
fic,  we  developed  a  two  stage  parameter  estimation  scheme 
for  the  FARIMA  model  using  the  polyspectra  approach. 
We  evaluated  the  effectiveness  of  the  proposed  model  us¬ 
ing  real  network  traffic  data.  The  following  observations 
are  made:  (i)  the  model  proposed  provides  a  better  fit  to 
the  internal  LAN  traffic  than  the  conventional  least-squares 
AR  and  fractional  differencing  models;  and  (ii)  the  external 
LAN traffic is well characterized by the fractionally differenced
model. In conclusion, the proposed method can capture
the  complex  dependence  structure  in  network  traffic  with  a 
small  number  of  parameters. 
Figure 4: Power Spectral Density for the BC-pOct89 trace of
Ethernet traffic based on periodogram and four fitted models:
(i) FMA; (ii) FAR; (iii) FD; and (iv) the least-squares AR(10).

Figure 5: P-values of the modified portmanteau test statistics
for four types of fitted models for the trace BC-pOct89.

Figure 6: External traffic trace BC-Oct89Ext: (a) Power
Spectral Density based on periodogram and four fitted models:
(i) FMA; (ii) FAR; (iii) FD; and (iv) the least-squares
AR(10); (b) P-values of the fitted models.

ABSTRACT

The  application  of  optimal  nonlinear/non-Gaussian 
filtering  to  the  problem  of  inertial  navigation  system 
(INS)  alignment  is  described.  This  approach  is  made 
possible  by  a  new  technique  called  particle  filtering 
(PF).  PF  theory  is  introduced  and  nonlinear  error 
equations  of  INS  alignment  on  a  stationary  base  in 
the  case  of  large  initial  error  angles  are  used.  The 
algorithm  for  solving  the  problem  of  optimal  estimation 
of  the  state  vector  described  by  nonlinear  equations 
from  linear  measurements  has  been  developed.  The 
simulation  results  exhibit  the  superior  performance  of 
this  approach  when  compared  with  classical  sub- 
optimal  techniques  such  as  extended  Kalman  filtering 
(EKF). 

1. INTRODUCTION

Kalman filtering is a popular tool for handling
estimation problems, but its optimality depends heavily
on linearity. When used for nonlinear systems, its
performance relies on, and is limited by, the
linearizations performed on the model concerned. For
essentially nonlinear systems, the linearizations may
lead to divergence of the filtering process. On the other
hand, despite early papers on nonlinear filtering theory,
the implementation of nonlinear filters has so far been
plagued by the difficulties inherent in their
infinite-dimensional nature. A new approach to optimal
nonlinear filtering called particle filtering (PF) has been
presented recently and applied to the
non-Gaussian/nonlinear filtering problem [1][2][3][4]. The
main feature of PF is that it constructs the conditional
probability of the variable to be estimated, with respect
to the measurements, through a suitable random particle
exploration of the state space followed by a Bayes
correction of the weights of the particles.

2. THE THEORY AND PRIORI
ALGORITHM  OF  PARTICLE 
FILTERING 

Let  the  dynamic  process  X  and  the  observation  process 
Y  be  governed  by 

$$X_{k+1} = f(X_k, k, \omega_k)$$

$$Y_k = h(X_k, k) + \eta_k$$

where $\{\omega_k\}$ and $\{\eta_k\}$, $k \ge 0$, are sequences of
independent random variables with appropriate
dimensions. $R_\eta$ is defined as the strength matrix of $\eta_k$,
which is assumed to be strictly positive definite. $f$ and $h$
are measurable functions of $X$. PF concerns the
recursive estimation of any function $\Phi(X_k)$ of an
$\mathbb{R}^n$-valued stochastic process $X$ from the observation of a
related vector-valued random process $Y$, where the "best"
(minimum variance) estimator $\Phi(X_k)^*$ is given by the
conditional expectation

$$E[\Phi(X_k) \mid Y_K = y_K] = \int \Phi(x_k)\, dP(x_k \mid y_K)$$

(where the subscript $K$ stands for the index sequence $1, 2, \ldots, k$).

The  priori  PF  algorithm  may  be  summarized  as 
follows[5] : 

a)  Initialization. 

Positions  of  N  particles  are  initialized  according  to 
dP(  X0)  and  the  weights  to  1/N  . 

b)  Evolution. 

Move particles according to $X_{k+1} = f(X_k, k, \omega_k)$ and
randomly generated noises $\omega_k$.

c)  Weighting. 

Weights  are  given  by 


j= i 


and  regularization  according  to  (  in  the  case  of  a 
Gaussian  observation  process  ) 


exp 

Zt(X,Y)= - e - 

2  i= i 

where $\gamma \in (0,1)$

(with  notation: 

d)  Estimation. 

According  to 

/= i 

the estimate of $\Phi(X_k)$ is made.

e) Recursion.

Repeat steps b) through d).

Figure  1  depicts  the  procedure  of  the  priori  algorithm. 
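The steps a) to e) can be sketched in code. The block below is a minimal illustration and not the authors' implementation: the function name `priori_particle_filter` and its arguments are hypothetical, the weighting step assumes a Gaussian observation noise with strength matrix `R`, the weights are normalized so that they sum to one, and the regularization step involving $\gamma$ is omitted because its formula is not legible in the text above.

```python
import numpy as np

def priori_particle_filter(f, h, sample_x0, sample_noise, R, ys,
                           n_particles=1000, seed=0):
    """Sketch of steps a)-e): initialize, evolve, weight, estimate, recurse.

    f(X, k, w) and h(X, k) act on arrays of particles with shape (N, n).
    Assumes Gaussian observation noise with covariance R (an assumption).
    """
    rng = np.random.default_rng(seed)
    R_inv = np.linalg.inv(np.atleast_2d(R))
    # a) Initialization: particles drawn from dP(X0), weights 1/N.
    particles = sample_x0(rng, n_particles)
    log_w = np.full(n_particles, -np.log(n_particles))
    estimates = []
    for k, y in enumerate(ys):
        # b) Evolution: X_{k+1} = f(X_k, k, w_k) with random noise w_k.
        particles = f(particles, k, sample_noise(rng, n_particles))
        # c) Weighting: Bayes correction with a Gaussian likelihood,
        #    carried out in the log domain for numerical stability.
        resid = np.atleast_1d(y) - h(particles, k + 1)
        log_w = log_w - 0.5 * np.einsum("ni,ij,nj->n", resid, R_inv, resid)
        log_w -= log_w.max()
        log_w -= np.log(np.exp(log_w).sum())
        # d) Estimation: weighted mean of the particles (Phi = identity).
        estimates.append(np.exp(log_w) @ particles)
        # e) Recursion: continue with the next measurement.
    return np.array(estimates)
```

Because this sketch never resamples, the weights concentrate on a few particles over long runs; a practical filter would add the regularization or a resampling step at that point.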


Figure  1.  Block  diagram  of  the  priori  PF  algorithm. 
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3.  THE  APPLICATION  OF  PF  IN  INS 
ALIGNMENT 

The problem of INS alignment has been considered in a
great number of publications [5][6]. In the case of small
initial error angles, the problem is usually solved on the
basis of a linearized description of the INS error equations
and an elaborate optimal linear filtering procedure
designed to achieve the maximum
accuracy in the minimum time. In a
number of cases, however, the INS must be aligned when
the initial error angles are comparatively
large, which makes it necessary to take account of the
nonlinear character of the problem. No
detailed study of the alignment problem that takes due
account of its nonlinear character has been made up to
now. This paper treats the INS alignment
problem by PF. The Global Positioning System (GPS)
is used as an external measuring instrument for
information about the carrier position. For a
stationary carrier, the information of its zero velocity and
acceleration is applied. On the basis of Dmitriyev's
work [6], the nonlinear error equations of INS alignment on
a stationary base in the case of large initial error angles
are as follows:


=  -g(<Py  cos  t  +  (px  sin  t ) + 2  coie  sin  Ldvy  +  Vx 
&y  =  git  cos t  -  (j)y  sin (j)2 )  -  2 coje  sinZ<5vx  +  V., 

4  =  -sin0.O4  cosZ + sin  L-dvy/R-ex 
<t>y  =(1- cost  Me cosZ  ~  t^ie sinZ +&>x/R-£y 
4  =  it  cos t  ~  t  smt  Me  cosZ  +  dvx  (tanZ)  /  R  -  ez 
and  the  observation  equation  may  be  written  as 

$$Y_1 = \delta v_x + \eta_x$$

$$Y_2 = \delta v_y + \eta_y$$

The nonlinear equations describe the behavior of the
INS alignment errors exactly. Within the scope of PF
theory, the INS alignment is formulated as the problem
of optimum estimation of error angles described by
means of nonlinear equations from linear measurements.
The validity of this method was checked by
simulation as follows. The priori algorithm with N = 1500
particles was used in the simulation. Figure 2 shows the
filter outputs, i.e., the RMS deviation of the error
angles $\Phi_x$ and $\Phi_z$, as estimated by the PF (solid line) and
EKF (dashed line).


Figure 2. The RMS deviation of estimation (horizontal axes: t in units of 0.01 s).
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4.  CONCLUSION 

These results show the clear superiority of particle
nonlinear filtering over classical filtering in the
problem of INS alignment in the
case of large initial error angles, although the former
consumes more time and memory as the number of
particles grows. These problems are being overcome by the
advent of new technologies that make parallel processing
available to embedded systems and enable PF to be
implemented in on-board real-time systems.

In the long run, PF promises to be a powerful tool
for dealing with nonlinear/non-Gaussian filtering problems
such as INS alignment.
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ABSTRACT 

When  wideband  and  narrowband  interferences  in  a  GPS 
system  are  stationary,  a  large  number  of  data  samples  may 
be  obtained  to  get  a  good  estimate  of  the  interference.  How¬ 
ever,  the  jamming  environment  may  be  one  in  which  the 
narrowband  jammers  have  the  ability  to  change  frequen¬ 
cies  dynamically  or  the  rapid  dynamics  of  the  aircraft  dur¬ 
ing  maneuvering  causes  arrival  angles  of  wideband  jammers 
to  change.  In  either  type  of  jamming  environment,  an  in¬ 
terference  suppression  algorithm  will  only  be  effective  if  it 
can  rapidly  converge  with  a  small  sample  size.  We  investi¬ 
gate  the  performance  of  reduced-rank  interference  suppres¬ 
sion  algorithms  under  conditions  of  low  sample  support.  It 
is  demonstrated  that  the  multistage  nested  Wiener  filter 
(MSNWF)  outperforms  other  reduced-rank  techniques  in 
terms  of  suppressing  both  wideband  and  narrowband  jam¬ 
mers  under  conditions  of  low  sample  support. 

1.  INTRODUCTION 

Worldwide  military  use  of  GPS  is  evolving  due  to  the  wide 
availability  of  commercial  GPS  receivers,  and  the  widespread 
knowledge  of  the  force  enhancement  capabilities  offered  by 
GPS.  The  jamming  threat  is  serious  because  of  the  physi¬ 
cal  design  of  the  GPS  system.  The  received  power  from  the 
GPS  satellites  is  approximately  -157  dBW.  Many  jammers 
available  on  the  arms  market  today  either  already  cover  the 
GPS  frequencies,  or  can  be  modified  to  do  so.  Therefore,  a 
space-time  preprocessing  filter  prior  to  the  GPS  correlators 
is  one  of  several  proposed  methods  for  suppressing  such  jam¬ 
mers.  However,  space-time  preprocessors  can  exhibit  slow 
convergence  and  have  high  computational  complexity. 

This  paper  investigates  a  reduced  dimension  space-time 
preprocessor  based  on  the  multistage  nested  Wiener  filter 
(MSNWF) [4] capable of operating with low sample support
compared to other reduced dimension methods such
as the cross-spectral metric [3] and principal components. The
simulations  presented  herein  reveal  the  rapid  convergence 
of  the  MSNWF  implementation  of  the  power  minimization 
based  space-time  preprocessor,  thereby  showing  its  efficacy 
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in  adapting  to  the  environmental  dynamics  characterizing 
high  performance  fighter  aircraft. 

2.  POWER  MINIMIZATION  BASED  JOINT 
SPACE-TIME  PREPROCESSOR 

The  criterion  for  determining  the  optimal  set  of  space-time 
weights  is  premised  on  the  fact  that  the  respective  power 
levels  of  the  desired  GPS  signals  are  significantly  below  the 
noise  floor  and  that  the  jammers  that  could  have  deleterious 
effects  are  above  the  noise  floor.  The  goal  then  is  to  drive 
the  power  of  the  preprocessor  output  down  to  the  noise 
floor.  This  approach  serves  to  place  point  nulls  at  the  re¬ 
spective  angle-frequency  coordinates  of  strong  narrowband 
interferers  and  spatial  nulls  in  the  respective  directions  of 
broadband  interferers. 

In  order  for  the  GPS  receiver  to  provide  accurate  navi¬ 
gation  information,  it  is  necessary  to  track  the  signals  from 
at  least  four  different  GPS  satellites.  Given  the  parallax  er¬ 
ror  associated  with  GPS  satellites  at  near-horizon  relative 
to  the  aircraft,  it  is  generally  desirable  to  track  the  respec¬ 
tive  signals  from  a  larger  number  of  GPS  satellites,  e.g., 
twelve.  It  is  desired  then  that  the  preprocessor  “pass”  un¬ 
altered  as  many  GPS  signals  as  possible.  Thus,  the  magni¬ 
tude  of  the  multidimensional  Fourier  transform  of  the  space- 
time  weights  should  be  as  flat  (smooth)  as  possible  in  the 
spectral  domain  as  a  function  of  frequency  and  angular  di¬ 
mensions.  The  goal  is  to  achieve  a  desired  smoothness  while 
simultaneously  nulling  both  wideband  and  narrowband  in¬ 
terferers  under  conditions  of  low  sample  support. 

2.1.  Formulation  of  Objective  Function 

It  is  necessary  to  first  define  xm(n)  as  an  N  x  1  vector 
containing  N  successive  samples  of  the  output  of  the  m-th 
antenna  sampled  at  a  rate  above  or  equal  to  the  Nyquist 
rate  for  the  P(Y)  code. 

$$\mathbf{x}_m(n) = [x_m(n), x_m(n-1), \ldots, x_m(n-N+1)]^T \qquad (1)$$

The  NM  x  1  space-time  snapshot,  x(n),  is  formed  from 
concatenating  xm(n),  m  =  1,2,  ...,M,  as 

x(n)  =  [xi(n);x2(n);  ...;xM(n)]  (2) 
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Figure  1.  Nested  chain  of  scalar  Wiener  filters  for  NM-1  joint  space-time  preprocessor. 


where  ;  implies  concatenating  the  vectors  into  a  single  col¬ 
umn.  Similarly,  the  N  tap  weights  for  the  m-th  antenna 
are  placed  as  the  components  of  an  N  x  1  vector  as 

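$$\mathbf{h}_m = [h_m(0), h_m(1), \ldots, h_m(N-1)]^T, \quad m = 1, 2, \ldots, M \qquad (3)$$

and the entire set of space-time weights is formed from a
concatenation of $\mathbf{h}_m$, $m = 1, \ldots, M$, as

$$\mathbf{h} = [\mathbf{h}_1; \mathbf{h}_2; \ldots; \mathbf{h}_M]. \qquad (4)$$

The output power of the space-time preprocessor is

$$E\{|\mathbf{h}^H\mathbf{x}(n)|^2\} = \mathbf{h}^H\mathbf{K}\mathbf{h}, \quad \text{where } \mathbf{K} = E\{\mathbf{x}(n)\mathbf{x}^H(n)\}. \qquad (5)$$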

Assume that the first antenna of the linear array is the
reference antenna. To incorporate the unity weight constraint
on the first tap of the reference antenna, define $\tilde{\mathbf{x}}(n)$
as the $(NM-1) \times 1$ sub-vector of $\mathbf{x}(n)$ containing all but
the first element of $\mathbf{x}(n)$. Similarly, $\tilde{\mathbf{h}}$ is defined as the
$(NM-1) \times 1$ sub-vector of $\mathbf{h}$ containing all but the first
element of $\mathbf{h}$.

$$\mathbf{x}(n) = [x_1(n); \tilde{\mathbf{x}}(n)] \qquad (6)$$

$$\mathbf{h} = [1; \tilde{\mathbf{h}}] \qquad (7)$$

With these definitions, the power at the preprocessor output
may be expressed as

$$E\{|\mathbf{h}^H\mathbf{x}(n)|^2\} = E\{|x_1(n) + \tilde{\mathbf{h}}^H\tilde{\mathbf{x}}(n)|^2\}. \qquad (8)$$

Expressing the preprocessor output power in this fashion
facilitates an adaptive filtering formulation where the output
of the first tap of the reference antenna serves as the
"desired" signal and the "error" signal is $x_1(n) + \tilde{\mathbf{h}}^H\tilde{\mathbf{x}}(n)$. As a
result, LMS and/or RLS based adaptations are possible, as
developed previously for the case of space-only processing
[2].
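As an illustration of this formulation, the following sketch solves the power minimization in batch form rather than by LMS or RLS. It assumes that expectations are replaced by sample averages over the available snapshots; the function name `power_min_weights` is hypothetical. Setting the gradient of the error power to zero gives $E\{\tilde{\mathbf{x}}\tilde{\mathbf{x}}^H\}\tilde{\mathbf{h}} = -E\{\tilde{\mathbf{x}}\,x_1^*\}$, which is what the code solves.

```python
import numpy as np

def power_min_weights(X):
    """Batch power-minimization weights with h[0] constrained to 1.

    X has shape (NM, n_snapshots); row 0 is the first tap of the
    reference antenna, the remaining rows form the sub-vector x~(n).
    """
    n = X.shape[1]
    x1 = X[0]                      # "desired" signal x_1(n)
    Xt = X[1:]                     # x~(n), all but the first element
    R = Xt @ Xt.conj().T / n       # sample estimate of E{x~ x~^H}
    r = Xt @ x1.conj() / n         # sample estimate of E{x~ x_1^*}
    h_tilde = -np.linalg.solve(R, r)
    return np.concatenate(([1.0 + 0.0j], h_tilde))

def output_signal(h, X):
    """Preprocessor output h^H x(n) for every snapshot."""
    return h.conj() @ X
```

With a strong interferer common to all rows of `X`, the output power of `output_signal(power_min_weights(X), X)` drops toward the noise level while the first weight stays fixed at one.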

3.  DIMENSIONALITY  REDUCTION  VIA 
REDUCED-RANK  METHODS 

The disadvantage of space-time processing relative to
space-only processing is the large dimensionality of the
space-time correlation matrix relative to the spatial correlation
matrix. This translates into increased computational
complexity and slower convergence. However, depending on
the frequency and spatial distribution of the interferers, it
may be possible to reduce the dimensionality. Reduction in
dimensionality implies constraining the space-time weight
vector to lie in a lower dimensional subspace. Defining an
$NM \times NM$ space-time correlation matrix $\mathbf{K}$ (formed from
$M$ antennas with $N$ taps per antenna), the original power
minimization problem from [5] is

$$\min_{\mathbf{h}} \; \mathbf{h}^H\mathbf{K}\mathbf{h} \quad \text{subject to: } \mathbf{h}^H\boldsymbol{\delta}_{nm} = 1 \qquad (9)$$

where $\boldsymbol{\delta}_{nm}$ is the $NM \times 1$ vector $\boldsymbol{\delta}_{nm} = [0, \ldots, 1, 0, \ldots, 0]^T$
in which the 1 is located in the position of the constrained tap
(the $n$-th tap of the $m$-th antenna).
We now seek to force the space-time weight vector to lie in
a particular reduced dimension subspace. That is, let
$\mathbf{h} = \mathbf{T}\mathbf{h}_r$, where $\mathbf{T}$ is the dimensionality reducing transformation
matrix. Substitution of $\mathbf{h} = \mathbf{T}\mathbf{h}_r$ into (9) allows one to
rewrite the power minimization problem as

$$\min_{\mathbf{h}_r} \; \mathbf{h}_r^H\mathbf{T}^H\mathbf{K}\mathbf{T}\mathbf{h}_r \quad \text{subject to: } \mathbf{h}_r^H\mathbf{T}^H\boldsymbol{\delta}_{nm} = 1 \qquad (10)$$

Using the method of Lagrange multipliers, the solution to
(10) may be found by solving

$$\mathbf{T}^H\mathbf{K}\mathbf{T}\mathbf{h}_r = \alpha\,\mathbf{T}^H\boldsymbol{\delta}_{nm} \qquad (11)$$

where $\alpha$ is the Lagrange multiplier used to satisfy the unity
weight constraint $\mathbf{h}_r^H\mathbf{T}^H\boldsymbol{\delta}_{nm} = 1$. It is easily shown that

$$\text{Minimum output power} = \frac{1}{\boldsymbol{\delta}_{nm}^H\mathbf{T}(\mathbf{T}^H\mathbf{K}\mathbf{T})^{-1}\mathbf{T}^H\boldsymbol{\delta}_{nm}}. \qquad (12)$$

Since $\mathbf{T}^H\mathbf{K}\mathbf{T}$ is Hermitian-symmetric, it follows that $(\mathbf{T}^H\mathbf{K}\mathbf{T})^{-1}$
is Hermitian-symmetric, so that $\alpha$ is real valued.
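The constrained solution above can be checked numerically. The sketch below is an illustration under stated assumptions: `reduced_rank_weights` is a hypothetical name, `K` is any Hermitian positive definite matrix, and `T` is any full-column-rank matrix for which $\mathbf{T}^H\boldsymbol{\delta}$ is nonzero. It solves (11), scales by the Lagrange multiplier so that the unity constraint holds, and returns the multiplier, which equals the minimum output power.

```python
import numpy as np

def reduced_rank_weights(K, T, delta):
    """Solve min h_r^H T^H K T h_r subject to h_r^H T^H delta = 1.

    Returns the full-dimension weight vector h = T h_r and the
    Lagrange multiplier alpha, which equals the minimum output power.
    """
    A = T.conj().T @ K @ T                 # reduced correlation matrix
    b = T.conj().T @ delta                 # reduced constraint vector
    A_inv_b = np.linalg.solve(A, b)
    alpha = 1.0 / np.real(np.vdot(b, A_inv_b))
    h_r = alpha * A_inv_b                  # solves A h_r = alpha b
    return T @ h_r, alpha
```

Choosing `T` as the identity recovers the full-rank solution; any proper subspace can only raise the minimum output power, since the minimization is then carried out over a smaller set.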

The reduced dimension transformation matrix $\mathbf{T}$ can
be found by techniques such as the cross-spectral metric
(CS) [3] or principal components (PC). A brief overview of
these methods is necessary to motivate the use of the MSNWF.
The space-time matrix $\mathbf{K}$ can be spectrally decomposed
as $\mathbf{K} = \sum_{i=1}^{NM} \lambda_i \mathbf{e}_i\mathbf{e}_i^H$, where $\lambda_i$ are the eigenvalues of
$\mathbf{K}$ indexed in descending order and $\mathbf{e}_i$ are the corresponding
eigenvectors. One can then seek dimensionality reduction
through the transformation $\mathbf{y}(n) = \mathbf{T}^H\mathbf{x}(n)$, where
$\mathbf{T} = [\mathbf{e}_{i(1)} : \mathbf{e}_{i(2)} : \cdots : \mathbf{e}_{i(D)}]$ is an $NM \times D$ matrix
containing $D < NM$ eigenvectors of $\mathbf{K}$ and $\{i(1), i(2), \ldots, i(D)\}$
is a subset of the integers $\{1, 2, \ldots, NM\}$. Given that the
columns of $\mathbf{T}$ are eigenvectors of $\mathbf{K}$, it follows that the
$D < NM$ eigenvectors of $\mathbf{K}$ comprising $\mathbf{T}$ can be selected
as those which maximize the cross-spectral metric [3], [4]
defined as

$$\boldsymbol{\delta}_{nm}^H\mathbf{T}(\mathbf{T}^H\mathbf{K}\mathbf{T})^{-1}\mathbf{T}^H\boldsymbol{\delta}_{nm} = \sum_{j=1}^{D} \frac{|\mathbf{e}_{i(j)}^H\boldsymbol{\delta}_{nm}|^2}{\lambda_{i(j)}} \qquad (13)$$

The principal-components technique would instead select
the eigenvectors of $\mathbf{K}$ associated with the $D$ largest
eigenvalues to form $\mathbf{T}$. Both
techniques are quite computationally intensive since it is
necessary to generate the eigenvectors of $\mathbf{K}$ before finding
the reduced-dimension matrix $\mathbf{T}$, as well as to compute
$(\mathbf{T}^H\mathbf{K}\mathbf{T})^{-1}$. It was recently shown in [6] that the MSNWF
generates a $\mathbf{T}$ that may be expressed as

$$\mathbf{T} = [\boldsymbol{\delta}_{nm}, \mathbf{K}\boldsymbol{\delta}_{nm}, \ldots, \mathbf{K}^{D-1}\boldsymbol{\delta}_{nm}] \qquad (14)$$

where again $\mathbf{T}$ is an $NM \times D$ matrix containing $D < NM$
vectors associated with the $D$ stages of the MSNWF. This
formulation leads to a simple computation of $\mathbf{T}$ as a function
of the stage $D$ at which the MSNWF is truncated.

Once the particular $\mathbf{T}$ associated with each
reduced-rank method has been generated, it is possible to explore
the effects of sample support associated with each $\mathbf{T}$. It was shown
in [5] that the MSNWF outperformed both cross-spectral
and principal-components in terms of jammer suppression
as a function of rank. It is now of interest to examine
the jammer suppression performance of each rank-reducing
method as a function of sample support. This is illustrated
in the simulations of Section 4. First, a brief development
of the MSNWF algorithm is provided.
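The three constructions of $\mathbf{T}$ discussed above can be compared with a short sketch. The function names are hypothetical and `K` is assumed to be a known Hermitian positive definite matrix. The Krylov basis is orthonormalized by a QR factorization here purely for numerical conditioning, which leaves the spanned subspace, and hence the minimum output power, unchanged.

```python
import numpy as np

def pc_basis(K, D):
    """Principal components: eigenvectors of the D largest eigenvalues."""
    lam, E = np.linalg.eigh(K)             # eigenvalues in ascending order
    return E[:, np.argsort(lam)[::-1][:D]]

def cs_basis(K, delta, D):
    """Cross-spectral metric: keep the D largest |e_i^H delta|^2 / lambda_i."""
    lam, E = np.linalg.eigh(K)
    metric = np.abs(E.conj().T @ delta) ** 2 / lam
    return E[:, np.argsort(metric)[::-1][:D]]

def krylov_basis(K, delta, D):
    """Orthonormal basis of span{delta, K delta, ..., K^(D-1) delta}."""
    cols = [delta.astype(complex)]
    for _ in range(D - 1):
        v = K @ cols[-1]
        cols.append(v / np.linalg.norm(v))  # rescaling keeps the subspace
    Q, _ = np.linalg.qr(np.column_stack(cols))
    return Q

def min_output_power(K, T, delta):
    """1 / (delta^H T (T^H K T)^-1 T^H delta) for a given subspace T."""
    A = T.conj().T @ K @ T
    b = T.conj().T @ delta
    return 1.0 / np.real(np.vdot(b, np.linalg.solve(A, b)))
```

For a fixed rank, the cross-spectral selection never gives a higher minimum output power than the principal-components selection, because it picks the largest terms of the sum that forms the denominator. Unlike the other two, the Krylov construction needs no eigendecomposition.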

3.1.  MSNWF  Algorithm  Development 

Adaptive filtering schemes center upon a linear Minimum
Mean Square Error (MMSE) estimation problem. In any
linear MMSE problem, the optimum weight vector $\mathbf{h}$ is the
solution to the Wiener-Hopf equation

$$\mathbf{R}_{xx}\mathbf{h} = \mathbf{r}_{dx} \qquad (15)$$

where $\mathbf{R}_{xx}$ is the correlation matrix of the data and $\mathbf{r}_{dx}$ is
the cross-correlation vector between the data and the
"desired" signal. The MSNWF represents a pioneering
breakthrough in that it simultaneously achieves a convergence
speed-up substantially better than that achieved with PC
and a dramatically reduced computational burden relative
to PC as well. Intuitively speaking, achieving the best of
both worlds - faster convergence AND reduced computation -
is made possible by making use of the information
inherently contained in both $\mathbf{R}_{xx}$ and $\mathbf{r}_{dx}$ in choosing the
reduced-dimension subspace that $\mathbf{h}$ is constrained to lie
within. In contrast, PC only makes use of the information
embedded in $\mathbf{R}_{xx}$.


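In our application here, $\mathbf{R}_{xx} = \mathbf{K}$ and $\mathbf{r}_{dx} = E\{x_1^*(n)\mathbf{x}(n)\}$,
assuming the unity weight constraint is at the first tap of
the first antenna, for example. The MSNWF algorithm is
summarized below [4]. As per the discussion at the end of
Section 2, the "desired" signal $d_0(n)$ is the output of the
$n$-th tap at the $m$-th antenna.

• Initialization: $d_0(n)$ and $\mathbf{x}_0(n) = \mathbf{x}(n)$

• Forward Recursion: For $k = 1, 2, \ldots, D$:

$$\mathbf{p}_k = E\{d_{k-1}^*(n)\mathbf{x}_{k-1}(n)\} / \|E\{d_{k-1}^*(n)\mathbf{x}_{k-1}(n)\}\|$$

$$d_k(n) = \mathbf{p}_k^H\mathbf{x}_{k-1}(n)$$

$$\mathbf{B}_k = \mathbf{I} - \mathbf{p}_k\mathbf{p}_k^H$$

$$\mathbf{x}_k(n) = \mathbf{B}_k\mathbf{x}_{k-1}(n) \qquad (16)$$

• Backward Recursion: For $k = D, D-1, \ldots, 1$, with
$\epsilon_D(n) = d_D(n)$:

$$w_k = E\{d_{k-1}^*(n)\epsilon_k(n)\} / E\{|\epsilon_k(n)|^2\}$$

$$\epsilon_{k-1}(n) = d_{k-1}(n) - w_k^*\epsilon_k(n)$$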

It follows that the matrix $\mathbf{T} = [\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_D]$ contains
orthonormal columns and that the reduced dimension $D \times D$
correlation matrix $\mathbf{T}^H\mathbf{K}\mathbf{T}$ is tri-diagonal [4]. The MSNWF
is depicted in Figure 1, which clearly displays the multiple
stages and nested structure. Operating in a D-dimensional
space is tantamount to "cutting off" all stages below the
D-th stage. The updating of the scalar weights $w_k$ in Figure
1 may be effected through a simple LMS algorithm.
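The forward and backward recursions can be sketched in batch form. This is an illustration, not the authors' code: the function name `msnwf` is hypothetical, expectations are replaced by sample averages over the supplied snapshots, and the scalar weights are computed directly rather than adapted by LMS. The conjugate placement follows the convention $\epsilon_{k-1}(n) = d_{k-1}(n) - w_k^*\epsilon_k(n)$, which is an assumption about the intended notation.

```python
import numpy as np

def msnwf(X, d0, D):
    """Sample-based multistage nested Wiener filter with D stages.

    X  : (dim, n) array of data snapshots x_0(n)
    d0 : (n,) array, the "desired" signal d_0(n)
    Returns T = [p_1, ..., p_D], the scalar weights w, and the final
    error signal eps_0(n).
    """
    n = X.shape[1]
    Xk, dk = X.astype(complex), d0.astype(complex)
    P, ds = [], [dk]
    # Forward recursion: matched-filter direction, then blocking matrix.
    for _ in range(D):
        r = Xk @ dk.conj() / n            # estimate of E{d_{k-1}^* x_{k-1}}
        p = r / np.linalg.norm(r)
        dk = p.conj() @ Xk                # d_k(n) = p_k^H x_{k-1}(n)
        Xk = Xk - np.outer(p, dk)         # x_k(n) = (I - p_k p_k^H) x_{k-1}(n)
        P.append(p)
        ds.append(dk)
    # Backward recursion: nested chain of scalar Wiener filters.
    eps = ds[D]
    w = np.zeros(D, dtype=complex)
    for k in range(D, 0, -1):
        w[k - 1] = np.mean(ds[k - 1].conj() * eps) / np.mean(np.abs(eps) ** 2)
        eps = ds[k - 1] - np.conj(w[k - 1]) * eps
    return np.column_stack(P), w, eps
```

On sample data the returned `T` has orthonormal columns and `T^H R T` is tridiagonal when `R` is the sample correlation matrix of `X`, which mirrors the properties stated in the text.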

4.  SIMULATIONS 

Two  scenarios  are  presented  to  illustrate  the  performance 
of  the  reduced-rank  MSNWF  in  terms  of  nulling  both  wide¬ 
band  and  narrowband  jammers  while  operating  in  a  reduced- 
rank  mode  at  low  sample  support.  Consider  M  =  N  =  7. 
These  definitions  imply  an  M  =  7  element  equi-spaced  lin¬ 
ear  array  with  N  =  7  taps  at  each  antenna.  Although  typ¬ 
ical  antenna  arrays  for  GPS  are  two-dimensional  (planar), 
circular,  for  example,  or  conformal,  a  linear  array  was  used 
in  this  illustrative  simulation  example  in  order  to  have  only 
one  angular  variable  for  display  purposes.  This  allows  the 
use  of  a  single  mesh  or  contour  plot  to  display  the  two- 
dimensional  Fourier  Transform  of  the  space-time  weights 
obtained  from  a  given  run.  In  addition,  N  =  7  is  a  very 
small  number  of  taps  to  employ  at  each  antenna.  A  signifi¬ 
cantly  larger  number  is  needed  in  practice  in  order  to  form 
sharp  ‘point-nulls’  at  the  angle-frequency  coordinates  of  a 
narrowband  interferer  and  thereby  minimize  distortion  to 
the GPS signal. The simulated preprocessor is constrained
so that $x_1(n)$ (1st tap behind 1st antenna) is our reference
signal, i.e. $h_1(0) = 1$. The other taps behind each
antenna element form the column data vector x(n) entering
the stages of the MSNWF as illustrated in Figure 1.
Table 1 summarizes the values used in the first scenario.
Five of the six jammers for this simulation are narrowband
jammers with different angles of arrival (AOAs). In both
scenarios, the narrowband jammers have different frequency
offsets relative to the L1 frequency. Since we are assuming a
20 MHz receiver bandwidth at each antenna, the noise floor
was determined to be approximately -128 dBW after filtering
at each antenna. Recall that the goal of power minimization
is to drive the output power of the space-time beamformer
as close to the noise floor as possible.

4.1.  Reduced  Dimension  Performance 

Figures  2  and  3  plot  the  average  power  output  of  the  space- 
time  power  minimization  preprocessor  based  on  the  MSNWF 
as  a  function  of  subspace  dimension  or  rank  of  the  dimen¬ 
sionality  reducing  matrix  transformation.  The  subspace  di¬ 
mension  at  which  MSNWF  approximately  achieves  the  per¬ 
formance  of  the  full-dimension  ideal  (asymptotic)  Wiener 
filter  is  roughly  the  same  in  both  scenarios,  around  eight. 
In  contrast,  Principal  Components  (PC)  generally  requires 
a  subspace  dimension  equal  to  the  number  of  degrees  of  free¬ 
dom  taken  up  by  the  jammers  to  achieve  the  same  output 
power  level.  Each  narrowband  jammer  takes  up  one  degree 
of  freedom.  Each  wideband  jammer  takes  up  N  =  7  degrees 
of  freedom,  where  N  is  the  number  of  taps  per  antenna. 
This  is  because  the  cancellation  of  a  wideband  jammer  re¬ 
quires  a  spatial  null,  implying  a  null  across  the  entire  20.46 
MHz  spectrum  at  its  AOA.  In  Scenario  1,  the  jammers  take 
up  5  x  1  +  1  X  7  =  12  degrees  of  freedom;  in  Scenario  2,  the 
jammers  take  up  1x1  +  5x7  =  36  degrees  of  freedom. 
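The degree-of-freedom count used above is simple enough to state as a small helper. The function name is hypothetical; the rule it encodes is the one given in the text, namely one degree of freedom per narrowband jammer and N per wideband jammer.

```python
def jammer_degrees_of_freedom(n_narrowband, n_wideband, taps_per_antenna):
    """Degrees of freedom consumed: 1 per narrowband, N per wideband jammer."""
    return n_narrowband * 1 + n_wideband * taps_per_antenna

# Scenario 1: five narrowband jammers and one wideband jammer, N = 7 taps.
scenario_1 = jammer_degrees_of_freedom(5, 1, 7)
# Scenario 2: one narrowband jammer and five wideband jammers, N = 7 taps.
scenario_2 = jammer_degrees_of_freedom(1, 5, 7)
```

Both counts stay below the NM = 49 total weights of the 7-element, 7-tap preprocessor.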

4.2.  Low  Sample  Support  Performance 

Figures  4  and  5  examine  the  space-time  snapshot  sample 
support  necessary  to  effectively  null  the  jammers  for  each 
of  the  two  scenarios  simulated.  The  power  output  for  each 
sample  support  level  was  averaged  over  250  Monte  Carlo 
trial  runs.  Each  reduced-rank  method  used  its  respective 
ideal  reduced  dimension  subspace  matrix  T  in  calculat¬ 
ing  the  power  output  at  each  snapshot.  Once  the  number 
of  snapshots  was  equal  to  the  rank  for  each  reduced-rank 
method,  the  power  output  was  calculated.  The  greatest 
differential  in  performance  between  the  MSNWF  and  PC 
based  methods  is  observed  in  Figure  5  corresponding  to  Sce¬ 
nario  2.  In  this  case,  Figure  3  and  the  above  calculation  dic¬ 
tate  that  PC  needs  to  adapt  in  a  36-dimensional  subspace, 
while  the  MSNWF  need  only  adapt  in  a  10-dimensional 
space.  This  allows  MSNWF  to  converge  more  rapidly  than 
PC  and  CS.  Note  that  the  MSNWF  is  able  to  null  the  jam¬ 
mers  effectively  at  low  ranks  with  the  added  advantage  of 
not  requiring  the  computation  of  eigenvectors. 

4.3.  Nulling  Performance/Distortion  Issues 

Figures 6 and 7 display contour plots of the magnitude
of the multi-dimensional Fourier Transform of the
space-time weights obtained from the MSNWF with 40 space-time
snapshots. For Scenario 1, Figure 6 displays a well-defined
"point-null" at the angle-frequency coordinate of each
narrowband jammer and a well-defined "line-null" along the
arrival angle of the wideband jammer. For Scenario 2,
Figure 7 displays a well-defined "point-null" at the
angle-frequency coordinate of the one narrowband jammer and
a well-defined "line-null" along the respective arrival angle
of each of the five wideband jammers. Equally important,
in both cases the response of the space-time beamformer is
observed to be relatively flat away from the null locations.
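A response surface of this kind can be computed from a weight vector with a zero-padded two-dimensional DFT. The sketch below rests on assumptions not stated in the paper: the weights are stacked antenna by antenna as in (4), the array is uniform with half-wavelength spacing (needed only to map spatial frequency to angle), and the function names are hypothetical. Conjugation and sign conventions may differ from those used to produce Figures 6 and 7.

```python
import numpy as np

def space_time_response(h, M, N, n_fft=64):
    """Magnitude of the 2D DFT of space-time weights h = [h_1; ...; h_M].

    Axis 0 of the result is spatial frequency, axis 1 temporal frequency,
    both shifted so that zero frequency is at the center.
    """
    W = np.asarray(h).reshape(M, N)        # row m holds the N taps of antenna m
    H = np.fft.fftshift(np.fft.fft2(W, s=(n_fft, n_fft)))
    return np.abs(H)

def angle_axis_degrees(n_fft=64):
    """Arrival angles for the spatial axis, assuming half-wavelength spacing."""
    u = np.fft.fftshift(np.fft.fftfreq(n_fft))   # cycles per element
    return np.degrees(np.arcsin(2.0 * u))
```

A weight vector with a single unit tap and zeros elsewhere gives a response that is exactly flat, which is the "pass everything" reference against which the nulls are judged.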

5.  CONCLUSION 

The  MSNWF  preprocessor  was  shown  to  exhibit  excep¬ 
tional  nulling  performance  for  both  wideband  and  narrow- 
band  jammers  at  low  sample  support  and  low  rank.  The 
reduced  dimension  subspace  selected  by  the  MSNWF  ex¬ 
hibits  rapid  convergence  in  rank  and  sample  support  imply¬ 
ing  adaptive  null  tracking  in  a  dynamic  jamming  environ¬ 
ment.  The  MSNWF  preprocessor  was  shown  to  outperform 
both  principal-components  and  cross-spectral  metric  while 
operating  at  a  lower  rank  and  sample  support. 


Table 1: Simulation Parameters

Jammer Type   SNR        AOA Scen. 1   AOA Scen. 2    Bandwidth
Wideband      -100 dBW   20°           20°            20 MHz
Wideband      -110 dBW   -             0°             20 MHz
Wideband      -100 dBW   -             (illegible)    20 MHz
Wideband      -100 dBW   -             (illegible)    20 MHz
Wideband      -110 dBW   -             -60°           20 MHz

Jammer Type   SNR        AOA Scen. 1   AOA Scen. 2    Frequency
Narrowband    -100 dBW   60°           60°            -10 MHz
Narrowband    -100 dBW   15°           -              -5 MHz
Narrowband    -100 dBW   -10°          -              0 MHz
Narrowband    -100 dBW   -30°          -              5 MHz
Narrowband    -110 dBW   -55°          -              10 MHz
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Power  Output  vs.  Rank 


Figure 2. Power Output versus Rank (Scenario 1)

Figure 3. Power Output versus Rank (Scenario 2)

Figure 4. Power Output versus Snapshots (Scenario 1)

Figure 5. Power Output versus Snapshots (Scenario 2)

Figure 6. Contour Plot of 2D DFT (Scenario 1); horizontal axis: Theta (Angle of Arrival)

Figure 7. Contour Plot of 2D DFT (Scenario 2); horizontal axis: Theta (Angle of Arrival)
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ABSTRACT 

Jammer  excision  enhances  interference  immunity  in  di¬ 
rect  sequence  spread  spectrum  (DSSS)  communications. 
In  this  paper,  we  propose  an  excision  procedure,  based 
on  the  discrete  evolutionary  and  Hough  transforms,  for 
jammers  composed  of  arbitrary  chirps.  The  proposed 
instantaneous  frequency  (IF)  estimation  is  done  locally, 
without  parameters,  and  it  is  recursively  corrected.  A 
singular  value  decomposition  (SVD)  of  the  dechirped 
signal  allows  us  to  synthesize  the  jammer  locally,  and 
then  subtract  it  from  the  received  signal.  Localized 
processing,  linearization  of  the  IF  estimate,  recursive 
correction, and no problems due to cross-terms in the
time-frequency  distribution  or  in  matching  IF  models 
make  the  proposed  procedure  efficient  and  practical. 
Also,  SVD  provides  an  efficient  way  to  synthesize  the 
jamming  signal.  The  local  IF  estimation  and  the  per¬ 
formance  of  the  proposed  exciser  in  DSSS  systems  are 
illustrated. 

1.  INTRODUCTION 

Direct sequence spread spectrum (DSSS) techniques
provide secure communication in an increasingly crowded
spectrum. The advantages of DSSS are achieved by
spreading the message so as to occupy a bandwidth
in excess of the minimum needed. Despreading at the
receiver with a synchronized replica of the spreading
function permits recovery of the original message and
reduces interference. The received signal

$$r_k(n) = d_k\,p(n) + j_k(n) + g(n)$$

is composed of the product of the data bits $d_k \in \{-1, 1\}$
and a pseudo white noise sequence $p(n) \in \{-1, 1\}$,
$0 \le n \le N-1$, plus a jammer $j_k(n)$ and a noise signal $g(n)$ added
during transmission. Although despreading $r_k(n)$
recovers the original message while it spreads the jammer
and noise, a DSSS system can fail if
the power of the interference is very strong. Excising
the jammers before despreading enhances the
interference immunity.
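The signal model and the role of despreading can be illustrated with a short sketch. The function names `spread` and `despread` are hypothetical, each bit is assumed to occupy exactly N chips, and the receiver is assumed to be perfectly synchronized with the spreading sequence.

```python
import numpy as np

def spread(bits, p):
    """Chip sequences d_k * p(n): one row of N chips per data bit."""
    return np.outer(bits, p)

def despread(r, p):
    """Correlate each received row with p(n) and take the sign."""
    return np.sign(r @ p)
```

A typical use is to add noise, and optionally a jammer, to `spread(bits, p)` and compare `despread(...)` with the transmitted bits. Subtracting an estimate of the jammer from the received rows before calling `despread` corresponds to the excision step described above.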

Different  methods  have  been  proposed  to  mitigate 
broad-band  jammers.  The  Wigner-Hough  (WH)  trans¬ 
form  method  [4]  characterizes  the  jammer  by  a  para¬ 
metric  model  of  its  IF.  Cross-terms  and  mismatching 
of  the  IF  model  hamper  the  method.  Time-varying 
filtering  and  masking  methods  based  on  bilinear  time- 
frequency  (TF)  distributions  [2,  7]  can  excise  jammers 
characterized  by  their  instantaneous  frequency,  band¬ 
width  and  their  support  in  the  TF  plane.  A  multires¬ 
olution  method  that  uses  a  chirplet  representation  [3], 
and  a  procedure  based  on  fractional  Fourier  transform 
[1]  have  also  been  proposed. 

The  authors  recently  presented  an  approach  [9]  for 
jammer  excision  based  on  the  discrete  evolutionary  trans¬ 
form  (DET)  [11]  and  the  Hough  transform.  In  this  pa¬ 
per,  we  exploit  the  advantages  of  local  IF  estimation, 
and  by  means  of  SVD  obtain  the  significant  singular 
values  containing  the  jammer  information.  The  local 
estimation  permits  us  to  consider  multiple  chirps  in  the 
jammer,  and  the  SVD  gives  excellent  synthesis  of  the 
jammer.  Subtracting  the  synthesized  jammer  from  the 
received  signal  before  despreading  enhances  the  inter¬ 
ference  immunity  considerably. 

2.  DISCRETE  EVOLUTIONARY-HOUGH 
TRANSFORM  (DEHT) 

2.1.  Malvar-based  DET 

A non-stationary signal $x(n)$ can be expressed as a sum
of overlapping segments $x_i(n)$,

$$x(n) = \sum_{i=0}^{I-1} x_i(n), \quad 0 \le n \le N-1 \qquad (1)$$

where $I$ is the number of segments into which the signal is separated by Malvar windows $\{v_i(n)\}$ having symmetrical overlaps at the partition points and such that $\sum_i v_i(n) = 1$ (see Fig. 1). The DET relates $x_i(n)$ and its evolutionary kernel $X_i(n,\omega_\ell)$, [10, 11], as:

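$$x_i(n) = \sum_{\ell=0}^{L_i-1} X_i(n,\omega_\ell)\, e^{j\omega_\ell n} \qquad (2)$$

$$X_i(n,\omega_\ell) = \sum_{m=0}^{N-1} x_i(m)\, W_{i\ell}(n,m)\, e^{-j\omega_\ell m} \qquad (3)$$

where $\omega_\ell$ is the $\ell$th analysis frequency of the $i$th segment. The window $W_{i\ell}(n,m)$ is expressed in terms of the orthogonal Malvar wavelets $\{u_{i\ell}(n) = v_i(n)\tilde{f}_{i\ell}(n)\}$ as

$$W_{i\ell}(n,m) = u_{i\ell}(m)\, u_{i\ell}(n)\, e^{j\omega_\ell (m-n)} \qquad (4)$$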

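The functions $\{\tilde{f}_{i\ell}(n)\}$ are extensions of the orthogonal functions $\{f_{i\ell}(n)\}$ given by

$$f_{i\ell}(n) = \sqrt{\frac{2}{L_i}}\, \cos\!\left(\frac{\pi}{L_i}(\ell + 0.5)(n - a_i)\right), \qquad 0 \le \ell \le L_i - 1$$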


The Malvar expansion of $x_i(n)$, [10], is given by

$$x_i(n) = \sum_{\ell=0}^{L_i-1} c_{i\ell}\, u_{i\ell}(n) \qquad (5)$$

where the coefficients $c_{i\ell}$ are obtained by means of the orthogonality of the Malvar wavelets. The partition lengths $\{L_i\}$ can be chosen independently. The criterion proposed in [5] finds them by entropy minimization. To consider different types of chirps, including sinusoids, we will extend that criterion.


Figure  1:  Typical  Malvar  window  function 


2.2.  Optimal  Windowing 

To improve the entropy-based selection of the window lengths, we will use a zero-crossing rate and the rank of a Hankel matrix generated from the signal. These additional measures are especially useful when sinusoids are used as jammers.

As proposed in [5], minimizing the information cost

$$-\sum_{\ell=0}^{L_i-1} |c_{i\ell}|^2 \log |c_{i\ell}|^2$$

is equivalent to minimizing the entropy of the $\{c_{i\ell}\}$. In some situations, the obtained partition can be improved by using the frequency content of the segment being considered. Such information can be obtained by a zero-crossing rate:

$$Z_i = \frac{1}{2L_i} \sum_{n=1}^{N-1} \left| \mathrm{sgn}[x_{w_i}(n)] - \mathrm{sgn}[x_{w_i}(n-1)] \right|$$


where $x_{w_i}(n) = x(n)w_i(n)$, $w_i(n)$ is a rectangular window of length $L_i$ on $(a_i, a_{i+1}]$, and $\mathrm{sgn}[\cdot]$ is the sign function. The rank of a Hankel matrix, [6], of $x_{w_i}(n)$, denoted as

$$H_i = \begin{bmatrix} x_{w_i}(0) & x_{w_i}(1) & \cdots & x_{w_i}(c-1) \\ x_{w_i}(1) & x_{w_i}(2) & \cdots & x_{w_i}(c) \\ \vdots & \vdots & & \vdots \\ x_{w_i}(L_i-c) & x_{w_i}(L_i-c+1) & \cdots & x_{w_i}(L_i-1) \end{bmatrix}$$

with $(L_i - c + 1)$ rows and $c$ columns, provides an estimate of the number of sinusoids in the segment. The value $c$ is chosen greater than $2r_i$, where $r_i$ is the number of sinusoidal components in the signal windowed by $w_i(n)$. After computing the above three measures in each partition, we use them to merge or split windows to obtain the best possible partition of $x(n)$.
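Two of the three measures used for the partition can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the function names are hypothetical, the zero-crossing sum runs over the samples of one rectangular window, and sgn[0] is taken as +1, a convention the text does not specify.

```python
import math

def information_cost(coeffs):
    """Entropy-type cost -sum |c|^2 log |c|^2 of a set of expansion coefficients."""
    cost = 0.0
    for c in coeffs:
        p = abs(c) ** 2
        if p > 0.0:              # zero coefficients contribute nothing
            cost -= p * math.log(p)
    return cost

def sign(v):
    return 1 if v >= 0 else -1   # assumption: sgn[0] = +1

def zero_crossing_rate(segment):
    """Z = (1 / 2L) * sum_n |sgn[x(n)] - sgn[x(n-1)]| over one window of length L."""
    L = len(segment)
    changes = sum(abs(sign(segment[n]) - sign(segment[n - 1])) for n in range(1, L))
    return changes / (2.0 * L)

# A slow and a fast sinusoid observed through the same 64-sample window.
slow = [math.cos(2 * math.pi * 2 * n / 64) for n in range(64)]
fast = [math.cos(2 * math.pi * 10 * n / 64) for n in range(64)]
print(zero_crossing_rate(slow), zero_crossing_rate(fast))
```

A segment whose energy sits in one coefficient has a lower information cost than one whose energy is spread evenly, and the faster sinusoid gives the larger zero-crossing rate, which is the frequency-content cue the text refers to.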


2.3.  Discrete  Evolutionary-Hough  Transform 

Once the optimal lengths are chosen, upsampling is performed to achieve uniform frequency resolution in all segments. This is achieved by letting $K$ be the least common multiple of the lengths, i.e. $K = \mathrm{LCM}\{L_i\}_{i \in [0, I-1]}$. With identical frequency resolutions for every overlapped interval, the discrete evolutionary kernel $X(n,\omega_k)$ and the spectrum $S(n,\omega_k)$ of the signal $x(n)$ are given by

$$X(n,\omega_k) = \sum_{i=0}^{I-1} X_i(n,\omega_k)$$

$$S(n,\omega_k) = |X(n,\omega_k)|^2$$
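As a small worked example of the common-resolution step, the sketch below computes $K$ and the per-segment upsampling factors $K/L_i$. The segment lengths and the helper name are illustrative assumptions, not values from the paper.

```python
import math
from functools import reduce

def common_resolution(lengths):
    """Return K = LCM of the segment lengths and the upsampling factor K / L_i per segment."""
    K = reduce(lambda a, b: a * b // math.gcd(a, b), lengths)
    return K, [K // L for L in lengths]

K, factors = common_resolution([16, 32, 24])   # hypothetical segment lengths
print(K, factors)
```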

The discrete evolutionary-Hough transform (DEHT) of $x_{w_i}(n)$, [9], is given by

$$\mathrm{DEHT}(\theta,\rho,i) = \sum_{n,k} S(n,\omega_k)\, w_i(n)\, \delta(\rho - n\cos\theta - \omega_k \sin\theta)$$

where $\delta(\cdot)$ is the Dirac delta function. The DEHT provides a linear estimate, characterized by the parameters $(\rho, \theta)$, of the IF of each of the chirps in a segment. The direct distance parameter $\rho$ is the distance between the origin and the line appearing in $S(n,\omega_k)w_i(n)$. The inclination parameter $\theta$ is the angle between $\rho$ and the $n$-axis.
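The accumulation behind the transform can be illustrated with a discrete Hough accumulator. This is a minimal sketch, not the authors' implementation: it assumes the frequency axis is indexed by the bin number $k$ rather than by $\omega_k$, replaces the Dirac delta by rounding $\rho$ to the nearest integer, and uses a synthetic spectrum with a single straight ridge.

```python
import math
from collections import defaultdict

def hough_accumulate(S, thetas_deg):
    """Sum S[n][k] over the lines rho = n*cos(theta) + k*sin(theta)."""
    acc = defaultdict(float)
    for n, row in enumerate(S):
        for k, value in enumerate(row):
            if value == 0.0:
                continue                      # empty cells cast no votes
            for t in thetas_deg:
                th = math.radians(t)
                rho = round(n * math.cos(th) + k * math.sin(th))
                acc[(t, rho)] += value
    return acc

# Synthetic time-frequency image: a linear ridge k = n, as a linear chirp would give.
N = 32
S = [[1.0 if k == n else 0.0 for k in range(N)] for n in range(N)]
acc = hough_accumulate(S, range(180))
(theta_peak, rho_peak), peak = max(acc.items(), key=lambda kv: kv[1])
print(theta_peak, rho_peak, peak)
```

The accumulator peak identifies the pair $(\rho, \theta)$ of the ridge, which is the linear IF estimate for that segment.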
3. INSTANTANEOUS FREQUENCY
ESTIMATION VIA DEHT

A problem in IF estimation using the Wigner-Hough transform is the mismatch between the actual IF and the parametric model used for it. The linear IF estimator developed in [9] is local, recursive, and non-parametric, and valid for mono- and multi-component signals.

Applying the Hough transform to the Malvar-based evolutionary spectrum of $x_i(n)$ results in a piecewise linear characterization of the IF. This is due to the approximate sinusoidal representation obtained locally. It is also possible to recursively correct the estimate by processing the signal. Let $\hat{\phi}_{iq}(n)$ be the initial instantaneous phase estimated for the $q$th component of

$$x_{w_i}(n) = \sum_p A_{ip}\, e^{j\phi_{ip}(n)}.$$

To improve the estimate $\hat{\phi}_{iq}(n)$, we first dechirp $x_{w_i}(n)$ with it to get

$$y_i^q(n) = x_{w_i}(n)\, e^{-j\hat{\phi}_{iq}(n)} = \sum_p A_{ip}\, e^{j[\phi_{ip}(n) - \hat{\phi}_{iq}(n)]}$$

Low-pass filtering $y_i^q(n)$ with a narrow-band filter gives

$$z_i^q(n) = A_{iq}\, e^{j[\phi_{iq}(n) - \hat{\phi}_{iq}(n)]} + \epsilon(n) \approx A_{iq}\, e^{j\tilde{\phi}_{iq}(n)}$$

where $\epsilon(n)$ is a small output with frequencies outside the filter bandwidth. Letting $\tilde{\phi}_{iq}(n)$ be the phase of the filter output $z_i^q(n)$, we then obtain an improved estimate as

$$\hat{\phi}_{iq}^{\,new}(n) = \hat{\phi}_{iq}(n) + \tilde{\phi}_{iq}(n)$$

We can then use this new estimate to dechirp $x_{w_i}(n)$ again and find an improved estimate, repeating the process until the difference between the old and the new estimates is insignificant. The final estimate is then linearly fitted. The procedure is done locally and recursively for each of the signal components.

The performance of the local IF estimator is illustrated by considering first a sinusoidal FM signal. Its spectrum and the IF estimate are shown in Fig. 2(a)-(b); the final IF estimate (solid line) is very close to the actual one (dotted line). The dash-dotted line in Fig. 2(b) is the piecewise linear IF obtained from the DEHT. As a second example, consider a signal consisting of a linear and a sinusoidal FM signal, with different constant amplitudes, embedded in noise (SNR 2.64 dB). The resulting estimates (solid line) are considerably improved over the initial estimates (dash-dotted line), as seen in Fig. 2(c)-(d).


4. APPLICATION TO JAMMER EXCISION
IN DSSS COMMUNICATIONS

As suggested in [7], if the jammer is synthesized and subtracted from the received signal, the performance of the despreading in DSSS is enhanced considerably. Assuming the jammer is composed of an arbitrary number of chirps with smooth IFs and amplitudes that vary slowly in time, the proposed exciser is the one displayed in Fig. 3. After an estimate of the IF of one of the jammer components is obtained, this jammer component can be approximately synthesized by either low-pass filtering or SVD of a modified Hankel matrix in the $i$th segment. The modified Hankel matrix $M_i^q$, of dimension $L_i/2 \times (L_i+2)/2$, can be expressed in terms of $H_i$ (generated from $r_{ik}(n) = r_k(n)w_i(n)$) as

$$M_i^q = H_i\, e^{-j\phi_{iq}(n)} \qquad (6)$$

The rank of the modified Hankel matrix determines the number of chirps with the instantaneous phase $\phi_{iq}(n)$ present in the signal. The most significant singular values will permit us to obtain a good synthesis of the jammer component after it is chirped using the estimated IF.
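The low-pass branch of the exciser can be sketched as dechirp, smooth, re-chirp, subtract. The code below is an illustration under assumptions that are not in the paper: a moving average stands in for the narrow-band filter, the phase estimate is taken to be exact, and the chip sequence and jammer parameters are made up for the example.

```python
import cmath

def synthesize_jammer_lowpass(r, phase, half_width):
    """Dechirp r(n) with the phase estimate, smooth with a moving average, and re-chirp."""
    N = len(r)
    dechirped = [r[n] * cmath.exp(-1j * phase[n]) for n in range(N)]
    smoothed = []
    for n in range(N):
        lo, hi = max(0, n - half_width), min(N, n + half_width + 1)
        smoothed.append(sum(dechirped[lo:hi]) / (hi - lo))   # window is shortened at the edges
    return [smoothed[n] * cmath.exp(1j * phase[n]) for n in range(N)]

# Hypothetical segment: a linear-chirp jammer of amplitude 3 on top of +/-1 chips.
N = 64
phase = [0.05 * n + 0.002 * n * n for n in range(N)]
jammer = [3.0 * cmath.exp(1j * phase[n]) for n in range(N)]
chips = [1.0 if (n * 7) % 5 < 2 else -1.0 for n in range(N)]
received = [jammer[n] + chips[n] for n in range(N)]

estimate = synthesize_jammer_lowpass(received, phase, half_width=8)
excised = [received[n] - estimate[n] for n in range(N)]      # passed on to the despreader
```

With an exact phase the dechirped jammer is a constant, so the smoother passes it unchanged and only an averaged copy of the chips leaks into the estimate.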

The number of significant singular values chosen is crucial in the jammer synthesis. To decide the number of singular values, we use as a criterion a percentage of the energy of the dechirped received signal. Let $E_{ik}^q$ be the energy of $r_{ik}(n)e^{-j\phi_{iq}(n)}$, $\{\sigma_m\}$ be the singular values of $M_i^q$ and $\{e_m\}$ be the eigenvalues of $(M_i^q)^*M_i^q$ (the symbol $*$ stands for the conjugate transpose). The matrix $(M_i^q)^*M_i^q$ is symmetric with diagonal entries $d_m$, $m = 0, \ldots, L_i/2$. Observing that the sum of the first and the last diagonal entries of this symmetric matrix equals the energy of the dechirped signal, we have

$$E_{ik}^q = d_0 + d_{L_i/2} = \mathrm{Trace}[(M_i^q)^*M_i^q] - \sum_{m=1}^{L_i/2-1} d_m \qquad (7)$$

Given that $\mathrm{Trace}[(M_i^q)^*M_i^q] = \sum_m e_m$ [8], and that $\sigma_m^2 = e_m$, Eq. (7) becomes

$$E_{ik}^q = \sum_{m=0}^{L_i/2} e_m - \sum_{m=1}^{L_i/2-1} d_m = \sum_{m=0}^{L_i/2} \sigma_m^2 - \sum_{m=1}^{L_i/2-1} d_m \qquad (8)$$

If we let $\sum_{m=1}^{L_i/2-1} d_m = \beta E_{ik}^q$ for some constant $\beta$, then Eq. (8) can be rewritten as

$$E_{ik}^q = \frac{1}{1+\beta} \sum_{m=0}^{L_i/2} \sigma_m^2 \qquad (9)$$

Once $\beta$ is computed, the effect of choosing a certain number of singular values can be measured in terms of their contribution to the energy of the dechirped received signal.
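The energy bookkeeping in Eqs. (7)-(9) can be checked numerically without computing an SVD, because the sum of the squared singular values equals the trace of $(M_i^q)^*M_i^q$. The sketch below uses an arbitrary made-up sequence in place of the dechirped signal; the helper names are hypothetical.

```python
def modified_hankel(z):
    """Hankel matrix with L/2 rows and L/2 + 1 columns built from z(0..L-1), L even."""
    L = len(z)
    rows, cols = L // 2, L // 2 + 1
    return [[z[i + j] for j in range(cols)] for i in range(rows)]

def gram_diagonal(M):
    """Diagonal entries d_m of M^* M, i.e. the squared column norms of M."""
    cols = len(M[0])
    return [sum(abs(row[m]) ** 2 for row in M) for m in range(cols)]

# Stand-in for a dechirped segment of length L = 8.
z = [complex(1 + 0.1 * n, (-1) ** n * 0.5) for n in range(8)]
M = modified_hankel(z)
d = gram_diagonal(M)

energy = sum(abs(v) ** 2 for v in z)
trace = sum(d)                     # equals the sum of the squared singular values
beta = sum(d[1:-1]) / energy       # inner diagonal entries relative to the energy
print(energy, trace / (1 + beta))
```

The two printed numbers agree, which is Eq. (9); the share of the energy explained by the first few singular values can then be read off against `trace / (1 + beta)`.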

To assess the performance of our procedure, we simulated DSSS received signals using 127 chips/bit, for various SNRs and two fixed JSRs (jammer-to-signal ratios). The jammer is composed of a linear and a sinusoidal FM signal with different constant amplitudes, just as in Fig. 2. Figures 4-5 show the probabilities of bit error when the received signal is jammed with JSRs of 26 and 34 dB. We consider four cases: when the received signal is without jammers; when no excision is performed before despreading; when using a lowpass filter; and when using the SVD. The results are improved after excising using either the lowpass filter or the SVD. However, the SVD method performs better in the case of stronger jammers, as shown in Fig. 5. Figure 6(a)-(b) compares the excised signals using a low-pass filter and the SVD method to the received signal without jammers. As shown in Fig. 6(b), the low-pass filter method displays a large ripple at the boundaries of the segments.

5.  CONCLUSIONS 

Excision of a multi-component chirp jammer in DSSS communications is achieved by subtracting a synthesized version of it from the received signal. Given estimates of the IFs, the jammer synthesis can be done by lowpass filtering or SVD. The IF estimator uses the DEHT to obtain a piecewise linear estimate, which can then be recursively corrected. The results indicate that SVD performs consistently better than lowpass filtering in the case of stronger jammers. Applying this excision procedure to frequency hopping spread spectrum systems will be further investigated.

mates; DEHT (dash-dotted), second iteration of correction (solid),

Figure 3: Diagram of the exciser using local instantaneous frequency estimation and lowpass filter or SVD

Figure 4: The probabilities of bit error versus SNRs under JSR of

Figure 5: The probabilities of bit error versus SNRs under JSR of 34 dB

ABSTRACT

In this paper we address the problem of navigation data demodulation by an adaptive GPS receiver that utilizes a bank of single-satellite linear-tap-delay filters and employs antenna-array reception. The presence of an antenna array allows the receiver to operate in the spatial domain in addition to the temporal (code) domain. We investigate disjoint-domain as well as joint-domain space-time GPS signal processing techniques and we consider design criteria of conventional matched-filter (MF) type, minimum-variance-distortionless-response (MVDR) type and auxiliary-vector (AV) type. The proposed structures utilize filters that operate at a fraction of the navigation data bit period (1 msec) and are followed by soft-decision detectors. Soft decisions taken over a navigation data bit period are then combined according to a simple combining rule. Simulation results illustrate the bit-error-rate (BER) performance of the investigated design alternatives.

1.  INTRODUCTION 

The  Global  Positioning  System  (GPS),  originally  developed 
for  military  use,  has  received  a  lot  of  attention  recently  for 
use  in  civilian  applications  such  as  aviation,  agriculture, 
land-vehicle  navigation,  surveying  and  mapping,  to  name 
a  few  [1],  [2].  The  GPS  system  employs  direct-sequence 
spread-spectrum  (DS-SS)  signaling.  Each  satellite  is  as¬ 
signed  a  coarse  acquisition  ( C/A )  code  and  a  precision  ( P ) 
code.  The  C/A- code  is  a  Gold  sequence  with  chipping  rate 
at  1.023  Mchips/sec  and  period  1  msec  (or  code-length  1023 
chips),  while  the  P-code  is  a  pseudorandom  code  with  chip¬ 
ping  rate  at  10.23  Mchips/sec  and  period  one  week.  The 
C/A and P codes are modulated by binary navigation data
(at  50  bps)  and  then  multiplexed  in  phase  quadrature  to 
form  the  satellite  signal.  In  this  paper  we  focus  on  the 
C/A  component  of  the  transmitted  GPS  signal  (or  as  com¬ 
monly  stated,  we  assume  perfect  separation  of  the  C/A  and 
P-signal  component  at  the  receiver). 

The  working  principle  of  the  GPS  system  is  very  sim¬ 
ple.  The  position  of  a  GPS  receiver  is  modeled  as  a  four¬ 
dimensional  vector  (three  coordinates  correspond  to  the 
spatial  position  of  the  receiver  and  one  is  related  to  re¬ 
ceiver  timing).  Estimation  of  the  position  can  be  achieved 
by  utilizing  the  signals  of  a  minimum  of  four  satellites. 

The  signal  captured  by  a  GPS  receiver  is  the  aggre¬ 
gate  of  the  GPS  signals  of  the  satellites  that  are  currently 
in  view,  their  multipaths,  additive  white  Gaussian  noise 
(AWGN),  possible  intelligent  hostile  spread  spectrum  (SS) 
interference  (spoofing)  and/or  narrowband  interference. 
The component of the received signal that is due to the GPS signals of the satellites currently in view is the superposition of weakly correlated SS signals. However, an earth-based intelligent SS interferer/spoofer who knows the satellite C/A-code can mimic the signal of interest and thus contribute interference that is highly correlated with the signal of interest.

In  this  paper  we  address  the  problem  of  navigation  data 
demodulation  by  an  adaptive  GPS  receiver  that  utilizes 
a  bank  of  single-satellite  linear-tap-delay  filters  and  em¬ 
ploys  antenna-array  reception.  The  presence  of  an  antenna- 
array  allows  the  receiver  to  operate  in  the  spatial  domain 
in  addition  to  the  temporal  (code)  domain.  We  investigate 
disjoint-domain  as  well  as  joint-domain  space-time  GPS- 
signal  processing  techniques  and  we  consider  design  criteria 
of  the  form  of  conventional  matched-filter  (MF)  as  well  as 
interference  suppressing  minimum-variance-distortionless- 
response  (MVDR)  [3]  and  auxiliary-vector  (AV)  filtering 
[4].  The  proposed  structures  utilize  filters  that  operate  at 
one  twentieth  of  the  navigation  data  bit  period.  Soft  pre¬ 
detection  measurements  taken  over  a  navigation  data  bit 
period  are  then  combined  according  to  a  simple  combining 
rule  for  further  BER  performance  improvements. 

2.  SYSTEM  MODEL 

The C/A and P-signal components are assumed to be perfectly separated at the receiver. Then, the aggregate C/A component of the received signal can be viewed as an asynchronous DS-SS system with $K$ SS signals in the presence of AWGN. Since the satellite navigation information bit rate is 50 bps, it is equivalent to say that the C/A code of each GPS signal repeats itself 20 times and during this period it is modulated by the same data bit. In this context, the contribution of the $k$th transmitted GPS signal, $k = 1, \ldots, K$, over $T = LT_c$ secs is given by

$$u_k(t) = \sum_i b_k(i)\sqrt{E_k}\, s_k(t - iT)\, e^{j(2\pi f_c t + \phi_k)},$$

where $b_k(i) \in \{-1, +1\}$ is the $i$th transmitted data bit such that $b_k(i) = b_k(i + m)$, $m = 1, \ldots, 19$, and $b_k(i)$ is independent of $b_k(j)$ for all $|i - j| \ge 20$. Also, $E_k$ and $\phi_k$ denote the energy and the carrier phase, respectively, of the $k$th signal (with carrier frequency $f_c$), while $s_k$ is the normalized $k$th C/A code given by

$$s_k(t) = \sum_{l=0}^{L-1} d_k(l)\, \psi(t - lT_c),$$

where $d_k(l) \in \{\pm 1/\sqrt{L}\}$, $l = 0, \ldots, L-1$, are the signature bit values of the $k$th SS signal and $\psi(t)$ is the chip waveform of duration $T_c = T/L$.

The aggregate signal received at the input of a narrowband uniform linear array of $M$ antenna elements is given by $\mathbf{x}_c(t) = \sum_{k=1}^{K} u_k(t - \tau_k)\,\mathbf{a}_k + \mathbf{n}(t)$, where $\mathbf{n}(t)$ is complex AWGN and $\mathbf{a}_k$ denotes the array response vector (spatial signature) of the $k$th SS signal with elements defined by

$$a_k(m) = e^{j2\pi(m-1)\frac{d\sin\theta_k}{\lambda}}, \qquad m = 1, \ldots, M$$

(in which $\theta_k$ denotes the angle of arrival of the $k$th signal, $\lambda$ is the carrier wavelength, and $d$ is the inter-element spacing, usually $d = \lambda/2$).

After carrier demodulation, the received signal is given by

$$\mathbf{x}(t) = \sum_i \sum_{k=1}^{K} b_k(i)\sqrt{E_k}\, s_k(t - iT - \tau_k)\, e^{-j2\pi f_c\tau_k}\,\mathbf{a}_k + \mathbf{n}(t). \qquad (1)$$

Without loss of generality, we assume chip synchronization at the reference antenna element ($m = 1$) with the SS signal of interest, say Signal 1, and we also assume that $0 < \tau_k < T$, $k = 2, \ldots, K$. After conventional chip matched filtering and sampling at the chip rate $1/T_c$, we can visualize the space-time data samples associated with $b_1(i)$ in the form of an $M \times L$ matrix $\mathbf{X}_{M\times L}(i) = [\mathbf{x}_{M\times 1}(iL)\ \ \mathbf{x}_{M\times 1}(iL+1)\ \cdots\ \mathbf{x}_{M\times 1}(iL+L-1)]$ where the column vector $\mathbf{x}_{M\times 1}(iL+j)$ is given by

$$\mathbf{x}_{M\times 1}(iL+j) = \sum_{k=1}^{K} b_k(i)\sqrt{E_k}\, d_k(j - \tau_k/T_c)\, e^{-j2\pi f_c\tau_k}\,\mathbf{a}_k + \mathbf{n}_{M\times 1}(iL+j), \qquad j = 0, \ldots, L-1. \qquad (2)$$

In the following, we pursue one-shot detection of the bit of interest $b_1(i)$, and we drop the index $i$ for simplicity in notation.

In  this  work  we  focus  on  the  detection  of  the  naviga¬ 
tion  data  of  a  single  satellite.  In  this  context  GPS  signals 
of  other  satellites  currently  in  view  as  well  as  intelligent 
SS  “GPS-looking”  jamming  signals  are  treated  comprehen¬ 
sively  as  SS  interference.  The  differences  between  the  for¬ 
mer  and  the  latter  lie  in  their  corresponding  signature  cross- 
correlation  level  with  the  satellite  signal  of  interest  as  well 
as  their  power  level. 

3.  SPACE  AND  TIME  PROCESSING 
ALTERNATIVES 

In  this  section  we  investigate  disjoint  and  joint  domain  fil¬ 
tering  configurations  for  GPS  signal  processing.  The  dis¬ 
joint  configurations  are  formed  by  the  cascade  of  a  space 
filter  followed  by  a  time  filter  (S-T),  or  by  a  time  filter 
followed  by  a  space  filter  (T-S).  In  this  context,  disjoint 


domain  receiver  design  is  a  two  stage  process  with  the  sec¬ 
ond  stage  being  conditioned  on  the  design  of  the  first  stage. 
The  design  optimization  criterion  imposed  at  either  stage 
(regardless  of  the  space  or  time  nature  of  the  corresponding 
filter)  that  is  of  interest  in  this  work  is  of  MF-type,  MVDR- 
type  or  AV-type.  The  design  optimization  criterion  for  each 
stage  can  be  selected  independently  and  is  usually  dictated 
by  considerations  of  simplicity  in  implementation,  compu¬ 
tational  complexity  and  performance.  Due  to  the  lack  of 
space  we  present  only  the  studies  for  S-T  configurations. 


3.1.  Disjoint  Space-Time  (S-T)  Configuration 

For an arbitrary linear space processor, $\mathbf{f}_s$, and an arbitrary linear time processor, $\mathbf{f}_t$, the decision on the information bit of the signal of interest, $\hat{b}_1$, is given by the following expression:

$$\hat{b}_1 = \mathrm{sgn}(\mathrm{Re}\{\mathbf{f}_t^H(\mathbf{f}_s^H\mathbf{X})^T\}) = \mathrm{sgn}(\mathrm{Re}\{\mathbf{f}_s^H\mathbf{X}\mathbf{f}_t^*\}) \qquad (3)$$

where $\mathrm{sgn}(\cdot)$ identifies the sign operation and $\mathrm{Re}\{\cdot\}$ extracts the real part of a complex number. In the following we present the different forms that $\mathbf{f}_s$ and $\mathbf{f}_t$ may assume.

Spatial Matched-Filtering (sMF)

The spatial matched filter is the filter matched to the array response vector, $\mathbf{a}_1$, of the signal of interest, that is

$$\mathbf{f}_{sMF} = \mathbf{a}_1/M. \qquad (4)$$

We observe that $E_{b_1}\{\mathbf{X}(b_1\mathbf{d}_1)\} = \sqrt{E_1}\,\mathbf{a}_1$, where $\mathbf{d}_1$ is the code of the signal of interest and the statistical expectation operation $E\{\cdot\}$ is taken with respect to $b_1$ only.

□ 

Spatial  MVDR  Filtering  (sMVDR) 


The MVDR filter is designed to minimize the filter output variance subject to the constraint that the filter remains distortionless in the direction of the signal of interest $\mathbf{a}_1$ [3]. The filter is given by

$$\mathbf{f}_{sMVDR} = \frac{\mathbf{R}_s^{-1}\mathbf{a}_1}{\mathbf{a}_1^H\mathbf{R}_s^{-1}\mathbf{a}_1} \qquad (5)$$

where $\mathbf{R}_s$ denotes the $M \times M$ covariance matrix of the columns of $\mathbf{X}_{M\times L}$ (i.e., the correlation of the spatial input data).

□ 
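A minimal numerical sketch of Eq. (5) for a two-element array follows. Everything specific in it is an assumption made for illustration: the half-wavelength steering-vector convention, the arrival angles, the interferer power, and the helper names. The covariance is built in closed form as signal of interest plus one strong interferer plus unit-variance noise.

```python
import cmath
import math

def steering_vector(theta, M):
    # Assumption: half-wavelength uniform linear array, first element as phase reference.
    return [cmath.exp(1j * math.pi * m * math.sin(theta)) for m in range(M)]

def outer(u, scale):
    return [[scale * a * b.conjugate() for b in u] for a in u]

def inverse_2x2(R):
    (a, b), (c, d) = R
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def mvdr(R, a):
    Ria = matvec(inverse_2x2(R), a)
    return [v / inner(a, Ria) for v in Ria]

def output_variance(f, R):
    return inner(f, matvec(R, f)).real

a1 = steering_vector(math.radians(10), 2)     # signal of interest (hypothetical angle)
a2 = steering_vector(math.radians(50), 2)     # interferer (hypothetical angle)
R1, R2 = outer(a1, 1.0), outer(a2, 100.0)
R = [[R1[i][j] + R2[i][j] + (1.0 if i == j else 0.0) for j in range(2)] for i in range(2)]

f_mvdr = mvdr(R, a1)
f_mf = [v / 2 for v in a1]                    # spatial matched filter a1 / M
print(output_variance(f_mf, R), output_variance(f_mvdr, R))
```

Both filters pass the direction $\mathbf{a}_1$ with unit gain, but the MVDR filter has the smaller output variance because it steers a null toward the interferer.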

Spatial  Auxiliary-Vector  Filtering  (sAV) 

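For a given spatial covariance matrix $\mathbf{R}_s$, the theory of auxiliary-vector (AV) filtering [4], [5] can be applied to the spatial domain to provide a sequence of spatial linear filters that are distortionless in the vector direction of interest $\mathbf{a}_1$ and can be obtained by the following recursion:

$$\mathbf{f}_{sAV}(0) = \frac{\mathbf{a}_1}{\|\mathbf{a}_1\|^2} \qquad (6)$$

For $n = 1, 2, \ldots$

$$\mathbf{g}_s(n) = \mathbf{R}_s\mathbf{f}_{sAV}(n-1) - \frac{\mathbf{a}_1^H\mathbf{R}_s\mathbf{f}_{sAV}(n-1)}{\|\mathbf{a}_1\|^2}\,\mathbf{a}_1 \qquad (7)$$

$$\mu_s(n) = \frac{\mathbf{g}_s^H(n)\mathbf{R}_s\mathbf{f}_{sAV}(n-1)}{\mathbf{g}_s^H(n)\mathbf{R}_s\mathbf{g}_s(n)} \qquad (8)$$

$$\mathbf{f}_{sAV}(n) = \mathbf{f}_{sAV}(0) - \sum_{i=1}^{n} \mu_s(i)\,\mathbf{g}_s(i) \qquad (9)$$

As shown in [5], given infinite data, the sequence of auxiliary-vector filters converges to $\mathbf{f}_{sMVDR}$ ($\mathbf{f}_{sAV}(n) \to \mathbf{f}_{sMVDR}$ as $n \to \infty$).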

□ 
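The recursion (6)-(9) can be sketched directly. The code below is an illustrative sketch rather than the authors' implementation: the covariance matrix is a made-up three-element example (identity plus two rank-one interferers), the function names are hypothetical, and the running update `w - mu * g` is the incremental form of the sum in Eq. (9).

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def av_filters(R, v, steps):
    """Return the filters f(0), f(1), ... produced by the auxiliary-vector recursion."""
    norm2 = inner(v, v).real
    w = [x / norm2 for x in v]                       # Eq. (6)
    filters = [w]
    for _ in range(steps):
        Rw = matvec(R, w)
        proj = inner(v, Rw) / norm2
        g = [rw - proj * x for rw, x in zip(Rw, v)]  # Eq. (7), orthogonal to v
        denom = inner(g, matvec(R, g)).real
        if denom < 1e-12:                            # auxiliary vector vanished: converged
            break
        mu = inner(g, Rw) / denom                    # Eq. (8)
        w = [wi - mu * gi for wi, gi in zip(w, g)]   # Eq. (9), incremental form
        filters.append(w)
    return filters

def output_variance(f, R):
    return inner(f, matvec(R, f)).real

# Hypothetical scenario: unit-variance noise plus two interferers.
v = [1 + 0j, 1 + 0j, 1 + 0j]
u1, u2 = [1, 1j, -1], [1, -1, 1j]
R = [[(1.0 if i == j else 0.0)
      + 50 * u1[i] * complex(u1[j]).conjugate()
      + 20 * u2[i] * complex(u2[j]).conjugate()
      for j in range(3)] for i in range(3)]

filters = av_filters(R, v, steps=6)
variances = [output_variance(f, R) for f in filters]
print(variances)
```

Each filter in the sequence keeps unit response toward the direction of interest, and the output variance never increases from one step to the next because each step size is the exact minimizer along its auxiliary vector.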

Finally,  the  time  processor  of  the  S-T  configuration  is  a 
linear  filter  of  dimension  Lx  1  that  takes  as  input  the  space- 
processor  output.  Similar  to  the  optimization  criteria  used 
in  the  design  of  the  space  processor,  the  temporal  filter  can 
be  of  MF-type,  MVDR-type  or  AV-type. 

Temporal  Matched-Filtering  (tMF) 


The  temporal  MF  is  given  by 


$$\mathbf{f}_{tMF} = \mathbf{d}_1 \qquad (10)$$

or  equivalently,  ftMF  =  Ebl  {yfq  },  where  y  is  the  output  of 
the  spatial  processor. 


□

Temporal MVDR-Filtering (tMVDR)

The temporal MVDR filter can be shown to be equal to

$$\mathbf{f}_{tMVDR} = \frac{\mathbf{R}_t^{-1}\mathbf{d}_1}{\mathbf{d}_1^H\mathbf{R}_t^{-1}\mathbf{d}_1} \qquad (11)$$


where $\mathbf{R}_t$ is the $L \times L$ covariance matrix of the spatial filter output $\mathbf{y}$ and thus it depends on the type of first-stage spatial processing.

□

Temporal Auxiliary-Vector Filtering (tAV)


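The sequence of auxiliary-vector filters in the time domain can be obtained as follows,

$$\mathbf{f}_{tAV}(0) = \mathbf{d}_1 \qquad (12)$$

For $n = 1, 2, \ldots$

$$\mathbf{g}_t(n) = \mathbf{R}_t\mathbf{f}_{tAV}(n-1) - \left(\mathbf{d}_1^H\mathbf{R}_t\mathbf{f}_{tAV}(n-1)\right)\mathbf{d}_1 \qquad (13)$$

$$\mu_t(n) = \frac{\mathbf{g}_t^H(n)\mathbf{R}_t\mathbf{f}_{tAV}(n-1)}{\mathbf{g}_t^H(n)\mathbf{R}_t\mathbf{g}_t(n)} \qquad (14)$$

$$\mathbf{f}_{tAV}(n) = \mathbf{f}_{tAV}(0) - \sum_{i=1}^{n} \mu_t(i)\,\mathbf{g}_t(i) \qquad (15)$$

and $\mathbf{R}_t$ denotes the covariance matrix of the spatial filter output $\mathbf{y}$. The following theorem provides a performance comparison, in terms of output signal-to-interference-plus-noise ratio (SINR), of the configurations presented above for the case of two SS signals. The proof is omitted due to the lack of space.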


Theorem 1 Let $\mathbf{a}_i$ and $\mathbf{d}_i$ be the spatial and temporal signature of user-$i$, $i = 1, 2$, of length $M$ and $L$, respectively. Define $\rho = \mathbf{d}_1^H\mathbf{d}_2$, $\eta = |\frac{1}{M}\mathbf{a}_1^H\mathbf{a}_2|$. Let also $E_i$ denote the signal-to-noise-ratio (SNR) of user-$i$, and SINR(sXX/tYY) or SINR(tYY/sXX) denote the output SINR of an S-T or T-S configuration that utilizes an XX-type space filter and a YY-type time filter. Then

A. Space-Time configuration

(i) SINR(sMF/tMF) < SINR(sMF/tMVDR), for any $\rho$, $\eta$

(ii) SINR(sMVDR/tMF) < SINR(sMVDR/tMVDR), for any $\rho$, $\eta$

(iii) A loose sufficient condition for SINR(sMVDR/tMVDR)

< SINR(sMF/tMVDR) is


l  +  M£2772(l-p2)>[l  + 


ME2rf{\  -p2) 


1  +  ^(2  + 


[l  +  ^l-T,2)]2. 


L 

2\i2 


0(i 


B.  Time-Space  configuration 


(i) SINR(tMF/sMF) < SINR(tMF/sMVDR), for any $\rho$, $\eta$

(ii) SINR(tMVDR/sMF) < SINR(tMVDR/sMVDR), for any $\rho$, $\eta$

(iii) A loose sufficient condition for SINR(tMVDR/sMVDR)

< SINR(tMF/sMVDR) is


1  +  ME2p2(l  —  772)>[1  + 


ME2P2(l-r,2)) 

1  +  E2(2  +  E2)(l  -  p2)1 


[1+^2(1 


V)]2- 


(17) 


C.  Space-Time  versus  Time-Space  configuration 

(i) SINR(sMF/tMF) = SINR(tMF/sMF), for any $\rho$, $\eta$

(ii) SINR(sMVDR/tMF) < SINR(tMF/sMVDR), for any $\rho$, $\eta$

(iii) SINR(tMVDR/sMF) < SINR(sMF/tMVDR), for any $\rho$, $\eta$

□ 


3.2.  Joint  Domain  Filtering 

In joint domain processing, to avoid cumbersome 2-D operations and notations, we vectorize the matrix $\mathbf{X}_{M\times L}$ by stacking all columns in the form of a vector $\mathbf{X}_{ML\times 1} = \mathrm{Vec}\{\mathbf{X}_{M\times L}\}$. In the following, $\mathbf{X}$ denotes the joint space-time data in the $\mathbb{C}^{ML}$ complex vector space that constitutes the input to a joint space-time linear filter $\mathbf{w}$ to be designed according to MF, MVDR or AV processing principles.

Joint  Domain  Matched  Filtering  (JMF) 

The joint space-time matched filter $\mathbf{w}_{JMF}$ for the signal of interest (Signal 1) is equal to the joint space-time signature of the signal of interest, i.e., the Kronecker product $\mathbf{v}_1 = (\mathbf{d}_1 \otimes \mathbf{a}_1)/M$, where $\mathbf{d}_1$ and $\mathbf{a}_1$ are the temporal signature and spatial signature (steering vector) of the signal of interest, respectively (JMF is equivalent to the sMF/tMF and tMF/sMF configurations).
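The stated equivalence between the joint matched filter and the cascade sMF/tMF can be verified on a toy example. The sketch below uses made-up signatures and data with $M = 2$ and $L = 4$; the helper names are hypothetical, and `vec` stacks columns as in the definition of $\mathrm{Vec}\{\cdot\}$.

```python
def kron(d, a):
    """Kronecker product d (x) a of two vectors, ordered to match column stacking."""
    return [dl * am for dl in d for am in a]

def vec(X):
    """Stack the columns of an M x L matrix into a vector of length ML."""
    M, L = len(X), len(X[0])
    return [X[m][l] for l in range(L) for m in range(M)]

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

M, L = 2, 4
a1 = [1 + 0j, 1j]                       # hypothetical spatial signature
d1 = [0.5, -0.5, 0.5, 0.5]              # hypothetical code with entries +/- 1/sqrt(L)
X = [[complex(m + 1, l - m) for l in range(L)] for m in range(M)]   # arbitrary data

# Joint domain: one inner product with v1 = (d1 (x) a1) / M.
v1 = [x / M for x in kron(d1, a1)]
joint = inner(v1, vec(X))

# Disjoint S-T cascade: spatial MF a1 / M first, then temporal MF d1, as in Eq. (3).
f_s = [x / M for x in a1]
y = [sum(f_s[m].conjugate() * X[m][l] for m in range(M)) for l in range(L)]
disjoint = sum(y[l] * complex(d1[l]).conjugate() for l in range(L))
print(joint, disjoint)
```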

We note that JMF is optimum only when the channel interference plus noise is white Gaussian, which is not the case for most practical SS communication systems. Indeed, non-orthogonal multiple access interferers as well as intentional jammers that are highly correlated with the signal of interest may render the JMF receiver ineffective. A remedy for the latter situation is to proceed with the design of interference suppressing receivers such as the MVDR receiver or the AV receiver that are presented below.

Joint  Domain  MVDR  Filtering  (JMVDR) 


The joint domain MVDR filter is designed to minimize its output energy and simultaneously be distortionless toward the joint space-time signature of the signal of interest $\mathbf{v}_1$. It is given by the following expression:

$$\mathbf{w}_{JMVDR} = \frac{\mathbf{R}^{-1}\mathbf{v}_1}{\mathbf{v}_1^H\mathbf{R}^{-1}\mathbf{v}_1} \qquad (18)$$

where $\mathbf{R} = E\{\mathbf{X}\mathbf{X}^H\}$ is the covariance matrix of the space-time input data vector.

Joint Domain Auxiliary-Vector Filtering (JAV)

Joint domain auxiliary-vector filter design provides a sequence of joint domain auxiliary-vector filters that are distortionless toward the joint space-time signature of the signal of interest $\mathbf{v}_1$ and can be obtained by the following recursion:

$$\mathbf{w}_{JAV}(0) = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|^2} \qquad (19)$$

For $n = 1, 2, \ldots$

$$\mathbf{g}(n) = \mathbf{R}\mathbf{w}_{JAV}(n-1) - \frac{\mathbf{v}_1^H\mathbf{R}\mathbf{w}_{JAV}(n-1)}{\|\mathbf{v}_1\|^2}\,\mathbf{v}_1 \qquad (20)$$

$$\mu(n) = \frac{\mathbf{g}^H(n)\mathbf{R}\mathbf{w}_{JAV}(n-1)}{\mathbf{g}^H(n)\mathbf{R}\mathbf{g}(n)} \qquad (21)$$

$$\mathbf{w}_{JAV}(n) = \mathbf{w}_{JAV}(0) - \sum_{i=1}^{n} \mu(i)\,\mathbf{g}(i) \qquad (22)$$


3.3.  GPS  Filter  Output  Combining 

To  take  advantage  of  the  redundancy  introduced  by  uti¬ 
lizing  C/A  codes  of  period  equal  to  a  fraction  of  the  in¬ 
formation  bit  period,  and  still  maintain  low-order  filtering 
(equal  to  the  C/A  code  length),  combining  methods  such  as 
selective  combining  (SC),  equal  gain  combining  (EGC)  or 
maximum  ratio  combining  (MRC)  can  be  used  to  further 
improve  the  receiver  BER  performance.  In  this  paper  we 
utilize  EGC  because  of  its  simplicity  in  implementation. 
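Equal gain combining over one navigation bit amounts to adding the 20 soft filter outputs with unit weights before taking the sign. The sketch below is illustrative only; the soft values are made up (a transmitted bit of -1 plus a fixed disturbance pattern) so that the example is deterministic.

```python
def egc_decision(soft_outputs):
    """Equal gain combining: add the soft outputs with unit weights, then take the sign."""
    return 1 if sum(soft_outputs) >= 0 else -1

# 20 soft outputs, one per C/A code period inside a single navigation data bit.
bit = -1
disturbance = [1.5, -1.0, 0.8, -2.0, 1.2] * 4      # hypothetical filter-output noise
soft = [bit + e for e in disturbance]

per_period_errors = sum(1 for s in soft if (1 if s >= 0 else -1) != bit)
print(per_period_errors, egc_decision(soft))
```

In this made-up record 8 of the 20 per-period hard decisions are wrong, yet the combined decision recovers the transmitted bit, which is the redundancy the combining step exploits.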

3.4.  Filter  Estimation  Considerations 

The developments so far involved filter alternatives of MF, MVDR and AV type. Besides the fixed MF-type structure, all others are adaptive in nature and have been presented/formulated under ideal conditions, that is, under the assumption that the space, time or space-time covariance matrix $\mathbf{R}$ involved is known. In practice, however, $\mathbf{R}$ is unknown and it is sample-average estimated from a data record of finite size. When $\mathbf{R}$ is substituted by the sample-average estimate $\hat{\mathbf{R}}$, the ideal receiver expressions in (5), (9), (11), (15), (18) and (22) assume their estimated versions. In this context, the AV algorithm produces a sequence of MVDR filter estimators of the form $\hat{\mathbf{w}}(0), \hat{\mathbf{w}}(1), \ldots$. This sequence has been extensively studied in [5] and shown to offer the means for effective control over the filter estimator bias versus covariance trade-off. As a result, adaptive filter estimators from this class have been seen to easily outperform in mean-square estimation error the (constrained) LMS, sample-matrix-inversion (SMI) and RLS type adaptive filter implementations. These operational characteristics of the AV filter estimators place them favorably in terms of GPS receiver implementation when interference suppression with short data records is the objective. Simulation comparisons in the following section illustrate how the above observations translate into superior BER performance.
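The sample-average estimate referred to above is the mean of the outer products of the available data vectors. A minimal sketch follows, with a hypothetical function name and a two-snapshot toy record chosen so that the result is easy to verify by hand.

```python
def sample_covariance(snapshots):
    """Sample-average estimate R_hat = (1/N) * sum_n x_n x_n^H from N data vectors."""
    N, D = len(snapshots), len(snapshots[0])
    R = [[0j] * D for _ in range(D)]
    for x in snapshots:
        for i in range(D):
            for j in range(D):
                R[i][j] += x[i] * x[j].conjugate()
    return [[R[i][j] / N for j in range(D)] for i in range(D)]

# Toy data record: two complex snapshots of dimension 2.
record = [[1 + 0j, 1j], [1 + 0j, -1j]]
R_hat = sample_covariance(record)
print(R_hat)
```

Substituting `R_hat` for the true covariance in the MVDR or AV expressions gives their estimated versions; with short records the estimate is noisy, which is where the choice of estimator matters.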

4.  NUMERICAL  AND  SIMULATION 
COMPARISONS 

We  consider  the  GPS  signal  model  in  (1)  for  a  system  with 
M  =  2  antenna  elements  and  spreading  gain  L  =  1,023. 


In all cases we assume the presence of 4 satellite signals with fixed C/A Gold codes, as well as the presence of one or two high power spread spectrum jammers (spoofers) that exhibit code cross-correlation with the signal of interest (Signal 1) of approximately 0.1 and 0.2, respectively. The angles of arrival of the satellite and jamming signals are randomly generated according to a uniform distribution in $(-\pi/2, \pi/2)$.

The  simulation/numerical  studies  in  this  section  evalu¬ 
ate  the  BER  performance  of  the  GPS  receiver  as  a  function 
of  either  the  SNR  of  the  signal  of  interest  or  the  data  record 
size.  All  BERs  are  analytically  evaluated  and  the  results  are 
averages  over  100  independent  space-time  channels. 

For  the  BER  versus  SNR  studies,  the  signal  of  inter¬ 
est  SNR  varies  from  0  dB  (weak  signal)  to  15  dB  (normal 
strength  signal).  The  SNRs  of  the  other  satellite  signals  are 
fixed  at  15  dB  while  the  jammer’s  SNR  is  fixed  at  30  dB 
and  the  AWGN  variance  is  taken  equal  to  1.  In  Figure  1 
the  BER  versus  the  SNR  of  Signal  1  is  shown  for  different 
disjoint  and  joint  estimated  receiver  configurations  in  the 
presence  of  one  high  power  SS  jammer. 

Figures  2-4  plot  the  BER  versus  the  data  record  size  in 
the  presence  of  two  jammers.  Figure  2  plots  the  BER  of  the 
estimated  disjoint  S-T  MF  and  MVDR  type  configurations 
while  Figure  3  involves  receiver  configurations  that  utilize 
an  auxiliary  vector  filter  in  the  first  stage  or  the  second  stage 
or  both  stages.  Figure  4  plots  the  BER  versus  the  data 
record  size  for  the  estimated  joint-domain  configurations. 
The  SNR  of  Signal  1  in  Figures  2-4  is  fixed  at  15  dB. 

Figures 1-4 illustrate the performance gains when
non-MF-type signal processing is performed by the GPS receiver
and do not consider combining. Additional performance
gains obtained through EGC combining are illustrated in
Figure 5, where we plot the BER as a function of data record
size for the best receiver configuration of the previous
(Figs. 2-4) studies.
ABSTRACT

This paper applies subspace projection techniques as a
pre-correlation signal processing method for FM interference
suppression in GPS receivers. FM jammers are
instantaneously narrowband and have clear time-frequency
(t-f) signatures that are distinct from the GPS C/A spread
spectrum code. In the proposed technique, the instantaneous
frequency (IF) of the jammer is estimated and used to
construct a rotated signal space in which the jammer occupies
one dimension. The anti-jamming system is implemented
by projecting the received sequence onto the jammer-free
subspace. This paper focuses on the characteristics of the
GPS C/A code and derives the signal-to-interference-and-noise
ratio (SINR) of GPS receivers implementing the
subspace projection techniques.

1.  INTRODUCTION 

The Global Positioning System (GPS) is a satellite-based,
worldwide, all-weather navigation and timing system [1].
The ever-increasing reliance on GPS for navigation and
guidance has created a growing awareness of the need for
adequate protection against both unintentional and
intentional interference. Jamming is a procedure that attempts
to block reception of the desired signal by the intended
receiver. In general terms, a jammer is a high-power signal that
occupies the same frequency as the desired signal, making
reception by the intended receiver difficult or impossible.
Designers of military as well as commercial communication
systems have, through the years, developed numerous
anti-jamming techniques to counter these threats. As these
techniques have become effective for interference removal and
mitigation, jammers themselves have become smarter and more
sophisticated, and generate signals that are difficult to
combat.

The GPS system employs BPSK-modulated direct sequence
spread spectrum (DSSS) signals. DSSS systems
are implicitly able to provide a certain degree of protection
against intentional or non-intentional jammers. However,
in many cases, the jammer may be much stronger
than the GPS signal, and the spreading gain might be
insufficient to decode the useful data reliably [2]. There
are several methods that have been proposed for interference
suppression in DSSS communications [3, 4, 5]. The
recent development of bilinear time-frequency distributions
(TFDs) for improved signal power localization in
the time-frequency plane has motivated several new effective
approaches, based on instantaneous frequency (IF)
estimation, for non-stationary interference excision [6]. One
of the important IF-based interference rejection techniques
uses the jammer IF to construct a time-varying excision
notch filter that effectively removes the interference [7].
However, this notch filtering excision technique causes
significant distortions to the desired signal, leading to
degraded receiver performance.

Recently,  subspace  projection  techniques,  which  are  also 
based  on  IF  estimation,  have  been  devised  for  non-stationary 
FM  interference  excision  in  DSSS  communications  [8].  The 
techniques  assume  clear  jammer  time-frequency  signatures 
and  rely  on  the  distinct  differences  in  the  localization  prop¬ 
erties  between  the  jammer  and  the  spread  spectrum  signals. 
The  jammer  instantaneous  frequency,  whether  provided  by 
the  time-frequency  distributions  or  any  other  IF  estimator, 
is  used  to  form  an  interference  subspace.  Projection  can 
then  be  performed  to  excise  the  jammer  from  the  incoming 
signal  prior  to  correlation  with  the  receiver  PN  sequence. 
The  result  is  improved  receiver  SINR  and  reduced  BERs. 

In  this  paper,  we  apply  the  subspace  projection  tech¬ 
niques  as  a  pre-correlation  signal  processing  method  to  the 
FM  interference  suppression  in  GPS  receivers.  The  GPS 
receiver  and  signal  structure  impose  new  constraints  on 
the  problem  since  the  spreading  code  from  each  satellite 
is  known  and  periodic  within  one  navigation  data  symbol. 
This  structure  and  the  signal  model  are  reviewed  in  Section 
2.  In  Section  3,  we  depict  the  received  GPS  signal  prop¬ 
erties  in  time-frequency  domain.  The  SINR  of  the  GPS 
receiver  implementing  the  subspace  projection  techniques 
is  derived  in  Section  4,  which  shows  improved  performance
in  strong  interference  environments. 

2.  SIGNAL  MODEL 

GPS employs BPSK-modulated DSSS signals. The navigation
data is transmitted at a symbol rate of 50 bps. It is
spread by a coarse acquisition (C/A) code and a precision
(P) code. The C/A code is a Gold sequence with a chip rate
of 1.023 MHz and a period of 1023 chips, i.e., its period is
1 ms, and there are 20 periods within one data symbol. The
P code is a pseudorandom code at the rate of 10.23 MHz
and with a period of 1 week. These two spreading codes
are multiplexed in quadrature phases. Figure 1 shows the
signal structure. The carrier L1 is modulated by both the C/A
code and the P code, whereas the carrier L2 is only modulated
by the P code. In this paper, we will mainly address the
problem of anti-jamming for the C/A code, for which the peak
power spectral density exceeds that of the P code by about
13 dB [1]. The transmitted GPS signal is also very weak,
with Jammer-to-Signal Ratio (JSR) often larger than 40 dB
and Signal-to-Noise Ratio (SNR) in the range -14 to -20 dB
[2, 9]. Due to the high JSR, the FM jammer often has a
clear signature in the time-frequency domain, as shown in
Section 3. As the P code is very weak compared to the C/A
code, noise and jammer, we can ignore its presence in our
analysis.
Figure  1:  The  GPS  signal  structure.


The  BPSK-modulated  DSSS  signal  may  be  expressed  as 

$$s(t) = \sum_i I_i\, b_i(t - iT_b), \qquad I_i \in \{-1, 1\}\ \forall i \tag{1}$$

where $I_i$ represents the binary information sequence and $T_b$
is the bit interval, which is 20 ms in the case of the GPS system.
The $i$th binary information bit, $b_i(t)$, is further decomposed
as a superposition of $L$ spreading codes, $p(n)$, pulse shaped
by a unit-energy function, $q(t)$, of duration $\tau_c$, which is
1/1023 ms in the case of the C/A code. Accordingly,

$$b_i(t) = \sum_{n=1}^{L} p(n)\, q(t - n\tau_c) \tag{2}$$


The  signal  for  one  data  bit  at  the  receiver,  after  demod¬ 
ulation,  and  sampling  at  chip  rate,  becomes 

$$x(n) = p(n) + w(n) + j(n), \qquad 1 \le n \le L \tag{3}$$

where  p(n)  is  the  chip  sequence,  w(n)  is  the  white  noise, 
and  j(n)  is  the  interfering  signal.  The  above  equation  can 
be  written  in  the  vector  form 


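$$\mathbf{x} = \mathbf{p} + \mathbf{w} + \mathbf{j}$$

where

$$\begin{aligned}
\mathbf{x} &= [\,x(1)\ \ x(2)\ \ x(3)\ \cdots\ x(L)\,]^T \\
\mathbf{p} &= [\,p(1)\ \ p(2)\ \ p(3)\ \cdots\ p(L)\,]^T \\
\mathbf{w} &= [\,w(1)\ \ w(2)\ \ w(3)\ \cdots\ w(L)\,]^T \\
\mathbf{j} &= [\,j(1)\ \ j(2)\ \ j(3)\ \cdots\ j(L)\,]^T
\end{aligned} \tag{4}$$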


All vectors are of dimension L × 1, and 'T' denotes vector or
matrix transposition. It should be noted that the vector p
is real, whereas all other vectors in the above equation have
complex entries.


3.  PERIODIC  SIGNAL  PLUS  JAMMER  IN 
THE  TIME-FREQUENCY  DOMAIN 

For the GPS C/A code, the PN sequence is periodic. The PN
code of length 1023 repeats itself 20 times within one symbol
of the 50 bps navigation data. Consequently, its spectrum is no
longer continuous in the frequency domain, but consists of
spectral lines. The same holds for periodic jammers.
Figure 2 and Figure 3 show the effect of
periodicity of the signal and the jammer on their respective
power distribution over time and frequency, using the
Wigner-Ville distribution. In both figures, a PN sequence of length
32 samples that repeats 8 times is used. A non-periodic
chirp jammer of 50 dB JSR (jammer-to-signal ratio) is
added in Figure 2. A periodic chirp jammer of 50 dB JSR
with the same period as the C/A code is included in Figure
3. We note that the chosen value of 50 dB JSR has practical
significance. The spread spectrum systems in a typical
GPS C/A code receiver can tolerate a narrowband interference
of approximately 40 dB JSR without interference
mitigation processing. However, field tests show that jammer
strength often exceeds that number due to the weakness
of the signal. The SNR in both figures is -20 dB, which
is also close to its practical value [2, 9]. Due to the high JSR,
the jammer is dominant in both figures. From Figure 3,
it is clear that the periodicity of the jammer makes IF
estimation more difficult than for non-periodic jammers.
This problem can be solved by applying a short data window
when using the Wigner-Ville distribution. Note that the
window length should be less than the jammer period.
Figure 4 shows the result of applying a window of length 31
to the same data used in Fig. 3. It is evident from
Fig. 4 that the horizontal discrete harmonic lines have
disappeared.
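The line-spectrum statement above can be checked numerically. The following is a minimal sketch, assuming a hypothetical ±1 sequence of length 32 (not an actual Gold code) repeated 8 times, as in Figures 2 and 3; the helper `dft` and all variable names are illustrative and not taken from the paper.

```python
import cmath

def dft(x):
    """Plain O(N^2) discrete Fourier transform of a list of samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]

# Hypothetical +/-1 sequence of length 32 (an assumption, not a real Gold code).
period = [1 if (7 * i * i + 3 * i) % 11 < 5 else -1 for i in range(32)]
repeats = 8
signal = period * repeats          # periodic sequence, 256 samples in total

spectrum = dft(signal)

# Energy can only appear in bins that are multiples of the repetition count.
off_line = max(abs(spectrum[f]) for f in range(len(signal)) if f % repeats != 0)
on_line = max(abs(spectrum[f]) for f in range(len(signal)) if f % repeats == 0)
```

Here `off_line` is zero up to rounding error, while `on_line` is large: repeating the sequence concentrates its energy on every eighth frequency bin, which is the discrete counterpart of the spectral lines described above.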


4.  GPS  ANTI-JAMMING  USING 
PROJECTION  TECHNIQUES 

The concept of subspace projection for instantaneously
narrowband jammer suppression is to remove the jammer
components from the received data by projecting it onto the
subspace that is orthogonal to the jammer subspace, as
illustrated in Fig. 5.

Figure 2: Periodic signal corrupted by a non-periodic jammer
in time-frequency domain (contour of periodic PN + non-periodic chirp jammer)

Figure 3: Periodic signal corrupted by a periodic jammer
in time-frequency domain (contour of periodic PN + periodic chirp jammer)

Figure 4: Periodic signal corrupted by a periodic jammer
in time-frequency domain (with window)

Figure 5: Jammer excision by subspace projection

Once the instantaneous frequency (IF) of the non-stationary
jammer is estimated from the time-frequency domain,
or by using any other IF estimator [10, 11, 12, 13], the
interference signal vector j in (4) can be constructed, up
to ambiguity in phase and possibly in amplitude. In the
proposed interference excision approach, the data vector is
partitioned into Q blocks, each of length P, i.e., L = PQ. For
the GPS C/A code, Q = 20, P = 1023, and all Q blocks are
identical, i.e., the signal PN sequence is periodic. Block
processing provides the flexibility to discard the portions
of the data bit over which there are significant errors in
the IF estimates. The orthogonal projection method makes
use of the fact that, in each block, the jammer has a
one-dimensional subspace $\mathcal{J}$ in the P-dimensional space
$\mathcal{V}$, which is spanned by the received data vector. The
interference can be removed from each block by projecting the
received data on the corresponding orthogonal subspace
$\mathcal{Q}$ of the interference subspace $\mathcal{J}$. The subspace
$\mathcal{J}$ is estimated using the IF information. The projection
matrix for the kth block is given by

$$\mathbf{V}_k = \mathbf{I} - \mathbf{u}_k \mathbf{u}_k^H \tag{5}$$

The vector $\mathbf{u}_k$ is the unit-norm basis vector in the direction
of the interference vector of the $k$th block, and 'H' denotes
the vector or matrix Hermitian transpose. Since FM jammer signals
are uniquely characterized by their IFs, the $i$th sample of the FM
jammer in the $k$th block can be expressed as

$$u_k(i) = \frac{1}{\sqrt{P}} \exp[j\phi_k(i)] \tag{6}$$

The result of the projection over the $k$th data block is

$$\bar{\mathbf{x}}_k = \mathbf{V}_k \mathbf{x}_k \tag{7}$$

where $\mathbf{x}_k$ is the input data vector. Using the three different
components that make up the input vector in (4), the
output of the projection filter $\mathbf{V}_k$ can be written as

$$\bar{\mathbf{x}}_k = \mathbf{V}_k [\mathbf{p}_k + \mathbf{w}_k + \mathbf{j}_k] \tag{8}$$

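The noise is assumed to be complex white Gaussian with
zero mean,

$$E[w(n)] = 0, \qquad E[w(n)^* w(n+l)] = \sigma^2 \delta(l), \quad \forall l \tag{9}$$

Since we assume total interference excision through the
projection operation,

$$\mathbf{V}_k \mathbf{j}_k = \mathbf{0}, \qquad \bar{\mathbf{x}}_k = \mathbf{V}_k \mathbf{p}_k + \mathbf{V}_k \mathbf{w}_k \tag{10}$$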
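The projection step in (5)-(10) can be illustrated with a small numerical sketch. It assumes a short block (P = 64 rather than 1023), a hypothetical random ±1 code, an assumed quadratic (chirp) phase law, and exact knowledge of the jammer phase up to a constant offset; noise is omitted. The function `project_out` and all parameter values are illustrative choices, not the authors' implementation.

```python
import cmath
import math
import random

def project_out(x, phase):
    """Apply V = I - u u^H to the block x, with u(i) = exp(j*phase[i]) / sqrt(P)."""
    P = len(x)
    u = [cmath.exp(1j * ph) / math.sqrt(P) for ph in phase]
    along = sum(ui.conjugate() * xi for ui, xi in zip(u, x))  # u^H x
    return [xi - along * ui for xi, ui in zip(x, u)]

random.seed(1)
P = 64                                                    # illustrative block length
p = [random.choice((-1.0, 1.0)) for _ in range(P)]        # hypothetical PN block
phase = [0.25 * math.pi * i * i / P for i in range(P)]    # assumed chirp phase law
amp = 10 ** (50 / 20)                                     # 50 dB JSR for unit-power chips
jammer = [amp * cmath.exp(1j * (ph + 0.7)) for ph in phase]  # unknown phase offset 0.7

x = [pi + ji for pi, ji in zip(p, jammer)]                # received block, noise omitted
x_bar = project_out(x, phase)

# Residual jammer after projection, and the correlator output for this block.
jammer_left = max(abs(v) for v in project_out(jammer, phase))
beta = sum(pi * cmath.exp(1j * ph) for pi, ph in zip(p, phase)) / P
corr = sum(v.conjugate() * pi for v, pi in zip(x_bar, p)).real
```

In this sketch the jammer is removed to within rounding error even though its amplitude and phase offset are not used by `project_out`, and the correlator output equals P(1 - |beta|^2), i.e., only the small part of the code lying along the jammer direction is lost.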

The decision variable $y_r$ is the real part of $y$, which is obtained
by correlating the filter output $\bar{\mathbf{x}}_k$ with the corresponding
$k$th block of the receiver PN sequence and summing the
results over the $Q$ blocks. That is,

$$y = \sum_{k=0}^{Q-1} \bar{\mathbf{x}}_k^H \mathbf{p}_k \tag{11}$$

Since the PN code is periodic, we can strip off the subscript
$k$ in $\mathbf{p}_k$. The above variable can be written in terms of the
constituent signals as

$$y = \sum_{k=0}^{Q-1} \mathbf{p}^T \mathbf{V}_k \mathbf{p} + \sum_{k=0}^{Q-1} \mathbf{w}_k^H \mathbf{V}_k \mathbf{p} \triangleq y_1 + y_2 \tag{12}$$

where $y_1$ and $y_2$ are the contributions of the PN and noise
sequences to the decision variable, respectively. In [8], $y_1$
is considered as a random variable. However, in the GPS
system, because each satellite is assigned a fixed
Gold code [1], and the Gold code is the same for every
navigation data symbol, $y_1$ can no longer be treated as a
random variable, but rather as a deterministic value. This is
a key difference between the GPS system and other spread
spectrum systems. The value of $y_1$ is given by

$$\begin{aligned}
y_1 &= \sum_{k=0}^{Q-1} \mathbf{p}^T \mathbf{V}_k \mathbf{p} \\
    &= \sum_{k=0}^{Q-1} \mathbf{p}^T (\mathbf{I} - \mathbf{u}_k \mathbf{u}_k^H) \mathbf{p} \\
    &= \sum_{k=0}^{Q-1} (\mathbf{p}^T \mathbf{p} - \mathbf{p}^T \mathbf{u}_k \mathbf{u}_k^H \mathbf{p}) \\
    &= QP - \sum_{k=0}^{Q-1} (\mathbf{p}^T \mathbf{u}_k \mathbf{u}_k^H \mathbf{p})
\end{aligned} \tag{13}$$

Define

$$\beta_k = \frac{\mathbf{p}^T \mathbf{u}_k}{\sqrt{P}} \tag{14}$$

as the correlation coefficient between the PN sequence vector
$\mathbf{p}$ and the jammer vector $\mathbf{u}_k$. The coefficient $\beta_k$
reflects the component of the signal that is in the jammer
subspace, and represents the degree of resemblance between the
signal sequence and the jammer sequence. Since the signal is a
PN sequence, and the jammer is a non-stationary FM signal,
the correlation coefficient is typically very small. With
the above definition, $y_1$ can be expressed as

$$y_1 = P\Big(Q - \sum_{k=0}^{Q-1} |\beta_k|^2\Big) \tag{15}$$

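From (15), it is clear that $y_1$ is a real value, which is the
result of the fact that the projection matrix $\mathbf{V}_k$ is Hermitian.
With the assumptions in (9), $y_2$ is complex white Gaussian
with zero mean. Therefore,

$$\begin{aligned}
\sigma_{y_2}^2 &= E\big[|y_2|^2\big] \\
&= E\Big[\Big(\sum_{k=0}^{Q-1} \mathbf{w}_k^H \mathbf{V}_k \mathbf{p}\Big)^H \Big(\sum_{l=0}^{Q-1} \mathbf{w}_l^H \mathbf{V}_l \mathbf{p}\Big)\Big] \\
&= \sum_{k=0}^{Q-1} \sum_{l=0}^{Q-1} \mathbf{p}^T \mathbf{V}_k E\big[\mathbf{w}_k \mathbf{w}_l^H\big] \mathbf{V}_l \mathbf{p} \\
&= \sum_{k=0}^{Q-1} \mathbf{p}^T \mathbf{V}_k E\big[\mathbf{w}_k \mathbf{w}_k^H\big] \mathbf{V}_k \mathbf{p} \\
&= \sigma^2 \sum_{k=0}^{Q-1} \mathbf{p}^T \mathbf{V}_k \mathbf{V}_k \mathbf{p} \\
&= \sigma^2 \sum_{k=0}^{Q-1} \mathbf{p}^T \mathbf{V}_k \mathbf{p} = \sigma^2 y_1
\end{aligned} \tag{16}$$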

The above equations make use of the noise assumptions in
(9) and the properties of the projection matrix. The decision
variable $y_r$ is the real part of $y$. Consequently, $y_r$ is
given by

$$y_r = y_1 + \mathrm{Re}\{y_2\} \tag{17}$$

where $\mathrm{Re}\{y_2\}$ denotes the real part of $y_2$. $\mathrm{Re}\{y_2\}$ is real
Gaussian with zero mean and variance $\frac{1}{2}\sigma_{y_2}^2$. Therefore,
the SINR is

$$\mathrm{SINR} = \frac{y_1^2}{\mathrm{var}\{\mathrm{Re}\{y_2\}\}} = \frac{2 y_1}{\sigma^2} = \frac{2P\big(Q - \sum_{k=0}^{Q-1} |\beta_k|^2\big)}{\sigma^2} \tag{18}$$
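Two steps behind (16) and (18) are used without comment. The short derivation below spells them out; it relies only on the unit norm of the jammer basis vector in (5) and (6), and on the additional assumption (implied, but not stated explicitly, by the noise model in (9)) that the complex noise is circularly symmetric, so that its variance splits equally between real and imaginary parts.

```latex
\begin{align*}
\mathbf{V}_k \mathbf{V}_k
  &= (\mathbf{I} - \mathbf{u}_k \mathbf{u}_k^H)(\mathbf{I} - \mathbf{u}_k \mathbf{u}_k^H)
   = \mathbf{I} - 2\,\mathbf{u}_k \mathbf{u}_k^H
     + \mathbf{u}_k (\mathbf{u}_k^H \mathbf{u}_k) \mathbf{u}_k^H
   = \mathbf{I} - \mathbf{u}_k \mathbf{u}_k^H = \mathbf{V}_k, \\
\mathrm{var}\{\mathrm{Re}\{y_2\}\}
  &= \tfrac{1}{2}\,\sigma_{y_2}^2 = \tfrac{1}{2}\,\sigma^2 y_1, \\
\mathrm{SINR}
  &= \frac{y_1^2}{\tfrac{1}{2}\,\sigma^2 y_1} = \frac{2\,y_1}{\sigma^2}.
\end{align*}
```

The first line is the idempotence of an orthogonal projector, which is what allows the product of two projection matrices to collapse into one in the last steps of the variance computation.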


In the absence of jammers, no excision is necessary, and the
SINR (SNR) of the receiver output becomes $2PQ/\sigma^2$,
which represents the upper bound for the anti-jamming
performance. Clearly, $2P\sum_{k=0}^{Q-1}|\beta_k|^2/\sigma^2$ is the reduction in the
receiver performance caused by the proposed jammer suppression
techniques. It reflects the energy of
the signal component that is in the jammer subspace. If
the jammer and spread spectrum signals are orthogonal,
i.e., their correlation coefficient $|\beta| = 0$, then interference
suppression is achieved with no loss in performance. However,
as stated above, in the general case, $\beta_k$ is often very
small, so the projection technique can excise FM jammers
effectively with only insignificant signal loss. The lower
bound of the SINR is zero and corresponds to $|\beta| = 1$. This case
requires the jammer to assume the C/A code, i.e., to be identical
to and synchronous with the actual one. Figure 6 depicts the
theoretical SINR in (18), its upper bound, and estimated
values using computer simulation. The SNR assumes five
different values [-25, -20, -15, -10, -5] dB. In this figure, the
signal is the Gold code of satellite SV#1, and the jammer
is a periodic chirp FM signal with frequency 0-0.5 and has
the same period as the C/A code. For this case, the
correlation coefficient $\beta$ is very small, $|\beta| = 0.0387$. The JSR used
in the computer simulation is set to 50 dB. Due to the large
computation involved, we have used 1000 realizations for
each SNR value. Figure 6 demonstrates that the theoretical
value of the SINR is almost the same as the upper bound
and both are very close to the simulation result. In the
simulation as well as in the derivation of equation (18), we
have assumed exact knowledge of the jammer IF. Inaccuracies
in the IF estimation will have an effect on the receiver
performance [8].

Figure 6: Receiver SINR vs SNR.
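As a hedged illustration of how (18) and its upper bound can be evaluated, the sketch below computes the correlation coefficient of each block from a code period and a list of jammer phase sequences. The function names `sinr_projection` and `sinr_upper_bound`, and the tiny length-4 example, are assumptions made purely for illustration.

```python
import cmath
import math

def sinr_projection(p, phases, sigma2):
    """Evaluate 2 P (Q - sum_k |beta_k|^2) / sigma^2.

    p      : one period of the +/-1 code (length P)
    phases : list of Q phase sequences, one per block (each of length P)
    sigma2 : noise variance
    """
    P = len(p)
    Q = len(phases)
    loss = 0.0
    for ph in phases:
        # beta_k = p^T u_k / sqrt(P), with u_k(i) = exp(j*phi_k(i)) / sqrt(P)
        beta = sum(pi * cmath.exp(1j * phi) for pi, phi in zip(p, ph)) / P
        loss += abs(beta) ** 2
    return 2 * P * (Q - loss) / sigma2

def sinr_upper_bound(P, Q, sigma2):
    """Jammer-free bound 2 P Q / sigma^2."""
    return 2 * P * Q / sigma2

# Toy example: the jammer is orthogonal to the code in one case, identical in the other.
code = [1, 1, 1, 1]
orthogonal = sinr_projection(code, [[0, math.pi, 0, math.pi]], 2.0)
identical = sinr_projection(code, [[0, 0, 0, 0]], 2.0)
bound = sinr_upper_bound(4, 1, 2.0)
```

The two toy cases reproduce the limits discussed above: an orthogonal jammer gives the upper bound, while a jammer identical to the code drives the SINR to zero.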

5.  CONCLUSIONS 

GPS receivers are vulnerable to strong interference. In
this paper, subspace projection techniques are adapted for
FM jammer suppression in GPS receivers. These techniques are
based on IF estimation of the jammer signal, which can be
easily achieved, provided that the C/A code and the jammer
have distinct time-frequency signatures. The IF information
is used to construct the FM interference subspace
which, because of signal nonstationarities, is otherwise
difficult to obtain. Due to the characteristics of the GPS spread
spectrum signal structure and the fact that the C/A codes
are fixed for the different satellites and known to all, the
analysis of the receiver SINR differs from that of common
spread spectrum systems. The theoretical and simulation
results suggest that the subspace projection techniques
can effectively excise FM jammers for GPS receivers with
insignificant loss in the spreading gain.
ABSTRACT

Telephone  line  echo  path  impulse  responses  have  been 
estimated  using  a  Haar-wavelet-based  adaptive  filter. 
Estimation  error  of  two  percent  has  been  tolerated. 
This  has  allowed  structural  simplification  to  the  Haar- 
wavelet-based  adaptive  filter.  Adaptive  filter  coeffi¬ 
cients  are  then  updated  by  using  a  wavelet-based  LMS 
algorithm,  modified  for  fixed-point  arithmetic.  The 
fixed-point  wavelet  echo  canceller  has  been  tested  for 
white  noise  and  colored  noise  input  signals  and  has 
been  compared  with  the  fixed-point  FIR  echo  canceller 
through  simulations.  The  wavelet  based  canceller  con¬ 
verges  much  faster  than  its  FIR  counterpart.1 

1.  INTRODUCTION 

Characteristics of echo paths are time varying. The
corresponding impulse responses may have long tails
and change frequently in the beginning. The length of an
FIR adaptive filter depends upon the tail of the echo path
impulse response and can therefore be large. On the
other hand, if the same estimate of the echo path
impulse response can be given with fewer
coefficients, the computational complexity is reduced.
In a previous work [1], Haar wavelets have been used
to estimate the echo path impulse response with fewer
coefficients than an FIR adaptive filter.

Wavelets are scaled and translated copies of a
particular window function called the mother wavelet [2].
The wavelet transform, like the Fourier transform, can
be used to decompose a function in terms of a set of
basis functions. The set of scaled and translated copies of the
mother wavelet, together with a set of functions known
as the scaling functions, form the basis in the wavelet
transform. Thus an unknown discrete-time system can be
represented as

$$h(t) = \sum_{(m,n) \in D} a_{mn}\, \psi_{mn}(t) \tag{1}$$

where $\psi_{mn}(t)$ belongs to a set $D$ of discrete-time Haar
wavelets and $a_{mn}$ are representation coefficients.

Haar wavelets are the simplest of wavelet functions.
They are discrete-time orthonormal sequences $\psi_{mn}(t)$
defined by

$$\psi_{mn}(t) = \psi_{m0}(t - 2^m n) \tag{2}$$

where

$$\psi_{m0}(t) = \begin{cases}
2^{-m/2} & \text{for } 0 \le t \le 2^{m-1} - 1; \\
-2^{-m/2} & \text{for } 2^{m-1} - 1 < t \le 2^m - 1; \\
0 & \text{otherwise.}
\end{cases} \tag{3}$$

The indices $m = 1, 2, \ldots$ and $n = \ldots, -1, 0, 1, \ldots$
correspond to the scale and translation, respectively.
Filtering of a signal by Haar wavelets and Haar scaling
functions can be calculated as [3]

$$X_m(t) = \frac{X_{m-1}(t) + X_{m-1}(t - 2^{m-1})}{\sqrt{2}} \tag{4}$$

$$Y_m(t) = \frac{X_{m-1}(t) - X_{m-1}(t - 2^{m-1})}{\sqrt{2}} \tag{5}$$

where  X0(t)  is  the  input  signal  and  Xm(t)  and  Ym(t) 
are  the  outputs  of  the  scaling  and  wavelet  filter  at  level 
m.  The  index  m  identifies  the  level. 

Practically, Haar wavelet filtering is implemented
by using two filters. The average component (or scaling
filtering) of the input signal is obtained by passing
$X_{m-1}(t)$ through a low-pass filter. The difference
component (or wavelet filtering) is obtained by passing
$X_{m-1}(t)$ through a high-pass filter. The multiresolution
analysis can then be obtained by running the
Haar filtering again on the average component that was
obtained from (4). This implies that another set of
two filters is required. The two-filter bank can be
cascaded until the desired level of wavelet decomposition
is reached.
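The cascade described above can be sketched in a few lines. This is an illustrative implementation of the recursions in (4) and (5), assuming that samples before t = 0 are zero; the names `haar_level` and `haar_bank` are hypothetical.

```python
import math

def haar_level(x_prev, m):
    """One level of the recursion: returns (X_m, Y_m) computed from X_{m-1}."""
    d = 2 ** (m - 1)                                     # delay at level m
    delayed = ([0.0] * d + list(x_prev))[:len(x_prev)]   # X_{m-1}(t - 2^(m-1))
    scaling = [(a + b) / math.sqrt(2) for a, b in zip(x_prev, delayed)]
    wavelet = [(a - b) / math.sqrt(2) for a, b in zip(x_prev, delayed)]
    return scaling, wavelet

def haar_bank(x, levels):
    """Cascade the two-filter bank; returns the last scaling output and all wavelet outputs."""
    scaling = list(x)
    wavelet_outputs = []
    for m in range(1, levels + 1):
        scaling, wavelet = haar_level(scaling, m)
        wavelet_outputs.append(wavelet)
    return scaling, wavelet_outputs

# Feeding a unit impulse exposes the impulse response of each branch.
impulse = [1.0] + [0.0] * 7
x2, (y1, y2) = haar_bank(impulse, 2)
```

With a unit impulse as input, the level-2 wavelet output `y2` is 0.5, 0.5, -0.5, -0.5 followed by zeros, which matches the level-2 Haar wavelet of amplitude 2^(-m/2) with m = 2.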

2.  SIZE  OF  WAVELET  FILTER  BANK 

The size of the wavelet filter bank depends on what
type of echo path impulse response it has to estimate.
Once the maximum length of the echo path impulse
response, $L_{max}$, is known,

$$L = \log_2(L_{max}) - 1$$

can be used to give the number of levels required in the
wavelet filter bank. In order to reduce the complexity
of the wavelet adaptive filter, the set $\bar{D}$ instead of the set $D$
is used in (1). One criterion to select $\bar{D}$ is to find
the relationship between the energy of the hybrid and
the wavelet and scaling coefficients that are used
to represent the hybrid. An algorithm is generated to
calculate this relationship [3]. Twenty-one hybrids are
then presented to this algorithm. The algorithm takes
the Haar-wavelet transform of each hybrid, sorts the
wavelet and scaling coefficients, and gives the strongest
coefficients that represent the required fraction of the
total energy. These coefficients are referred to as
significant coefficients. The same operation is applied to all
hybrids. The union of these significant coefficients is
taken and used in the wavelet adaptive filter. The
relationship between the energy and the number of wavelet
coefficients is shown in Figure 1. In order to reduce
the computational complexity, a modeling error of two
percent has been tolerated. The number of coefficients
required to give an estimation error of at most two
percent for every considered hybrid is 65. It has to be
noted that these coefficients are not the first 65
coefficients but are distributed along scales and shifts.

Figure 1: Energy corresponding to the number of coefficients
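The selection procedure described above (transform each hybrid, keep the strongest coefficients that capture the required energy fraction, then take the union) might be sketched as follows. This is only an illustration under stated assumptions: it uses a full decimated Haar decomposition down to a single scaling coefficient, input lengths that are powers of two, and hypothetical function names; the algorithm of [3] may differ in its details.

```python
import math

def haar_transform(h):
    """Orthonormal Haar transform; returns {(level, shift) or ('scaling', 0): value}."""
    coeffs = {}
    approx = list(h)
    level = 0
    while len(approx) > 1:
        level += 1
        nxt = []
        for n in range(len(approx) // 2):
            a, b = approx[2 * n], approx[2 * n + 1]
            nxt.append((a + b) / math.sqrt(2))           # scaling (average) branch
            coeffs[(level, n)] = (a - b) / math.sqrt(2)  # wavelet (difference) branch
        approx = nxt
    coeffs[("scaling", 0)] = approx[0]
    return coeffs

def significant(coeffs, fraction):
    """Smallest set of strongest coefficients holding `fraction` of the energy."""
    total = sum(c * c for c in coeffs.values())
    kept, energy = set(), 0.0
    for key, c in sorted(coeffs.items(), key=lambda kv: -kv[1] ** 2):
        if energy >= fraction * total:
            break
        kept.add(key)
        energy += c * c
    return kept

def union_of_significant(hybrids, fraction=0.98):
    """Union of the significant index sets over all impulse responses."""
    keep = set()
    for h in hybrids:
        keep |= significant(haar_transform(h), fraction)
    return keep

# Two toy "hybrids" of length 4 (illustrative data only).
selected = union_of_significant([[4.0, 4.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0]])
```

For the toy data, the first response needs the level-2 wavelet and the scaling coefficient, the second needs only one level-1 wavelet, and the union holds all three indices; this mirrors how the retained coefficients end up spread over scales and shifts rather than being the first ones in order.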

3.  HAAR  WAVELET  ADAPTIVE  FILTER 

Figure 2 shows the complete wavelet filter bank that is used
for adaptive estimation of the impulse response of a
hybrid given in (1). This filter bank has a tree structure
with each leaf of the tree corresponding to a level of
Haar wavelet filtering. For instance, the filter $H(\omega)$
does the wavelet filtering at level 1. Similarly, the filter
$H(2\omega)$ does the wavelet filtering at level 2. Figure 2
also shows delay elements at each level. The signal at
the output of each delay element is referred to as the shift
of the signal at that level. The output at each delay
element has an adaptive filter coefficient associated with
it. The wavelet filtered signal present at the output of
the high-pass filter at a level and its shifted versions are
then multiplied by their respective adaptive filter
coefficients. The multiplication results are then summed
to give the output at that level. The output of the
adaptive filter is then obtained by adding the outputs
at each level. It has to be noted that the last level has
two outputs: one coming from its wavelet filter and
the other from its scaling filter.

3.1.  Floating  Point  Wavelet  LMS 

The floating point algorithm to adapt the filter
coefficients is given as [4]

$$a_{mn}(t+1) = a_{mn}(t) + \mu_{mn} R_{mn}(t)\, e(t) \tag{6}$$

Here $\mu_{mn}$ is the step size and $e(t)$ is the error between the
desired signal and the output of the adaptive filter, given
as

$$z(t) = \sum_{(m,n) \in \bar{D}} R_{mn}(t)\, a_{mn}(t) \tag{7}$$

where $\bar{D} \subset D$. Using $\bar{D}$ instead of $D$ gives reduced
order modeling.
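One iteration of the update in (6) and (7) can be written compactly. The sketch below is illustrative only: coefficients, regressors and step sizes are stored in dictionaries keyed by the retained (m, n) pairs, and the function name `wavelet_lms_step` is hypothetical.

```python
def wavelet_lms_step(a, r, desired, mu):
    """One LMS iteration over the retained index set.

    a, r, mu : dicts keyed by (m, n) holding coefficients, filtered inputs, step sizes
    desired  : desired (echo) sample at this time instant
    Returns the updated coefficients and the error e(t).
    """
    z = sum(r[key] * a[key] for key in a)      # adaptive filter output
    e = desired - z                            # error signal
    new_a = {key: a[key] + mu[key] * r[key] * e for key in a}
    return new_a, e

# Toy values chosen so that the arithmetic is exact in floating point.
a0 = {(1, 0): 0.0, (2, 0): 0.5}
r0 = {(1, 0): 1.0, (2, 0): 2.0}
mu0 = {(1, 0): 0.25, (2, 0): 0.25}
a1, err = wavelet_lms_step(a0, r0, 2.0, mu0)
```

In the toy call the filter output is 1.0, the error is 1.0, and each coefficient moves by its step size times its own regressor times that error.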

3.2.  Fixed  Point  Wavelet  LMS 

The algorithm given in (6) can be modified for fixed
point implementations. First, it has to be noted that
the calculation of the step size in (6) for each wavelet
coefficient is not necessary, as $R_{mn}(t)$ is the same as
$R_{m0}(t) = Y_m(t)$, present at the output of the high-pass filter
at level $m$, except for a delay. Therefore, the algorithm will
function properly if the step size is calculated only once
for each level as

$$\mu_{m0}(t) = \frac{2c}{C P_m(t)} \tag{8}$$

where $c$ and $C$ are some constants and $P_m(t)$ is the
exponential-window time averaged power of a wavelet
filter output. Exponential-window averaging of the power
of the wavelet filter output $Y_m(t)$ is given as

$$\begin{aligned}
P_m(t) &= (1 - \alpha) P_m(t-1) + \alpha Y_m^2(t) \\
       &= P_m(t-1) + \alpha \big[ Y_m^2(t) - P_m(t-1) \big]
\end{aligned} \tag{9}$$

where $\alpha = 2^{-k}$ and $k \in \{0, 1, 2, \ldots, 30\}$.
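Because the averaging constant is a negative power of two, the second form of the power recursion needs only a subtraction, a shift and an addition. A minimal integer sketch, assuming integer samples and an arithmetic right shift (the function name `update_power` is illustrative):

```python
def update_power(p_prev, y, k):
    """Integer power update with alpha = 2**-k: P <- P + ((Y*Y - P) >> k)."""
    return p_prev + ((y * y - p_prev) >> k)

rising = update_power(100, 20, 2)    # power estimate below the new sample energy
steady = update_power(400, 20, 3)    # power estimate equal to the new sample energy
falling = update_power(1000, 10, 1)  # power estimate above the new sample energy
```

The estimate moves a fraction 2^(-k) of the way toward the instantaneous power in each step, so a larger k corresponds to a longer averaging window.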
Figure 2: Wavelet Filter Bank (cascade of low-pass and high-pass filters; high-pass outputs at Levels 1 to 7 and a low-pass output at Level 7)


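Division by the square root of two is avoided in (4) and
(5) by applying filtering as

$$x_m(t) = \frac{x_{m-1}(t) + x_{m-1}(t - 2^{m-1})}{2} \tag{10}$$

$$y_m(t) = \frac{x_{m-1}(t) - x_{m-1}(t - 2^{m-1})}{2} \tag{11}$$

The relationship with the previous Haar wavelet and
scaling filter outputs is given as

$$X_m(t) = 2^{m/2} x_m(t) \tag{12}$$

$$Y_m(t) = 2^{m/2} y_m(t). \tag{13}$$

The output of the modified filters, $r_m(t)$, is related to $R_m(t)$
as

$$R_m(t) = 2^{m/2} r_m(t) \tag{14}$$

and the exponential-window time averaged power of
$y_m(t)$, $p_m(t)$, to $P_m(t)$ as

$$P_m(t) = (2^{m/2})^2 p_m(t) \tag{15}$$

Substituting (14) in (6) and multiplying by $2^{m/2}$ gives

$$\tilde{a}_{mn}(t+1) = \tilde{a}_{mn}(t) + \mu_{m0}(t)\, 2^m r_{mn}(t)\, e(t), \tag{16}$$

where $\tilde{a}_{mn}(t) = 2^{m/2} a_{mn}(t)$. The step size can now be
calculated as

$$\mu_{m0}(t) = \frac{2c}{C\, 2^m p_m(t)} \tag{17}$$

The division operation in the step size calculation can
be approximated by

$$\frac{2c}{C\, 2^m p_m(t)} \approx 2^{-w_m(t) - \beta - m} \tag{18}$$

where $w_m(t) = \lceil \log_2 p_m(t) \rceil$ and $\beta$ is some natural number.

An upper bound on the adaptive filter coefficients given
in (6) is

$$\begin{aligned}
|a_{mn}| = |\mathbf{h}^T \mathbf{W}_{mn}| &= \Big| \sum_k h(k) W_{mn}(k) \Big| \\
&\le \sqrt{\sum_k h^2(k)}\ \sqrt{\sum_k W_{mn}^2(k)} \\
&= \sqrt{\sum_k h^2(k)} \le 1
\end{aligned} \tag{19}$$

where $\mathbf{h}$ is a vector containing the hybrid's impulse
response and $\mathbf{W}_{mn}$ is a vector containing an orthonormal
wavelet. In order to represent the adaptive filter
coefficients using the full sixteen bit binary number range,
(16) needs to be multiplied by $2^{15 - \lceil m/2 \rceil}$. This results
in

$$\hat{a}_{mn}(t+1) = \hat{a}_{mn}(t) + \frac{2c}{C\, p_m(t)}\, 2^{15 - \lceil m/2 \rceil}\, r_{mn}(t)\, e(t) \tag{20}$$

where $\hat{a}_{mn}(t) = 2^{15 - \lceil m/2 \rceil}\, \tilde{a}_{mn}(t)$. The fixed-point wavelet
LMS algorithm can now be written as

$$\hat{a}_{mn}(t+1) = \hat{a}_{mn}(t) + 2^{-(w_m(t) + \beta - 15)}\, r_{mn}(t)\, e(t)\, 2^{-\lceil m/2 \rceil}. \tag{21}$$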
(21) 
Figure  3:  Learning  curve  with  various  precautions  for
wavelet  LMS 


Precaution β    Steady State Error (dB)    Convergence Rate (dB/sec)
 5              -26.88                     -270
 6              -31.4471                   -394
 7              -32.69                     -284
 8              -33.1980                   -160
 9              -33.427                    -83
10              -33.5345                   -43
11              -33.59                     -22
12              -33.6 (est)                -11
13              -33.6 (est)                -5
14              -33.6 (est)                -2

Table  1:  Steady  state  error  and  rate  of  convergence  for 
wavelet  LMS 


4.  PERFORMANCE  OF  FIXED-POINT 
WAVELET  LMS  ALGORITHM 

The fixed-point wavelet LMS depends on three parameters:
α, which governs the size of the window for the calculation
of power; β, the precaution in step size; and $P_m(0)$,
the initial value of power in (9). Through simulations it
was noted that changing the parameters α and $P_m(0)$
has no major effect on the performance. Parameter β
is a precaution factor, as it is inversely related to the
step size. Accordingly, it was observed that increasing
the value of β reduces the speed of convergence.


Figure  4:  Learning  curves  for  different  precautions  for 
FIR  LMS 


Precaution β    Steady-state error (dB)    Convergence rate (dB/sec)
0               -28.86                     -92
1               -32.9                      -118
2               -34                        -75
3               -34.5                      -38
4               -34.7                      -18
5               -35 (est)                  -9
6               -35 (est)                  -5

Table 2: Steady state error and rate of convergence for FIR LMS


4.1. White Noise

Fixed-point wavelet LMS can be compared with fixed-point FIR LMS, using Figure 3 and Figure 4, for a white noise far-end signal with power 108 and a near-end noise power level of -35 dB with respect to the output of the hybrid. The near-end signal is introduced for two reasons: (a) to make it possible to match the steady state errors of the wavelet and FIR adaptive algorithms, and (b) to model the real-life working environment more accurately. The plots are obtained after 10 Monte Carlo runs. Table 1 and Table 2 show the steady state error and rate of convergence for different precautions for wavelet LMS and FIR LMS, respectively. Using the values from these tables, β=1 for FIR and β=7 for wavelet LMS can be used to match the steady state error. Similarly, β=3 for FIR and β=10 for wavelet LMS can be used to match the convergence rate. Some steady state errors in the tables are predicted, since convergence takes a long time to reach. These are denoted in the tables by "est".
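The matching of precautions between the two algorithms can be checked directly against the tabulated values. The following is a minimal sketch that copies the numbers from Table 1 and Table 2 and picks, for a given FIR precaution, the wavelet precaution whose tabulated value is closest. The helper name `closest_beta` and the dictionary layout are illustrative assumptions, not part of the original work.

```python
# Values copied from Table 1 (wavelet LMS) and Table 2 (FIR LMS):
# beta -> (steady-state error in dB, convergence rate in dB/sec).
# Entries marked "(est)" in the tables are stored as plain numbers here.
WAVELET = {
    5: (-26.88, -270), 6: (-31.4471, -394), 7: (-32.69, -284),
    8: (-33.1980, -160), 9: (-33.427, -83), 10: (-33.5345, -43),
    11: (-33.59, -22), 12: (-33.6, -11), 13: (-33.6, -5), 14: (-33.6, -2),
}
FIR = {
    0: (-28.86, -92), 1: (-32.9, -118), 2: (-34.0, -75), 3: (-34.5, -38),
    4: (-34.7, -18), 5: (-35.0, -9), 6: (-35.0, -5),
}

STEADY_STATE, RATE = 0, 1  # column indices in the tuples above


def closest_beta(table, target, column):
    """Return the beta whose tabulated value in `column` is nearest to `target`."""
    return min(table, key=lambda beta: abs(table[beta][column] - target))


# Wavelet precaution matching the steady-state error of FIR with beta = 1.
beta_error_match = closest_beta(WAVELET, FIR[1][STEADY_STATE], STEADY_STATE)
# Wavelet precaution matching the convergence rate of FIR with beta = 3.
beta_rate_match = closest_beta(WAVELET, FIR[3][RATE], RATE)
```

With the tabulated numbers, this nearest-value rule selects the same pairings as stated in the text.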


4.2.  Colored  Noise 

Three kinds of colored noise are considered as the far-end signal: low pass, high pass and band pass. The near-end noise is white and has a power level of -35 dB with respect to the output of the hybrid. Butterworth filters of order 18 with three different pass bands are used to generate the colored noise. For band pass noise, the filter has a pass band ranging from 1200 to 2800 Hz. For low pass noise, the filter has a cutoff frequency of 800 Hz. For high pass noise, the filter passes all frequencies higher than 2800 Hz. The plots shown in Figure 5, Figure 6 and Figure 7 are for the case when the steady state error is matched directly for both algorithms. It can be seen that the wavelet algorithm converges noticeably faster. Similar conclusions are obtained when the far-end input is voice, a sinusoid, or a composite source signal [5], as documented in [3].


Figure 5: Low-pass colored noise with β = 3 for FIR and β = 7 for wavelet


Figure 6: Bandpass colored noise with β = 3 for FIR and β = 7 for wavelet


Figure 7: High-pass colored noise with β = 3 for FIR and β = 7 for wavelet
ABSTRACT

We present a stringent definition of higher-order evolutionary spectra. On this basis, we define wavelet-polyspectral densities as a way of dealing with non-stationarities in higher-order statistics. We propose a simple wavelet-polyspectral estimator, and we discuss its statistical properties. The proposed wavelet-polyspectral analysis tool is demonstrated by a numerical example. It is concluded that the wavelet-polyspectra have desirable properties for the analysis of data that are simultaneously non-stationary and non-Gaussian/non-linear.

1.  INTRODUCTION 

Non-stationary phenomena have traditionally been studied by (naive) spectrogram techniques [1, 2]. These methods suffer from poor statistical behavior and poor resolution in time and/or frequency. Various non-linear time-frequency methods (e.g. Cohen's class [1] and the Wigner-Ville distribution [2]) have been suggested to mend some of these weaknesses, but their use is again non-trivial and the results may be hard to interpret in practice. The wavelet transform [2] is a linear transform method that has rapidly become a popular technique to quantify scale-time variations of time series.

For almost all known polyspectral estimators, it is a requirement that the data are stationary. This is, among other things, because most definitions of polyspectral estimators contain Fourier transforms of the data, which provide no time information in the transformed domain. Note however that polyspectral analysis has been combined with the Wigner-Ville formalism [3, 4] to cope with non-stationarities.

Recently, it has been suggested [5, 6] to combine the normalized third-order polyspectrum (the bicoherence) with the wavelet transform, forming the wavelet-bicoherence.

The main purpose of this paper is to introduce a precise definition of wavelet-polyspectra of general orders, to propose estimators and to discuss their statistical properties. In addition, we demonstrate the proposed technique by a relevant numerical example.

2.  HIGHER-ORDER  SPECTRA  OF 
NON-STATIONARY  PROCESSES 

In [7] the class of oscillatory processes, which admit a representation of the form

\[ X(t) = \int_{-\infty}^{\infty} A_t(f)\exp(j2\pi ft)\,dZ(f) \tag{1} \]

with $A_t(f) = A_t^*(-f)$, is examined. Here $Z(f)$ is an orthogonal process with $E\{|dZ(f)|^2\} = d\mu_2(f)$. The measure $\mu_2(f)$ is then an analog of the integrated spectrum in the stationary case. For such processes we define the nth-order evolutionary spectrum $C_n(f_1,\ldots,f_{n-1},t)$ with respect to a family $\mathcal{F} = \{A_t(f)\exp(j2\pi ft)\}$ of oscillatory functions by

\[ \begin{aligned}
dC_t^n(f_1,\ldots,f_n) &= A_t(f_1)\cdots A_t(f_{n-1})A_t(f_n)\,\mathrm{Cum}[dZ(f_1),\ldots,dZ(f_n)] \\
&= A_t(f_1)\cdots A_t(f_{n-1})A_t(f_n)\,\delta(f_1+\cdots+f_n)\,d\mu_n(f_1,\ldots,f_{n-1}) \\
&= A_t(f_1)\cdots A_t(f_{n-1})A_t^*(f)\,\frac{\partial^{n-1}\mu_n(f_1,\ldots,f_{n-1})}{\partial f_1\cdots\partial f_{n-1}}\,df_1\cdots df_{n-1} \\
&= C_n(f_1,\ldots,f_{n-1},t)\,df_1\cdots df_{n-1}
\end{aligned} \tag{2} \]

provided that the zero-mean increment process $dZ(f)$ is at least stationary to order n, and that the measure $d\mu_n(f_1,\ldots,f_{n-1})$ is absolutely continuous with respect to the Lebesgue measure. Here $f = f_1 + \cdots + f_{n-1}$ and Cum[·] denotes the cumulant sequence. Note that the evolutionary power spectrum defined in [7] is a special case of the above definition with n = 2.
3. WAVELET-POLYSPECTRA

3.1.  Definitions 

The continuous wavelet transform (CWT) of a real-valued signal x(t) is defined as [2]

\[ W_\psi(t,a) = \frac{1}{\sqrt{a}}\int_{-\infty}^{\infty} x(t')\,\psi^*\!\left(\frac{t'-t}{a}\right)dt' \tag{3} \]

Based on the CWT we propose to define the nth-order wavelet-polyspectrum with respect to a wavelet of the form $\psi(t) = g(t)\exp(j\eta t)$ as

\[ M_n^w(a_1,\ldots,a_{n-1};t_0) = \frac{1}{T}\,E\left\{\int_{t_0-T/2}^{t_0+T/2} W_\psi(t,a_1)\cdots W_\psi(t,a_{n-1})\,W_\psi^*(t,a)\,dt\right\} \tag{4} \]

where the process X(t) is assumed to be stationary on the interval of integration $[t_0 - T/2,\ t_0 + T/2]$ and

\[ a^{-1} = a_1^{-1} + a_2^{-1} + \cdots + a_{n-1}^{-1}. \tag{5} \]

The reason for this inverse sum rule will become clear in the following section.

3.2.  Properties 

In this paper, we will constrain the wavelets to the class defined by

\[ \psi(t) = g(t)\exp(j\eta t), \tag{6} \]

where g(t) is a real-valued and symmetric window which has to be well localized in the frequency domain. The parameter η is chosen such that the Fourier transform of ψ(t) is essentially zero for f < 0. With this choice of wavelet there is a well-defined relationship between frequency f and scale a. For a piecewise stationary process this relationship is given by $a = \eta/(2\pi f)$ for $f \neq 0$. The CWT is therefore not defined for frequency f = 0.

The inverse relationship between f and a is the reason why we apply the inverse sum rule (5) in the definition of the wavelet-polyspectra instead of the sum rule used in ordinary polyspectra [8]. Using this relation and considering the limit of (4) as $T \to \infty$ it can be shown [9] that the nth-order wavelet-polyspectrum of a real-valued stationary process is related to the ordinary nth-order moment spectrum $M_n$ by

\[ M_n^w(a_1,\ldots,a_{n-1}) = G_n * M_n(f_1,\ldots,f_{n-1}). \tag{7} \]

Here * denotes (n-1)-dimensional convolution and $G_n$ is a spectral window which is different for each frequency tuple $(f_1,\ldots,f_{n-1})$. This window acts as a constant-Q smoothing filter in the frequency domain, and its exact form depends on the choice of window g(t) in the wavelet (6).
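The link between the ordinary sum rule for frequencies and the inverse sum rule for scales can be verified numerically. The sketch below assumes the relation a = η/(2πf) stated above; the function names and the default value of η are illustrative choices only.

```python
import math


def scale_from_frequency(f, eta=4.0 * math.pi):
    """Scale a corresponding to frequency f via a = eta / (2*pi*f); undefined at f = 0."""
    if f == 0:
        raise ValueError("the scale is not defined for f = 0")
    return eta / (2.0 * math.pi * f)


def combined_scale(scales):
    """Inverse sum rule: 1/a = 1/a_1 + ... + 1/a_{n-1}."""
    return 1.0 / sum(1.0 / a for a in scales)


# Frequencies add (f = f1 + f2), so the matching scales combine by inverse sums.
f1, f2 = 0.10, 0.25
a_sum_rule = combined_scale([scale_from_frequency(f1), scale_from_frequency(f2)])
a_direct = scale_from_frequency(f1 + f2)
```

Both routes give the same scale, which is why the scale of the conjugated factor in the wavelet-polyspectrum is tied to the other scales through their inverses.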

4.  WAVELET-POLYSPECTRAL 
ESTIMATION 

4.1.  Estimators 

A straightforward method of estimating the nth-order wavelet spectrum is to discretize the time in equation (4) and remove the expectation operator. Then if $t' = n\Delta t$ and $t = k\Delta t$; $n,k = 0,1,\ldots,N-1$, a time-discretized version of the wavelet transform in equation (3) becomes a suitable estimator for the CWT. Assuming $\Delta t = 1$, this becomes

\[ W_\psi(k,f) = \sqrt{\frac{2\pi f}{\eta}}\sum_{n=0}^{N-1} x(n)\,\psi^*\!\left(\frac{2\pi f}{\eta}(n-k)\right) \tag{8} \]

where we have used $a = \eta/(2\pi f)$. If we let T = 2K, an estimator for the nth-order wavelet spectrum at time $t_0 = L$ is

\[ \hat M_n^w(f_1,\ldots,f_{n-1};L) = \frac{1}{2K+1}\sum_{k=L-K}^{L+K} W_\psi(k,f_1)\cdots W_\psi(k,f_{n-1})\,W_\psi^*(k,f) \tag{9} \]

where $f = f_1 + f_2 + \cdots + f_{n-1}$. We require that the process is stationary in the time interval [L-K, L+K].
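As a rough illustration of how such an estimator might be evaluated, the sketch below computes a time-discretized CWT with unit sampling interval and then averages products of transform values over the 2K+1 time instants around L. It assumes a 1/sqrt(a) normalization of the CWT, a Gaussian window g(t), and the parameter defaults shown; the function names `cwt_at` and `wavelet_polyspectrum` are hypothetical. It is a sketch under those assumptions, not the authors' implementation.

```python
import cmath
import math


def wavelet(t, eta, sigma2):
    """psi(t) = g(t) * exp(j*eta*t), with an assumed Gaussian window g."""
    return math.exp(-t * t / (2.0 * sigma2)) * cmath.exp(1j * eta * t)


def cwt_at(x, k, f, eta=4.0 * math.pi, sigma2=1.5):
    """Discretized CWT of x at time index k and frequency f (scale a = eta/(2*pi*f))."""
    inv_scale = 2.0 * math.pi * f / eta  # this is 1/a
    acc = 0j
    for n, xn in enumerate(x):
        acc += xn * wavelet(inv_scale * (n - k), eta, sigma2).conjugate()
    return math.sqrt(inv_scale) * acc  # assumed 1/sqrt(a) normalization


def wavelet_polyspectrum(x, freqs, L, K):
    """Average of W(k,f_1)...W(k,f_{n-1}) * conj(W(k, f_1+...+f_{n-1})) over k in [L-K, L+K]."""
    f_sum = sum(freqs)
    total = 0j
    for k in range(L - K, L + K + 1):
        term = cwt_at(x, k, f_sum).conjugate()
        for f in freqs:
            term *= cwt_at(x, k, f)
        total += term
    return total / (2 * K + 1)
```

Passing a single frequency, `wavelet_polyspectrum(x, [f], L, K)`, gives the second-order (power) estimate, while `wavelet_polyspectrum(x, [f1, f2], L, K)` gives the third-order (bispectral) estimate.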

4.2.  Statistical  properties 

The statistical properties of the estimator in equation (9) are quite complicated. For simplicity we will only discuss some results for n = 2 and n = 3 assuming a zero mean process.

The expected value of the estimated second-order wavelet spectrum of a zero mean stationary process X(t) can be written as [9]

\[ E\{\hat M_2^w(f)\} = \int_{-1/2}^{1/2} V_2(f-f';f)\,S(f')\,df'. \tag{10} \]

Here S(f) is the true power spectrum and the spectral window $V_2(f';f)$ is given by

\[ V_2(f';f) = \frac{1}{N}\sum_{k=0}^{N-1} |G(f';k,f)|^2 \tag{11} \]

where $G(f';k,f)$ is the discrete Fourier transform of a scaled and shifted version of the window g(t) in the wavelet (6) given by

\[ g_{k,f}(n) = g\!\left(\frac{2\pi f}{\eta}(n-k)\right). \tag{12} \]

Notice that this window is frequency dependent. For a fixed frequency f, equation (10) is a convolution between the spectral window $V_2(f';f)$ and the true power spectrum. With a proper (frequency dependent) normalization, one can show that $\hat M_2^w(f)$ is an asymptotically unbiased estimate of the true power spectrum S(f). Similarly, in the non-stationary case, a normalization yields an asymptotically unbiased estimate of the evolutionary power spectrum $C_2(f,t)$ defined in equation (2).

The support of the window $g_{k,f}(n)$ is shorter than N, the number of samples (except at low frequencies). Thus the estimator $\hat M_2^w(f)$ is equivalent to the Weighted Overlapped Segment Averaging (WOSA) spectral estimation method for a fixed frequency f, since we can write

\[ \hat M_2^w(f) = \frac{2\pi f}{\eta}\,\frac{1}{N}\sum_{k=0}^{N-1}\left|\sum_{n=0}^{N-1} x(n)\,g_{k,f}(n)\exp(-j2\pi fn)\right|^2. \]


The WOSA spectral estimator is well known for being consistent [10]. As mentioned, the estimator in the equation above may be normalized to yield an unbiased power spectral estimator. The variance of this unbiased estimator for stationary Gaussian signals may be approximated by

\[ \mathrm{Var}\{\hat M_2^w(f)\} \approx \frac{S^2(f)}{N}\left[1 + 2\sum_{\tau=1}^{N-1}\left(1-\frac{\tau}{N}\right)\left|\frac{g_2(\tau;f)}{g_2(0;f)}\right|^2\right] \]

where S(f) is the true power spectrum and

\[ g_2(\tau;f) = \int_{-\infty}^{\infty} g\!\left(\frac{2\pi f}{\eta}t\right) g\!\left(\frac{2\pi f}{\eta}(t+\tau)\right)dt \]

is the correlation between overlapping windows. The variance of the periodogram $S_p(f)$ under the same assumptions can be written as $\mathrm{Var}\{S_p(f)\} \approx S^2(f)$ (see e.g. [10]). We thus see that the variance is reduced by a factor $N/v_N(f)$, where

\[ v_N(f) = 1 + 2\sum_{\tau=1}^{N-1}\left(1-\frac{\tau}{N}\right)\left|\frac{g_2(\tau;f)}{g_2(0;f)}\right|^2 \]

relative to a (possibly tapered) periodogram. This frequency dependent factor increases with frequency, since the width of the correlation function $g_2(\tau;f)$ between two overlapping windows decreases with frequency due to the constant-Q property of the CWT.

In the non-stationary case one can show that the expected value of $\hat M_2^w(f;t)$ at t = L is approximately given by [9]

\[ E\{\hat M_2^w(f;L)\} \approx \int_{-1/2}^{1/2} V_2(f-f';f)\,C_2(f',L)\,df', \]

where the total spectral window $V_2(f';f)$ is given by

\[ V_2(f';f) = \frac{1}{2K+1}\sum_{k=L-K}^{L+K} |G(f';k,f)|^2. \]

This shows that the estimator is approximately unbiased for the true evolutionary power spectrum $C_2(f,t)$ with proper normalization. The variance of this estimator is essentially the same as in the stationary case, with N replaced by 2K + 1, the length of the interval where the process is assumed stationary. Obviously, the estimator is not consistent due to the nonstationary nature of the process.

The expected value of the estimated third-order wavelet spectrum of a zero mean stationary process X(t) can be written as [9]

\[ E\{\hat M_3^w(f_1,f_2)\} = \int\!\!\int_{-1/2}^{1/2} V_3(f_1-f_1',\,f_2-f_2';\,f_1,f_2)\,B(f_1',f_2')\,df_1'\,df_2'. \tag{13} \]

Here $B(f_1,f_2)$ is the true bispectrum and the total bispectral window $V_3(f_1',f_2';f_1,f_2)$ is given by

\[ V_3(f_1',f_2';f_1,f_2) = \frac{1}{N}\sum_{k=0}^{N-1} G(f_1';k,f_1)\,G(f_2';k,f_2)\,G^*(f_1'+f_2';k,f_1+f_2). \tag{14} \]

The estimator may, as in the second-order case, be normalized to yield an asymptotically unbiased estimate [9]. Assuming a stationary Gaussian process, the variance of the normalized estimator may be written as

\[ \frac{\mathrm{Var}\{\hat B(f_1,f_2)\}}{S(f_1)S(f_2)S(f)} \approx \frac{1}{N}\left[\frac{g_2(0;f_1)\,g_2(0;f_2)\,g_2(0;f)}{|g_3(0,0;f_1,f_2)|^2} + 2\sum_{\tau=1}^{N-1}\left(1-\frac{\tau}{N}\right)\frac{g_2(\tau;f_1)\,g_2(\tau;f_2)\,g_2(\tau;f)}{|g_3(0,0;f_1,f_2)|^2}\right] \]

where $g_3(\tau_1,\tau_2;f_1,f_2)$ is the triple correlation given by

\[ g_3(\tau_1,\tau_2;f_1,f_2) = \int_{-\infty}^{\infty} g\!\left(\frac{2\pi f}{\eta}t\right) g\!\left(\frac{2\pi f_1}{\eta}(t+\tau_1)\right) g\!\left(\frac{2\pi f_2}{\eta}(t+\tau_2)\right)dt \]

and $f = f_1 + f_2$. The variance of the biperiodogram $B_p(f_1,f_2)$ under the same assumptions can be written as $\mathrm{Var}\{B_p(f_1,f_2)\} \approx N\,S(f_1)S(f_2)S(f)$ (see e.g. [10]). We thus see that the variance is reduced by a factor $N^2/v_N(f_1,f_2)$, where

\[ v_N(f_1,f_2) = \frac{g_2(0;f_1)\,g_2(0;f_2)\,g_2(0;f)}{|g_3(0,0;f_1,f_2)|^2} + 2\sum_{\tau=1}^{N-1}\left(1-\frac{\tau}{N}\right)\frac{g_2(\tau;f_1)\,g_2(\tau;f_2)\,g_2(\tau;f)}{|g_3(0,0;f_1,f_2)|^2} \]

relative to a (possibly tapered) biperiodogram.

As in the second-order case, the expressions for non-stationary processes are essentially the same as in the stationary case, with N replaced by the length of a stationary interval, and sums over all time instants replaced by sums over a stationary interval only.

5.  NUMERICAL  EXAMPLE 
5.1. Choice of wavelet

In the numerical simulations we have chosen a wavelet of the form (6) with a Gaussian window, i.e. $g(t) = \exp[-t^2/(2\sigma^2)]$, where $\sigma^2$ is a user-specified parameter. This choice of wavelet, the Morlet-Grossmann wavelet, is well known for its good simultaneous time and frequency resolution. The wavelet is not exactly analytic, but if we choose the parameters such that $\eta\sigma^2 \gg 1$, the non-analytic part is negligible. The parameter $\sigma^2$ affects the time and frequency resolution. The frequency resolution is $\Delta f/f \propto 1/(4\sigma)$ while the time resolution is $\Delta\tau \propto \sigma/f$. Thus the trade-off between time and frequency resolution is controlled by $\sigma^2$. In the following example we have used $\eta = 4\pi$ and $\sigma^2 = 1.5$.


5.2. A piecewise stationary process

The wavelet-polyspectra give us the opportunity to analyze piecewise stationary processes, which obviously have spectral representations of the form given in (1). To demonstrate this, we provide a numerical example. The chosen signal consists of three harmonic oscillators, where the third is completely phase coupled to the two others. The frequency of one of the oscillators is allowed to vary with time, providing a time-changing phase coupling. The signal model is

\[ X_n = \sum_{i=1}^{3}\cos(2\pi f_i n + \theta_i) + N_n. \tag{15} \]

Here $f_1$ is a non-decreasing piecewise constant function, $f_2 = 0.25$ and $f_3 = f_1 + f_2$. The phases $\theta_1$ and $\theta_2$ are independent and drawn from a uniform distribution $U[-\pi,\pi]$, and $\theta_3 = \theta_1 + \theta_2$. The additive noise $N_n$ is white, zero mean and Gaussian, with variance $\sigma_N^2 = 0.15^2$.

To detect phase coupling we use a wavelet based squared bicoherence estimator [5, 6]

\[ b_w^2(f_1,f_2;L) = \frac{|B_w(f_1,f_2;L)|^2}{\left[\frac{1}{2K+1}\sum_{k=L-K}^{L+K}|W_\psi(k,f_1)\,W_\psi(k,f_2)|^2\right] S_w(f_1+f_2;L)} \tag{16} \]

where we have introduced the wavelet-bispectrum $B_w = M_3^w$ and the wavelet power spectrum $S_w = M_2^w$. A squared bicoherence spectrum measures the fraction of power at a given frequency due to three-wave interaction [11].

Figure 1: Magnitude squared CWT estimate $|W_\psi(k,f)|^2$ of a realization $x_n$ of the process in equation (15).

Figure 1 shows the magnitude squared CWT estimate of a realization $x_n$ of the process in equation (15). Notice the structures corresponding to the constant frequency component with $f_2 = 0.25$, the piecewise constant frequency component $f_1$ and the sum frequency component $f_3 = f_1 + f_2$. We can easily see that the period of stationarity of the process is chosen to be 150.

Figure 2: Estimated evolutionary bicoherence spectrum of the process in equation (15).

Figure 2 shows a contour plot of the estimate of the proposed evolutionary bicoherence spectrum, using the estimator introduced in (16). The full line passes through the true coupling frequencies. The true bicoherence value along this line is exactly 1, since all the power is due to three-wave interaction. The estimates are performed at time instants L corresponding to the midpoint of each stationary interval, and we have used K = 75. The maximum value of the estimate hits as close to the correct frequency as possible for each time instant L, and the estimated values of the maxima are about 0.97. Note that the frequency resolution of the estimate gets coarser with increasing frequency due to the constant-Q property ($\Delta f/f$ = const.) of the CWT.
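The experiment described above can be imitated on a small scale. The following sketch generates a three-oscillator signal with a piecewise constant first frequency and evaluates a squared wavelet bicoherence at the true coupling frequencies of the middle segment. Everything not stated in the text is an assumption made for illustration: the particular frequency levels (0.05, 0.10, 0.15), the random seed, the reading of the noise variance as 0.15 squared, the 1/sqrt(a) normalization, the shorter averaging length K = 50, and all function names.

```python
import cmath
import math
import random


def cwt_at(x, k, f, eta=4.0 * math.pi, sigma2=1.5):
    """Discretized Morlet-type CWT at time index k and frequency f (a = eta/(2*pi*f))."""
    s = 2.0 * math.pi * f / eta
    acc = 0j
    for n, xn in enumerate(x):
        u = s * (n - k)
        acc += xn * math.exp(-u * u / (2.0 * sigma2)) * cmath.exp(-1j * eta * u)
    return math.sqrt(s) * acc


def make_signal(f1_levels, segment_len=150, f2=0.25, noise_std=0.15, seed=0):
    """Three oscillators with f3 = f1 + f2 and theta3 = theta1 + theta2, plus white noise."""
    rng = random.Random(seed)
    th1 = rng.uniform(-math.pi, math.pi)
    th2 = rng.uniform(-math.pi, math.pi)
    x = []
    for n in range(segment_len * len(f1_levels)):
        f1 = f1_levels[n // segment_len]  # piecewise constant, non-decreasing
        x.append(math.cos(2 * math.pi * f1 * n + th1)
                 + math.cos(2 * math.pi * f2 * n + th2)
                 + math.cos(2 * math.pi * (f1 + f2) * n + th1 + th2)
                 + rng.gauss(0.0, noise_std))
    return x


def squared_bicoherence(x, f1, f2, L, K):
    """Squared wavelet bicoherence at (f1, f2), averaged over k in [L-K, L+K]."""
    bispec, pair_power, sum_power = 0j, 0.0, 0.0
    for k in range(L - K, L + K + 1):
        w1, w2, w3 = cwt_at(x, k, f1), cwt_at(x, k, f2), cwt_at(x, k, f1 + f2)
        bispec += w1 * w2 * w3.conjugate()
        pair_power += abs(w1 * w2) ** 2
        sum_power += abs(w3) ** 2
    m = 2 * K + 1
    return abs(bispec / m) ** 2 / ((pair_power / m) * (sum_power / m))


x = make_signal([0.05, 0.10, 0.15])
b2 = squared_bicoherence(x, 0.10, 0.25, L=225, K=50)  # midpoint of the middle segment
```

By the Cauchy-Schwarz inequality the returned value cannot exceed 1, and for a strongly coupled triplet at low noise it should come out close to 1.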


We have illustrated our method by a relevant numerical example. The theoretical properties of the wavelet-polyspectra and the numerical example clearly demonstrate the potential of this technique for the analysis of higher-order spectral properties of non-stationary processes.
ABSTRACT

In this paper, a sophisticated adaptive seismic compression method is presented based on wavelet shrinkage. Our approach combines a time-scale transform with an adaptive non-linear statistical method. First, a 2-D biorthogonal Discrete Wavelet Transform (DWT) is applied to the multi-channel seismic signals to generate a sparse multiresolution (subband) decomposition. Compression is then achieved by shrinking the detail wavelet coefficients using a scale-dependent non-linear soft-thresholding rule. The adaptive scale-dependent thresholds are determined by minimizing Stein's Unbiased Risk Estimate (SURE). The proposed compression procedure is tested on marine seismic data from the Midyan basin (Red Sea, Saudi Arabia).

1.  INTRODUCTION 

Seismic  compression  is  a  key  technology  for  managing  seismic 
data  in  a  world  of  ever  increasing  data  volumes  to  maintain 
productivity  without  compromising  interpretation  results.  By 
storing  data  in  a  format  that  requires  less  space  than  the  original 
data  volume,  seismic  compression  provides  greater  flexibility  in 
managing  local  or  remote  server  disk  space  as  well  as  reducing 
network  traffic.  Seismic  compression  not  only  enables 
explorationists  to  maximize  the  value  of  the  information 
technology  infrastructure,  but  it  encourages  innovative 
interpretation  workflow  to  leverage  the  vast  information  content 
in  massive  seismic  data.  Compression  thus  helps  in  maintaining 
or exceeding current productivity levels. Recently, seismic compression has benefited from the advent of wavelets [1], which offer mathematical constructions with great potential in statistical methodology [2]. Wavelet transforms have been applied extensively in diverse applications including data compression and denoising, image analysis, economics and statistics [3].

In this paper, a sophisticated wavelet-based compression technique is proposed. The transform and compression stages are matched to each other with a view to improving the overall performance. The rationale of our approach is first to generate a
(near)-sparse  representation  of  the  data  in  the  wavelet  domain 
then  to  threshold  an  important  part  of  the  coefficients  without 
losing  substantial  information.  In  order  to  exploit  the  multi¬ 
channel  seismic  data  correlation  in  space  and  time  directions, 
a  2-D  biorthogonal  DWT  is  used.  For  seismic  interpretation,  the 
visual  aspect  of  seismic  signals  is  of  utmost  importance.  This 
factor  is  taken  into  account  by  judiciously  selecting  both  the 
wavelets  and  the  thresholding  procedure.  Thus,  a  DWT  using 


long  wavelet  filters  from  the  Cohen-Daubechies-Feauveau 
(CDF)  class  [4]  is  intimately  associated  with  a  non-linear  smooth 
operator,  namely  wavelet  shrinkage.  Biorthogonal  wavelets  offer 
a  good  trade-off  between  the  support  size,  the  number  of 
vanishing  moments  and  regularity.  In  other  words,  the  DWT  is 
computed  efficiently  while  preventing  the  appearance  of  artifacts 
in  the  reconstructed  data.  In  addition,  the  sparsity  of  the 
multiresolution  decomposition  is  best  exploited  by  wavelet 
shrinkage.  The  latter  consists  of  applying  a  soft-thresholding  rule 
to  all  the  wavelet  coefficients  but  those  belonging  to  the  lowest 
resolution  subband.  Indeed,  the  latter  is  merely  a  smooth  scaled 
version  of  the  input  data  and  carries  the  essence  of  the  data. 
Moreover,  it  has  coefficients  of  much  smaller  magnitude  than 
those  of  the  detail  subbands  do.  Thus,  this  makes  its  contribution 
to the compression gain marginal. The values of the scale-dependent thresholds are determined by minimizing the SURE [5][6]. The proposed compression procedure is tested on marine seismic data from the Midyan basin (Red Sea, Saudi Arabia) [7].

2.  MULTICHANNEL  SEISMIC  SIGNALS 

Oil and gas are usually buried deep within the earth, often miles below the surface. Most of the easy or shallow oil has been found. A number of exploration methods are available, but in general only the modern seismic reflection method comes close to providing both the ability to see down to great depths and to see the details of the subsurface needed to locate many hydrocarbons [8]. Seismic data stem from a multiscale non-linear distributed-parameter remote system, i.e. the earth. In a typical
scenario,  a  spatially  distributed  acoustical  signal  is  generated  by 
a  source  (e.g.  dynamite)  located  at  the  surface  of  the  earth.  The 
generated  waves  propagate  downward,  undergo  reflection  at 
contacts  with  different  acoustic  impedance,  and  are  recorded  by 
an array of seismometers at the surface. This provides the multichannel discrete seismic signals that are mapped into representations of the earth's interior properties. The underlying complex process is referred to as seismic imaging. The latter is intended to find earth models that explain (or best fit) seismic observations. Seismic signals are commonly displayed using a variable-area and/or a variable-density mode [8].

3.  MULTIRESOLUTION 
DECOMPOSITION 

3.1  2-D  Wavelet  Bases 

There  are  two  different  ways  to  build  a  wavelet  basis  for  a  2-D 
space,  say  t  and  x.  The  standard  dyadic  construction  consists  of 


0-7803-5988-7/00/$10.00  ©  2000  IEEE 




all possible tensor products of 1-D wavelet and scaling basis functions defined respectively as:

\[ \psi_{jk}(t) = 2^{j/2}\,\psi(2^j t - k) \tag{1a} \]

\[ \varphi_{jk}(t) = 2^{j/2}\,\varphi(2^j t - k) \tag{1b} \]

However, despite its simplicity, this construction, which requires a different scale index for each direction, does not benefit from the recursive Mallat algorithm [9]. Indeed, for an m x m data matrix the standard dyadic decomposition requires $4(m^2-m)$ assignment operations against only $\frac{8}{3}(m^2-1)$ for the nonstandard one [10]. Consequently, in the sequel the nonstandard dyadic 2-D decomposition is adopted. It consists of defining a 2-D scaling function, using a unique scale index j, as:

\[ \varphi_{jkk'}(t,x) = \varphi_{jk}(t)\,\varphi_{jk'}(x) \tag{2} \]

and three 2-D wavelet functions at each scale given by:

\[ \begin{aligned}
\psi^{v}_{jkk'}(t,x) &= 2^{j}\,\psi^{v}(2^j t - k,\ 2^j x - k') \\
\psi^{h}_{jkk'}(t,x) &= 2^{j}\,\psi^{h}(2^j t - k,\ 2^j x - k') \\
\psi^{d}_{jkk'}(t,x) &= 2^{j}\,\psi^{d}(2^j t - k,\ 2^j x - k')
\end{aligned} \tag{3} \]

These three anisotropic wavelets extract matrix data details at different scales and orientations, whereas the scaling function yields a smoothed low-resolution version of the input data. Indeed, starting at scale j, the multiresolution decomposition yields four double-scaled half-resolution panels at scale j-1. One of them represents a smoothed version of the data while the remaining ones contain detail wavelet coefficients corresponding to the $\psi^{v}$, $\psi^{h}$, $\psi^{d}$ wavelet functions that are respectively oriented vertically, horizontally and diagonally. The result of the nonstandard dyadic 2-D DWT is usually displayed in four panels as in Fig. 1.


Figure  1.  Nonstandard  dyadic  2-D  wavelet 
multiresolution  decomposition.  L  and  H  stand  for  low- 
and  high-pass  wavelet  filters,  and  the  subscripts  r  and  c 
stand  for  row  and  column  respectively 


3.2  Biorthogonal  Wavelet  Bases 

There  are  three  main  categories  of  wavelet  bases,  namely  the 
orthogonal,  the  semi-orthogonal  and  the  biorthogonal.  Limiting 
ourselves  to  orthogonal  wavelet  bases  can  be  overly  restrictive 
because  except  for  the  Haar  basis,  there  are  no  other  bases, 
which  are  compactly  supported  and  symmetric.  The  relaxation  of 
orthogonality  constraint  has  many  benefits  that  improve  the 
performance  of  the  wavelet  transform  while  still  being 
implemented with the Mallat algorithm. In particular, biorthogonal wavelets offer a good trade-off between the support size, the number of vanishing moments and regularity. In terms of digital filters, the biorthogonal transform uses different Finite Impulse Response (FIR) wavelet filters in the decomposition and reconstruction stages. This provides more flexibility in the design of the transform and its inverse [11]. Moreover, symmetric FIR filters are preferred because they have linear phase, a very desirable property that prevents the appearance of artifacts in the reconstructed data. The biorthogonal transform therefore uses dual wavelet and dual scaling functions related to the primal ones by:

\[ \langle \tilde\varphi_{jk}, \psi_{jk'} \rangle = 0, \qquad \langle \tilde\psi_{jk}, \varphi_{jk'} \rangle = 0 \qquad \text{for all } j,k,k' \tag{4} \]

In this contribution long biorthogonal wavelet filters of the CDF class are used [4].

3.3  Nonstandard  Dyadic  2-D  Decomposition 


The  extension  to  2-D  separable  biorthogonal  bases  is 
straightforward.  Indeed,  by  alternating  the  1-D  wavelet  filtering 
operations  on  rows  and  columns  a  2-D  dyadic  nonstandard 
decomposition  is  generated.  This  scheme  is  implemented  with  a 
two-channel  filter  bank  [11],  where  the  low-pass  and  high-pass 
filters  represent  the  scaling  and  the  wavelet  functions 
respectively.  First,  a  one  step  low  pass  ( L )  and  high  pass  (H) 
filtering  is  performed  on  each  row  of  the  matrix  I.  Next,  the  same 
filters  are  applied  to  each  column  of  the  resulting  matrix.  The 
whole  process  is  applied  recursively  to  the  quadrant  containing 
averages in both directions, i.e. the LrLcI panel. The resulting recursive decomposition is illustrated in Fig. 1, and the implementation of the one-stage 2-D DWT is depicted in Fig. 2.
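The row-then-column filtering and the recursion on the low-pass panel can be sketched in a few lines. The example below uses the two-tap Haar filters purely as a short stand-in for the long CDF biorthogonal filters used in the paper, assumes even matrix dimensions at every level, and uses hypothetical function names; it is meant only to show the structure of the nonstandard decomposition.

```python
import math


def analyze_1d(v):
    """One-stage two-channel analysis of a 1-D sequence (Haar filters as a stand-in)."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (v[2 * i] + v[2 * i + 1]) for i in range(len(v) // 2)]
    high = [s * (v[2 * i] - v[2 * i + 1]) for i in range(len(v) // 2)]
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def filter_columns(m):
    """Apply the 1-D analysis to every column; return (low-pass, high-pass) panels."""
    lows, highs = [], []
    for col in transpose(m):
        lo, hi = analyze_1d(col)
        lows.append(lo)
        highs.append(hi)
    return transpose(lows), transpose(highs)


def dwt2_one_stage(image):
    """Filter each row, then each column, giving the LrLc, LrHc, HrLc and HrHc panels."""
    row_low, row_high = [], []
    for row in image:
        lo, hi = analyze_1d(row)
        row_low.append(lo)
        row_high.append(hi)
    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return ll, lh, hl, hh


def dwt2(image, levels):
    """Nonstandard decomposition: recurse only on the panel of averages (LrLc)."""
    approx, details = image, []
    for _ in range(levels):
        approx, lh, hl, hh = dwt2_one_stage(approx)
        details.append((lh, hl, hh))
    return approx, details
```

For example, `dwt2(image, 2)` on a 4 x 4 matrix returns a 1 x 1 approximation together with two sets of three detail panels, one set per level.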


Figure  2.  One-stage  non-standard  2-D  DWT 
4. WAVELET SHRINKAGE


4.1  Motivation 

Wavelet  compression  is  best  understood  from  an  approximation 
viewpoint.  In  a  wavelet  decomposition,  each  wavelet  picks  up 
information  about  the  data  at  a  given  location  k  and  at  a  given 
resolution  or  scale  j.  Thus,  the  wavelet  transform  allows  us  to 
focus  on  the  most  relevant  part  of  the  data  provided  the  wavelet 
bases  fit  the  input  data.  Consequently,  the  resulting  wavelet 
coefficients  drop  off  rapidly  yielding  a  (near)-sparse  data 
representation.  This  is  also  known  as  energy  compaction 
property.  Wavelet  thresholding  constitutes  thus  a  natural  choice 
to  perform  compression.  It  is  a  simple  yet  a  very  efficient 
procedure  for  keeping  the  most  important  coefficients  that  will  be 
used in reconstruction. An intuitive way to achieve thresholding is to apply a keep-or-kill rule, referred to as hard-thresholding. However, soft-thresholding is preferred because of several advantages. From a visual point of view, the reconstructed data look more pleasant and do not exhibit visible artifacts. This is crucial in the case of seismic data interpretation. From a statistical point of view, soft-thresholding uses a continuous function, leading to a simple data-driven selection of the thresholds. In fact, the selection of the thresholds
is  a  very  delicate  and  important  statistical  problem.  On  one  hand, 
killing  too  many  wavelet  coefficients  may  lead  to  an  important 
bias  in  the  reconstructed  data.  On  the  other  hand,  small 
thresholds  lead  to  a  poor  compression  gain.  Thus,  threshold 
selection  should  strike  the  balance  between  closeness  to  fit 
between  the  original  and  the  reconstructed  data  and  the  degree  of 
sparsity  of  the  wavelet  coefficients.  We  propose  to  achieve 
compression  through  an  adaptive  soft-thresholding  procedure, 
referred  to  as  wavelet  shrinkage.  The  selection  of  the  optimal 
threshold  for  all  the  scales  but  the  coarsest  one  is  accomplished 
by  minimizing  the  SURE.  The  resulting  nonlinear  thresholding 
operator  is  called  SureShrink  [5]. 

4.2  SURE  Principle 

SURE was introduced by Stein for mean estimation of a multivariate normal distribution [12] and has been successfully applied to function smoothing by Donoho [5]. The SURE principle rests on the fact that, for a nearly arbitrary nonlinear biased estimator, the loss or risk can be estimated unbiasedly. In the sequel, we outline the SURE principle for the general case; in the next subsection we show how to derive the SureShrink operator.

Consider an empirical data vector $y$ of dimension $N$ given by

$$y_i = f_i + e_i, \quad i = 0, 1, \ldots, N-1 \qquad (5)$$

where $f_i$ are samples of the deterministic function $f$ and $e_i$ is Gaussian white noise, independent and identically distributed (i.i.d.) as $N(0, \sigma)$.

The objective is to find the best estimate $\hat{f}$ of the function $f$ in the mean square sense by minimizing the Mean Square Error (MSE) risk defined as

$$R(\hat{f}, f) = \frac{1}{N} E\|\hat{f} - f\|^2 = \frac{1}{N} \sum_{i=0}^{N-1} E(\hat{f}_i - f_i)^2 \qquad (6)$$

However, the main drawback of the MSE risk is that in practice it can never be computed exactly, because it relies on the unknown exact value of the function $f$. Thus, in practical situations the MSE has to be estimated. The SURE principle stipulates that if we consider the following estimate for the unknown function $f$,

$$\hat{f}(y) = y + g(y) \qquad (7)$$


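where $g(y)$ is a weakly differentiable function from $\mathbb{R}^N$ to $\mathbb{R}^N$, then an unbiased estimator for the MSE risk is the SURE defined as [12]:

$$R^{SURE}(\hat{f}(y), f) = N + \|g(y)\|^2 + 2\, \nabla \cdot g(y) \qquad (8)$$

where $\nabla$ is the vector differential operator of first partial derivatives, i.e., $\nabla \cdot g(y) = \sum_{i=0}^{N-1} \partial g_i(y) / \partial y_i$.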


4.3  Adaptive  Wavelet  Shrinkage  with  SureShrink 

There are two main classes of wavelet shrinkage, depending on whether the threshold is single and global or scale-dependent and adaptive. Our approach consists of deriving a scale-dependent threshold $\lambda^j$ according to the following soft-thresholding rule:

$$g_{\lambda}(d^j_k) = \mathrm{sgn}(d^j_k)\,(|d^j_k| - \lambda^j)\, I(|d^j_k| > \lambda^j) \qquad (10)$$

For a given detail subband at resolution $j$, the shrinkage operator $g_{\lambda}(d)$ kills all coefficients below the threshold $\lambda^j$ and pulls the surviving ones towards the origin by an amount equal to the threshold. The different scale-dependent thresholds stem from the minimization of the SURE, i.e.,

$$\lambda^j = \arg\min_{\lambda} R^{SURE}(\lambda, d^j_k) \qquad (11)$$

where the SURE for a soft-threshold estimator is given by [5],

$$R^{SURE} = 2^j - 2 \sum_k I(|d^j_k| \le \lambda^j) + \sum_k \min(|d^j_k|, \lambda^j)^2 \qquad (12)$$

Note that the underlying optimization problem is straightforward and the computational effort is of order $2^j \log(2^j)$ as a function of the subband size $2^j$ [5].
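
The threshold search in (11)-(12) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the noise is assumed to have unit variance (consistent with the constant term in (8) and (12)), and the candidate thresholds are taken to be zero and the sorted coefficient magnitudes, which is where the piecewise SURE curve attains its minimum. The upper cap on the threshold used in [5] is not included.

```python
import numpy as np

def soft_threshold(d, lam):
    """Soft-thresholding rule (10): shrink magnitudes by lam, zero the rest."""
    d = np.asarray(d, dtype=float)
    return np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)

def sure_soft(d, lam):
    """SURE of the soft-threshold estimator (12), assuming unit noise variance."""
    a = np.abs(np.asarray(d, dtype=float)).ravel()
    return a.size - 2.0 * np.count_nonzero(a <= lam) + np.sum(np.minimum(a, lam) ** 2)

def sure_threshold(d):
    """Minimize (12) over lam in {0} and the coefficient magnitudes.

    Sorting dominates the cost, giving the n log n effort quoted in the text.
    """
    a = np.sort(np.abs(np.asarray(d, dtype=float)).ravel())
    n = a.size
    k = np.arange(1, n + 1)  # number of magnitudes <= a[k-1] (ties handled by the min)
    risks = n - 2.0 * k + np.cumsum(a ** 2) + (n - k) * a ** 2
    best = int(np.argmin(risks))
    # lam = 0 keeps every coefficient; include it as a candidate.
    if sure_soft(a, 0.0) <= risks[best]:
        return 0.0
    return float(a[best])
```

In use, each detail subband would be passed separately to `sure_threshold` and then to `soft_threshold`, while the coarsest subband is left untouched, as described above.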
5. EXPERIMENTAL RESULTS

A  migrated  marine  seismic  profile  from  the  Midyan  basin  in  the 
Red  Sea  is  used  to  demonstrate  our  adaptive  seismic  compression 
by  wavelet  shrinkage.  The  discrete  2-D  seismic  data  consist  of  a 
collection  of  2838  seismic  signals  (traces)  of  2.5  seconds  length 
each  sampled  at  4  milliseconds.  The  traces  correspond  to  the 
Common  MidPoints  (CMP),  i.e.,  successive  reflection  points 
midway  between  the  different  seismic  source  locations  and  the 
seismometers. The data can be regarded as a (626 x 2838) matrix
of floating-point entries. The variable density display mode is used to
represent  the  profile,  which  can  be  thought  of  as  a  transversal 
section  of  the  prospected  area  along  the  seismic  line.  First  we 
have  applied  a  2-D  DWT  with  asymmetric  long  biorthogonal 
wavelet  filters  CDF(6,8)  where  the  numbers  of  vanishing 
moments  for  the  synthesis  and  analysis  wavelets  are  6  and  8 
respectively. The reader may be wondering why the
reconstruction wavelet filters are shorter and have fewer vanishing
moments than the decomposition ones. First, note that this is
made  possible  thanks  to  the  flexibility  of  the  biorthogonal 
wavelet  transform.  Second,  the  objectives  of  the  decomposition 
and  reconstruction  stages  differ.  Indeed,  the  main  concern  of  the 
wavelet  decomposition  is  to  pack  the  energy  of  the  input  data  in 
fewer  wavelet  coefficients.  The  higher  the  number  of  vanishing 
moments  is,  the  better  the  energy  compaction  would  be.  From  the 
reconstruction  side,  we  wish  to  use  smooth  wavelets  to  mask  the 
errors  introduced  by  the  wavelet  shrinkage  and  to  get  less 
annoying  visual  artifacts.  Furthermore,  the  reconstruction  time 
should  be  shorter  than  the  decomposition  one  because,  in 
practice,  the  data  set  is  compressed  once  but  may  be 
decompressed  several  times.  A  three-level  multiresolution 
decomposition  has  been  performed  yielding  nine  detail  subbands 
and  one  low-resolution  subband.  The  four  subbands  of  the  first 
level  are  displayed  in  Fig.3. 


Figure  3.  First  level  nonstandard  dyadic  2-D  DWT 
multiresolution  decomposition  of  the  Midyan  section 


Next, we applied the SureShrink operator to the nine detail subbands. The experimental results are displayed in Fig. 4. Although almost 82% of the wavelet coefficients have been killed by shrinkage, 95% of the data energy is recovered in the reconstructed data. Furthermore, the difference section exhibits random noise. Thus, the compression has also produced an appreciable filtering effect.
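
The two figures of merit quoted above can be computed as in the following sketch. It assumes, since the paper does not spell it out, that "killed" means coefficients set exactly to zero and that "energy recovered" is the ratio of the sum of squares of the reconstructed section to that of the original; both function names are hypothetical.

```python
import numpy as np

def killed_fraction(coeffs_after_shrinkage):
    """Fraction of wavelet coefficients that shrinkage set to zero."""
    c = np.asarray(coeffs_after_shrinkage)
    return np.count_nonzero(c == 0) / c.size

def energy_preserved(original, reconstructed):
    """Ratio of reconstructed energy to original energy (sum of squares)."""
    o = np.asarray(original, dtype=float)
    r = np.asarray(reconstructed, dtype=float)
    return np.sum(r ** 2) / np.sum(o ** 2)
```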

6.  CONCLUSIONS 

In  this  work,  a  sophisticated  adaptive  seismic  compression 
technique  was  presented.  A  time-scale  transform  was  associated 
with  a  non-linear  statistical  method.  A  pair  of  different 
asymmetric  biorthogonal  wavelet  filters  was  selected  to  achieve 
different  targets  for  the  compression  and  decompression 
processes.  Analysis  wavelets  with  more  vanishing  moments  were 
used  to  ensure  a  maximum  energy  compaction  of  the  input  data. 
This  made  compression  by  thresholding  a  very  natural  and 
efficient  means.  Based  on  SURE,  the  scale-dependent  thresholds 
were  determined  and  then  the  SureShrink  operator  was  applied  to 
each  detail  subband  to  kill  insignificant  coefficients.  The 
experimental results show that the proposed approach does not
introduce visible artifacts while achieving a relatively high
compression  gain. 

7.  ACKNOWLEDGMENTS 

The  authors  wish  to  gratefully  acknowledge  King  Fahd 
University  of  Petroleum  &  Minerals  for  its  support  and  Saudi 
ARAMCO  for  providing  the  data. 

8.  REFERENCES 

[1]  Bosman  C.  and  Reiter,  E.  “Seismic  data  compression  Using 
wavelet  transform.”  63rd  SEG  Expanded  Abstracts,  1261- 
1264, 1993 

[2]  Vidakovic  B.  Statistical  modeling  by  wavelets.  Wiley  Series 
in  Probability  and  Statistics.  John  Wiley  &  Sons,  1999. 

[3]  Meyer  Y.  Wavelets:  Algorithms  and  applications.  SIAM 
1993. 

[4]  Cohen  A.,  Daubechies  I.,  Feauveau  J.  “Biorthogonal  bases 
of  compactly  supported  wavelets.”  Commun.  on  Pure  Appl. 
Math.,  45:485-560,  1992. 

[5] Donoho D. and Johnstone I. “Adapting to unknown
smoothness by wavelet shrinkage.” Journ. of the Amer.
Statistical Assoc., 90:1200-1224, 1995.

[6] Misiti M., Misiti Y., Oppenheim G. and Poggi J-M.
Matlab wavelet toolbox. The MathWorks, Inc., 1997.

[7] Mougenot D. and Al-Shakhis A. “Depth imaging a pre-salt
faulted block: A case study from the Midyan basin (Red
Sea).” Saudi Aramco Jour. of Tech., Fall 1998.

[8]  Sheriff  R.  Geophysical  methods.  Prentice  Hall,  1989. 

[9]  Mallat  S.  A  Wavelet  tour  of  signal  processing.  Academic 
Press,  1998. 

[10] Stollnitz E., DeRose T. and Salesin D. Wavelets for
computer graphics: Theory and applications. Morgan
Kaufmann Publishers, Inc., San Francisco, 1996.

[11] Vetterli M. and Kovacevic J. Wavelets and subband coding.
Prentice Hall PTR, New Jersey, 1995.

[12]  Stein  C.  “Estimation  of  the  mean  of  a  multivariate  normal 
distribution”  The  Annals  of  Statistics,  9(6):  1135-1151,1981. 


[Figure 4 plot residue. Axis label: Time (sec). Panel title: Reconstructed Midyan profile, 82% killed coefficients, 95% energy preserved.]


REPRESENTATIONS OF STOCHASTIC PROCESSES USING COIFLET-TYPE WAVELETS


Dong  Wei  and  Haiguang  Cheng 


Center  for  Telecommunications  and  Information  Networking 
Department  of  Electrical  and  Computer  Engineering 
Drexel  University 
Philadelphia,  PA  19104  U.S.A. 

E-mails:  wei@ece.drexel.edu,  hgcheng@io.ece.drexel.edu 


ABSTRACT 

The wavelet series expansion requires a high computational complexity in computing the scaling coefficients at the finest scale by means of projection in order to realize the Mallat algorithm to compute the wavelet coefficients at coarser scales. We propose a fast and practical algorithm to approximate the wavelet series expansion. The algorithm is based on sampling and reconstruction with coiflet-type wavelets, which possess vanishing moments on both scaling function and wavelet. We evaluate the performance of the algorithm by establishing the convergence rates and asymptotic forms for the mean-square errors in the scaling coefficients and wavelet coefficients of the synthesized stochastic process.

1.  INTRODUCTION 

During the past decade, the theory of wavelets has established itself firmly as one of the most successful mathematical tools for a broad range of signal processing applications, such as image data compression, noise reduction, and singularity detection. A fundamental and important problem in wavelet-based multiresolution approximation theory is to measure the decay of the approximation error as resolution increases. The convergence properties and rates for the wavelet series expansion (WSE) of stochastic processes have been studied in [1]. The WSE requires the computation of the scaling coefficients at the finest scale by means of projection in order to realize the Mallat algorithm to compute the wavelet coefficients at coarser scales. Since, in practice, uniform samples of signals rather than their analytic forms are often available, the projection-based implementation of the WSE requires numerical integrals to approximate the scaling coefficients, which are computationally expensive. Therefore, such an implementation is far from practical.

This work was supported by Defense Advanced Research Project Agency under grant F30602-00-2-0501.

In this paper, we propose a fast and practical algorithm to accurately approximate the WSE. The algorithm is based on sampling and reconstruction with coiflet-type wavelets, which possess vanishing moments on both scaling function and wavelet. We study the resulting wavelet representations of stochastic processes and evaluate the performance of the algorithm by establishing the convergence rates and asymptotic forms for the mean-square errors in the scaling coefficients and wavelet coefficients of the synthesized stochastic process. This work is parallel to the result in [1] and can be viewed as a generalization of the result on approximated WSE for deterministic functions [2].

The following simplified notation is used in the paper:

$$\int \equiv \int_{-\infty}^{\infty}, \qquad \sum_n \equiv \sum_{n=-\infty}^{\infty}.$$

Due to space limitation, the proofs of some of the presented results are not given in this paper.

2.  BACKGROUND 

2.1.  Wavelet  Representations 

First, we briefly review wavelet-based multiscale representations. A continuous-time, real-valued stochastic process $X(t)$ can be approximated by its finite-scale wavelet series expansion:

$$X_{i_1}(t) = \sum_k s_{i_1}[k]\,\phi_{i_1,k}(t) = \sum_k s_{i_0}[k]\,\phi_{i_0,k}(t) + \sum_{i=i_0}^{i_1-1} \sum_k w_i[k]\,\psi_{i,k}(t) \qquad (1)$$
where we have used the short-hand notation

$$\psi_{i,k}(t) = 2^{i/2}\, \psi(2^i t - k)$$

for the dilated and translated versions of $\psi$, and applied the notation to $\phi_{i_1,k}(t)$ and $\phi_{i_0,k}(t)$ similarly. The functions $\phi$ and $\psi$ are the scaling function and the wavelet, respectively. In this paper, we only consider two-channel, compactly supported, orthonormal scaling functions and wavelets [3], [4]. The scaling coefficients at scale $2^i$ are defined as

$$s_i[k] = \int X(t)\, \phi_{i,k}(t)\, dt$$

and the wavelet coefficients at scale $2^i$ are defined as

$$w_i[k] = \int X(t)\, \psi_{i,k}(t)\, dt.$$

Given $\{s_{i_1}[k] : k \in \mathbb{Z}\}$, the set of scaling coefficients at the finest scale $2^{i_1}$, the wavelet coefficients $\{w_i[k] : k \in \mathbb{Z}\}$ can be computed efficiently in a recursive fashion via the Mallat algorithm [5]:

$$s_i[k] = \sum_n h[2k - n]\, s_{i+1}[n]$$

$$w_i[k] = \sum_n g[2k - n]\, s_{i+1}[n]$$

for $i = i_1 - 1, i_1 - 2, \ldots, i_0$, where $h[n]$ and $g[n]$ are a pair of finite-impulse-response conjugate quadrature filters [4].
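
As a rough illustration of the recursion above, the sketch below performs the filter-and-decimate step with NumPy. It is not the paper's code: the function names are hypothetical, the filters are assumed to be indexed from $n = 0$, finite signals are zero-padded at the boundaries, and the Haar pair used in the usage note is only a convenient stand-in for a conjugate quadrature filter pair.

```python
import numpy as np

def mallat_step(s_next, h, g):
    """One analysis step: s_i[k] = sum_n h[2k-n] s_{i+1}[n], likewise for g.

    Full convolution evaluated at even indices 2k implements the sum.
    """
    s_next = np.asarray(s_next, dtype=float)
    s = np.convolve(s_next, h)[::2]  # coarser scaling coefficients
    w = np.convolve(s_next, g)[::2]  # wavelet coefficients at the coarser scale
    return s, w

def mallat_analysis(s_finest, h, g, levels):
    """Recursive decomposition from the finest scale down `levels` scales."""
    s = np.asarray(s_finest, dtype=float)
    details = []
    for _ in range(levels):
        s, w = mallat_step(s, h, g)
        details.append(w)
    return s, details
```

For example, with the Haar filters `h = [2**-0.5, 2**-0.5]` and `g = [2**-0.5, -2**-0.5]`, the energy of the input equals the summed energy of the returned coefficients, as expected for an orthonormal filter bank.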

2.2.  Coiflet-Type  Wavelets 

A coiflet-type wavelet system of order $L$ satisfies the Coifman criterion; i.e., the first $L$ wavelet moments vanish

$$m_\psi[l] = \int t^l\, \psi(t)\, dt = 0 \quad \text{for } l = 0, 1, \ldots, L-1$$

and the second to the $L$th scaling function moments vanish

$$m_\phi[l] = \int t^l\, \phi(t)\, dt = \delta[l] \quad \text{for } l = 0, 1, \ldots, L-1$$

where $\delta[\cdot]$ denotes the Kronecker delta sequence. Examples of orthonormal coiflet-type wavelets include the original coiflets [6] and their generalized versions [7]. If a coiflet-type wavelet system is used in the WSE of a deterministic signal $X(t)$, then the uniform signal samples approximate the scaling coefficients accurately at a sufficiently fine scale [4]:

$$s_i[k] = 2^{-i/2}\, X(2^{-i}k) + O(2^{-iL}).$$

Such a property is highly appealing in digital signal processing applications, where samples of signals are processed digitally.


3.  A  FAST  AND  PRACTICAL  ALGORITHM 
FOR  WAVELET  REPRESENTATIONS 

The projection-based approximation given in (1) has a limitation in reality. In many practical applications, the initial set of scaling coefficients $\{s_{i_1}[k] : k \in \mathbb{Z}\}$ is difficult and computationally costly, if not impossible, to obtain. In most cases, uniform samples of signals rather than analytic function forms are available.

To overcome this difficulty, we propose a fast algorithm for computing the approximated scaling and wavelet coefficients in the WSE based on an $L$th-order coiflet-type wavelet system:

(i) the approximated scaling coefficients at scale $2^i$ are the uniform samples of $X(t)$ with a sampling period $2^{-i}$:

$$\tilde{s}_i[k] = 2^{-i/2}\, X(2^{-i}k)$$

(ii) the approximated wavelet coefficients at scale $2^i$ are computed from the approximated scaling coefficients at scale $2^{i+1}$ as

$$\tilde{w}_i[k] = \sum_n g[2k - n]\, \tilde{s}_{i+1}[n].$$

From the above algorithm, we know that the approximated wavelet coefficients at scale $2^i$ are the filtered and decimated versions of the uniform samples of the stochastic process with a sampling period $2^{-i-1}$.

An alternative to the above algorithm would be to use the densest samples $\{2^{-i_1/2}\, X(2^{-i_1}k) : k \in \mathbb{Z}\}$ as an initial set of scaling coefficients to trigger the Mallat algorithm to compute wavelet coefficients at all coarser scales. However, the proposed algorithm possesses the following two advantages over the alternative:

• since the proposed algorithm is nonrecursive, the approximation error at the finest scale $2^{i_1}$ does not propagate across the coarser scales;

• since no computation is required for scaling coefficients, the computational complexity of the proposed algorithm is half of the computational complexity of the Mallat algorithm.
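
Steps (i) and (ii) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, `X` is any callable returning sample values of one realization, and the high-pass filter `g` is assumed to be indexed from $n = 0$. A coiflet-type filter would be needed to obtain the error rates discussed in the text; the Haar filter in the usage note is a placeholder only.

```python
import numpy as np

def approx_scaling_coeffs(X, i, k):
    """Step (i): scaled uniform samples, 2^{-i/2} X(2^{-i} k)."""
    k = np.asarray(k)
    return 2.0 ** (-i / 2) * X(k * 2.0 ** (-i))

def approx_wavelet_coeffs(X, i, k, g):
    """Step (ii): sum_n g[2k-n] s~_{i+1}[n], written as a sum over m = 2k - n.

    No recursion is involved: each scale is computed directly from samples
    taken with period 2^{-(i+1)}.
    """
    k = np.asarray(k)
    w = np.zeros(k.shape, dtype=float)
    for m, gm in enumerate(g):
        w += gm * approx_scaling_coeffs(X, i + 1, 2 * k - m)
    return w
```

For instance, `approx_wavelet_coeffs(np.sin, 3, np.arange(16), [2**-0.5, -2**-0.5])` returns sixteen approximated wavelet coefficients of a sine wave at scale $2^3$ using the placeholder Haar high-pass filter.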


Let $R_{XX}(s,t) = E\{X(s)X(t)\}$ denote the auto-correlation function of the stochastic process $X(t)$. We assume that $R_{XX}$ is sufficiently smooth in the sense that

$$R^{(m,n)}_{XX}(s,t) = \frac{\partial^{m+n} R_{XX}(s,t)}{\partial s^m\, \partial t^n}$$

exists and is finite for any $s, t \in \mathbb{R}$ and any $m, n \in \mathbb{Z}$, $m + n < K$, where $K$ is a sufficiently large integer.
The following two propositions evaluate the approximation accuracy of the proposed algorithm in terms of mean-square error (MSE).

Proposition 1. The MSE in the scaling coefficient $\tilde{s}_i[k]$ has the asymptotic form

$$E\{(\tilde{s}_i[k] - s_i[k])^2\} = 2^{-i(2L+1)} \binom{2L}{L} \frac{(m_\phi[L])^2}{(2L)!}\, R^{(L,L)}_{XX}\!\left(\frac{k}{2^i}, \frac{k}{2^i}\right) + O(2^{-i(2L+2)}).$$

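Proof. It follows that

$$\begin{aligned}
E\{(\tilde{s}_i[k] - s_i[k])^2\}
&= E\{(s_i[k])^2\} - 2E\{\tilde{s}_i[k]\, s_i[k]\} + E\{(\tilde{s}_i[k])^2\} \\
&= E\left\{\iint X(t_1)X(t_2)\,\phi_{i,k}(t_1)\,\phi_{i,k}(t_2)\, dt_1\, dt_2\right\} - 2E\left\{2^{-i/2}X(2^{-i}k)\int X(t)\,\phi_{i,k}(t)\, dt\right\} + E\{2^{-i}X(2^{-i}k)X(2^{-i}k)\} \\
&= 2^{i}\iint R_{XX}(t_1,t_2)\,\phi(2^i t_1 - k)\,\phi(2^i t_2 - k)\, dt_1\, dt_2 - 2\int R_{XX}(t, 2^{-i}k)\,\phi(2^i t - k)\, dt + 2^{-i}R_{XX}(2^{-i}k, 2^{-i}k) \\
&= 2^{-i}\iint R_{XX}\!\left(\frac{s_1+k}{2^i}, \frac{s_2+k}{2^i}\right)\phi(s_1)\,\phi(s_2)\, ds_1\, ds_2 - 2^{-i+1}\int R_{XX}\!\left(\frac{s+k}{2^i}, \frac{k}{2^i}\right)\phi(s)\, ds + 2^{-i}R_{XX}(2^{-i}k, 2^{-i}k).
\end{aligned}$$

By taking a Taylor series expansion of $R_{XX}$ at $(2^{-i}k, 2^{-i}k)$, we have

$$R_{XX}\!\left(\frac{s_1+k}{2^i}, \frac{s_2+k}{2^i}\right) = \sum_{l=0}^{2L}\sum_{n=0}^{l}\binom{l}{n}\frac{R^{(n,l-n)}_{XX}(2^{-i}k, 2^{-i}k)}{2^{il}\, l!}\, s_1^n\, s_2^{l-n} + O(2^{-i(2L+1)})$$

$$R_{XX}\!\left(\frac{s+k}{2^i}, \frac{k}{2^i}\right) = \sum_{l=0}^{2L}\frac{R^{(l,0)}_{XX}(2^{-i}k, 2^{-i}k)}{2^{il}\, l!}\, s^l + O(2^{-i(2L+1)}).$$

Using the vanishing moment property of coiflet-type wavelets, we infer that

$$\begin{aligned}
E\{(\tilde{s}_i[k] - s_i[k])^2\}
&= \sum_{l=0}^{2L}\sum_{n=0}^{l}\binom{l}{n}\frac{R^{(n,l-n)}_{XX}(2^{-i}k,2^{-i}k)}{2^{i(l+1)}\,l!}\, m_\phi[n]\, m_\phi[l-n] - 2\sum_{l=0}^{2L}\frac{R^{(l,0)}_{XX}(2^{-i}k,2^{-i}k)}{2^{i(l+1)}\,l!}\, m_\phi[l] + 2^{-i}R_{XX}(2^{-i}k,2^{-i}k) + O(2^{-i(2L+2)}) \\
&= \sum_{l=L}^{2L}\frac{R^{(l,0)}_{XX}(2^{-i}k,2^{-i}k) + R^{(0,l)}_{XX}(2^{-i}k,2^{-i}k)}{2^{i(l+1)}\,l!}\, m_\phi[l] + \binom{2L}{L}\frac{R^{(L,L)}_{XX}(2^{-i}k,2^{-i}k)}{2^{i(2L+1)}\,(2L)!}\,(m_\phi[L])^2 - 2\sum_{l=L}^{2L}\frac{R^{(l,0)}_{XX}(2^{-i}k,2^{-i}k)}{2^{i(l+1)}\,l!}\, m_\phi[l] + O(2^{-i(2L+2)}) \\
&= \binom{2L}{L}\frac{R^{(L,L)}_{XX}(2^{-i}k,2^{-i}k)}{2^{i(2L+1)}\,(2L)!}\,(m_\phi[L])^2 + O(2^{-i(2L+2)}).
\end{aligned}$$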

Proposition 2. The MSE in the wavelet coefficient
$\tilde{w}_i[k]$ has the asymptotic form

E{{Wi[k]  -  Wi[k})2}  =  C*  •  2-i(«+1)  •  R%L/L\ 0, 0) 
+0(2“i(4i+2)) 


4L\  (2L\2(mi>[L))2 


where 


*  \2LJ\LJ  22i(4L)! 

Proposition  2  can  be  proved  in  a  similar  way  to 
Proposition  1. 

The  above  two  propositions  indicate  that  the  MSE 
in  the  wavelet  coefficients  decays  faster  than  the  MSE 
in  the  scaling  coefficients  as  the  product  iL  increases. 
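
To make the comparison concrete, the following sketch evaluates only the leading powers of two, taking the exponents $2L+1$ (Proposition 1) and $4L+1$ (Proposition 2) as read from the statements above. The constants and the autocorrelation derivatives are deliberately ignored, so the output is an order-of-magnitude illustration rather than an actual error ratio; the function names are hypothetical.

```python
def leading_exponents(L):
    """Leading decay exponents of the two MSEs, per unit of scale index i."""
    return 2 * L + 1, 4 * L + 1

def relative_decay(i, L):
    """Ratio 2^{-i(4L+1)} / 2^{-i(2L+1)} = 2^{-2iL}, constants ignored."""
    scaling_exp, wavelet_exp = leading_exponents(L)
    return 2.0 ** (-i * wavelet_exp) / 2.0 ** (-i * scaling_exp)
```

The ratio equals $2^{-2iL}$, which shrinks as the product $iL$ grows, in line with the remark above.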

4.  WAVELET  REPRESENTATIONS  OF 
STOCHASTIC  PROCESSES 

We  study  the  approximation  accuracy  of  the  wavelet 
representations  based  on  the  proposed  algorithm. 

The sequence of stochastic processes

$$X_i(t) = \sum_k \tilde{s}_i[k]\, \phi_{i,k}(t)$$

for $t \in [a, b]$ and $i \in \mathbb{Z}$, can be viewed as successive approximations of $X(t)$ over the interval $[a, b]$ using the dilated and translated scaling functions of some $L$th-order coiflet-type wavelet system as the interpolants. The following proposition expresses the auto-correlation function of the process $X_i(t)$ and the cross-correlation function of $X_i(t)$ and $X(t)$ asymptotically in terms of the auto-correlation function of $X(t)$.
Proposition 3. For any $s, t \in [a, b]$,

RXiXi  ( s,t ) 

=  Rxx{s,t) 

■  R{xx\sR)  Pl(^s)  +  Rx'^jsR)  pi{2lt) 

(-1)L2  iLL\ 

+0(  2~i(i+1))  (2) 


Rxx,(s,t)  =  Rxx(s,t)  +  f;^0p. 


+0{  2~i(L+1)) 


where $\rho_l$ is a periodic function with unit period and is given by

$$\rho_l(t) = \sum_k (t - k)^l\, \phi(t - k).$$

We use the integrated MSE

$$\epsilon_i^2 = E\left\{ \int_a^b [X(t) - X_i(t)]^2\, dt \right\} \qquad (4)$$

to measure the approximation accuracy at scale $2^i$ and evaluate the asymptotic approximation performance in the next proposition.

Proposition 4. The integrated MSE possesses the asymptotic form

$$\epsilon_i^2 = C_\phi \cdot 2^{-2iL} \cdot \int_a^b R^{(L,L)}_{XX}(t,t)\, dt + O(2^{-i(2L+1)})$$

where the constant $C_\phi$ is given by

$$C_\phi = \frac{1}{(L!)^2} \int_0^1 \rho_L^2(t)\, dt. \qquad (5)$$

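Proof. The MSE can be rewritten as

$$\epsilon_i^2 = \int_a^b [R_{XX}(t,t) - 2R_{XX_i}(t,t) + R_{X_iX_i}(t,t)]\, dt. \qquad (6)$$

For $i$ sufficiently large, the $O(2^{-i(2L+1)})$ error terms in (2) and in (3) become negligible, and we can use the pointwise estimates in Proposition 3 to obtain the asymptotic form of the error in (6),

$$\begin{aligned}
\lim_{i \to \infty} \frac{\epsilon_i^2}{2^{-2iL}}
&= \lim_{i \to \infty} \frac{1}{(L!)^2} \int_a^b R^{(L,L)}_{XX}(t,t)\, \rho_L^2(2^i t)\, dt \\
&= \lim_{i \to \infty} \frac{1}{(L!)^2} \sum_{m=2^i a+1}^{2^i b} \int_{(m-1)/2^i}^{m/2^i} R^{(L,L)}_{XX}(t,t)\, \rho_L^2(2^i t)\, dt \\
&= \lim_{i \to \infty} \frac{1}{(L!)^2} \sum_{m=2^i a+1}^{2^i b} R^{(L,L)}_{XX}\!\left(\frac{m}{2^i}, \frac{m}{2^i}\right) \frac{1}{2^i} \times \left[\int_0^1 \rho_L^2(t)\, dt\right] \\
&= \frac{1}{(L!)^2} \left[\int_a^b R^{(L,L)}_{XX}(t,t)\, dt\right] \left[\int_0^1 \rho_L^2(t)\, dt\right].
\end{aligned}$$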


If $R^{(L,L)}_{XX}(t,t)$ decays sufficiently fast towards infinities in the sense that

$$\int R^{(L,L)}_{XX}(t,t)\, dt < \infty,$$

then Proposition 4 holds for the limiting cases $a = -\infty$ and/or $b = \infty$.

Since a deterministic function $X(t)$ may be considered as a degenerate stochastic process with an auto-correlation function $R_{XX}(s,t) = X(s)X(t)$, Proposition 4 can be viewed as a generalization of the result on sampling-based approximation of deterministic functions in [2], which corresponds to the special case $R^{(L,L)}_{XX}(t,t) = [X^{(L)}(t)]^2$. Indeed, the above constant $C_\phi$ is identical to the asymptotic constant in the deterministic case [2]. Therefore, $C_\phi$ can be computed via the algorithm described in [2], provided that the filter $h$ is known.

5.  WAVELET  REPRESENTATIONS  OF 
STATIONARY  STOCHASTIC  PROCESSES 

Let $R_{XX}(\tau) = E\{X(t-\tau)X(t)\}$ be the auto-correlation function of a wide-sense stationary (WSS) process $X(t)$. We assume that $R_{XX}$ is sufficiently smooth in the sense that $R^{(K)}_{XX}(\tau)$ exists and is finite for any $\tau \in \mathbb{R}$, where $K$ is a sufficiently large integer.

The next proposition expresses the auto-correlation function of the process $X_i(t)$ and the cross-correlation function of $X_i(t)$ and $X(t)$ asymptotically in terms of the auto-correlation function of $X(t)$.

Proposition 5. For any $t$ and $\tau$ such that $t \in [a, b]$ and $t - \tau \in [a, b]$,

RXiXi(t,t-T) 

=  RxX(r)  +  Rxkr)[(-l)LPL(2H)+pL(nt-r))) 
+0(2-i{L+1)) 


Rxx,(t,t-r)  =  Rxx(T)  +  t  ^xMMTV-r)) 

l=L 

+0(2~i(L+l)). 
Since Proposition 5 indicates that $R_{X_iX_i}(t-\tau, t)$ depends on both $t$ and $\tau$, in general the process $X_i(t)$ is not WSS.

Proposition 6. The integrated MSE possesses the asymptotic form

$$\epsilon_i^2 = C_\phi \cdot 2^{-2iL} \cdot (b-a)(-1)^L R^{(2L)}_{XX}(0) + O(2^{-i(2L+2)})$$

where the constant $C_\phi$ is given by (5).

For a WSS process, the integrated MSE defined in (4) tends to infinity if it is evaluated over $(-\infty, +\infty)$. Therefore, it is impossible to directly extend Proposition 6 to the limiting case. However, the mean power of the approximation error is finite over $(-\infty, +\infty)$ based on Proposition 6:

$$\lim_{a \to -\infty,\, b \to +\infty} \frac{\epsilon_i^2}{b-a} = C_\phi \cdot 2^{-2iL} \cdot (-1)^L R^{(2L)}_{XX}(0) + O(2^{-i(2L+2)}).$$

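For any stochastic process,

$$R^{(L,L)}_{XX}(t,t) = \left.\frac{\partial^{2L} E\{X(t_1)X(t_2)\}}{\partial t_1^L\, \partial t_2^L}\right|_{t_1=t_2=t}.$$

For a WSS stochastic process,

$$\left.\frac{\partial^{2L} E\{X(t_1)X(t_2)\}}{\partial t_1^L\, \partial t_2^L}\right|_{t_1=t_2=t} = \left.\frac{\partial^{2L} R_{XX}(t_1 - t_2)}{\partial t_1^L\, \partial t_2^L}\right|_{t_1=t_2=t} = (-1)^L R^{(2L)}_{XX}(0).$$

If the term $R^{(L,L)}_{XX}(t,t)$ in Proposition 4 is replaced by $(-1)^L R^{(2L)}_{XX}(0)$ for a WSS process, it is apparent that Proposition 4 contains Proposition 6 as a special case. However, the fact that the higher-order remainder in Proposition 6 is $O(2^{-i(2L+2)})$ instead of $O(2^{-i(2L+1)})$, i.e., the term of the order $2^{-i(2L+1)}$ vanishes for WSS processes, may not be directly obtained from Proposition 4.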


6.  CONCLUSIONS 

We have presented coiflet-type wavelet representations of stochastic processes. Our study shows that the proposed sampling-based representations are fast, efficient, and practical. Therefore, they are promising in a large number of wavelet-based applications.
ABSTRACT

The coherence function is extended to nonstationary random processes through introduction and investigation of a coherence operator and time-frequency (TF) coherence functions. For underspread nonstationary processes, it is shown that TF coherence functions are a meaningful tool for nonstationary coherence analysis and that they provide approximate TF formulations of the coherence operator.


1.  INTRODUCTION 


Consider two jointly stationary, zero-mean, real or circular complex random processes $x(t)$ and $y(t)$ with power spectral densities $P_x(f)$ and $P_y(f)$ and cross power spectral density $P_{x,y}(f)$. The coherence function [1-3]

$$\gamma_{x,y}(f) = \frac{P_{x,y}(f)}{\sqrt{P_x(f)\,P_y(f)}}$$

is a practically useful, normalized measure of the cross-correlation of spectral components of $x(t)$ and $y(t)$. It satisfies

$$|\gamma_{x,y}(f)|^2 \le 1, \qquad (1)$$

with $|\gamma_{x,y}(f)|^2 = 0$ iff $x(t)$ and $y(t)$ are uncorrelated processes ($P_{x,y}(f) = 0$) and

$$|\gamma_{x,y}(f)| = 1 \qquad (2)$$

iff $x(t)$ and $y(t)$ are related by an invertible linear time-invariant system, $y(t) = (k * x)(t)$. Furthermore, $|\gamma_{x,y}(f)|^2$ is invariant to invertible linear process transformations, i.e., for $a(t) = (h_1 * x)(t)$ and $b(t) = (h_2 * y)(t)$ we have

$$|\gamma_{a,b}(f)| = |\gamma_{x,y}(f)|. \qquad (3)$$
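
Properties (1) and (2) can be checked numerically with a simple estimator of the squared coherence. The sketch below is an assumption-laden illustration, not part of the paper: it handles real-valued sampled signals only, estimates the spectra by averaging periodograms over non-overlapping segments, and uses a hypothetical function name.

```python
import numpy as np

def msc(x, y, nseg):
    """Estimate |gamma_{x,y}(f)|^2 by averaging over nseg non-overlapping segments."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    seg_len = len(x) // nseg
    X = np.fft.rfft(x[: nseg * seg_len].reshape(nseg, seg_len), axis=1)
    Y = np.fft.rfft(y[: nseg * seg_len].reshape(nseg, seg_len), axis=1)
    Pxx = np.mean(np.abs(X) ** 2, axis=0)   # auto-spectrum estimate of x
    Pyy = np.mean(np.abs(Y) ** 2, axis=0)   # auto-spectrum estimate of y
    Pxy = np.mean(X * np.conj(Y), axis=0)   # cross-spectrum estimate
    return np.abs(Pxy) ** 2 / (Pxx * Pyy)
```

With white noise `x` and `y = 2 * x` the estimate is one at every frequency, while two independent noise records give values near `1 / nseg`; by the Cauchy-Schwarz inequality the estimate never exceeds one.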

This paper extends the coherence function to nonstationary processes. Section 2 introduces and studies a coherence operator of nonstationary processes. Section 3 reviews some time-frequency (TF) fundamentals. Section 4 shows that for underspread nonstationary processes, the TF coherence function introduced in [4] is an approximate TF formulation of the coherence operator that approximately satisfies several desirable properties. Section 5 introduces a class of TF shift covariant TF coherence functions. Simulation results are presented in Section 6. We note that proofs are omitted due to lack of space; most proofs can be found in [5].


2.  THE  COHERENCE  OPERATOR 

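Let us consider two nonstationary, zero-mean, real or circular complex random processes $x(t)$ and $y(t)$ with autocorrelation operators $\mathbf{R}_x$, $\mathbf{R}_y$ and cross-correlation operator¹ $\mathbf{R}_{x,y}$. The coherence function is no longer defined; however, by analogy to the coherence matrix [7], we define the coherence operator of $x(t)$ and $y(t)$ as

$$\mathbf{\Gamma}_{x,y} = \mathbf{R}_x^{-1/2}\, \mathbf{R}_{x,y}\, \mathbf{R}_y^{-1/2},$$

where, e.g., $\mathbf{R}_x^{-1/2}$ denotes the inverse of the positive semi-definite square root $\mathbf{R}_x^{1/2}$ of $\mathbf{R}_x$ [6]. Equivalently,

$$\mathbf{\Gamma}_{x,y} = \mathbf{R}_{\tilde{x},\tilde{y}},$$

where $\tilde{x}(t) = (\mathbf{R}_x^{-1/2} x)(t)$ and $\tilde{y}(t) = (\mathbf{R}_y^{-1/2} y)(t)$ are stationary and white with correlation $\mathbf{R}_{\tilde{x}} = \mathbf{R}_{\tilde{y}} = \mathbf{I}$.

If $x(t)$ and $y(t)$ are jointly stationary, the kernel of $\mathbf{\Gamma}_{x,y}$ is given by $(\mathbf{\Gamma}_{x,y})(t_1,t_2) = \check{\gamma}_{x,y}(t_1 - t_2)$ with $\check{\gamma}_{x,y}(\tau) = \int_{-\infty}^{\infty} \gamma_{x,y}(f)\, e^{j2\pi f\tau}\, df$. In this sense, $\mathbf{\Gamma}_{x,y}$ is consistent with the conventional coherence function $\gamma_{x,y}(f)$.

*This work was supported by FWF grant P11904-TEC.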

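Bounds. The coherence operator $\mathbf{\Gamma}_{x,y}$ satisfies bounds that are analogous to (1). Specifically, the singular values [6] $\sigma_k \ge 0$ of $\mathbf{\Gamma}_{x,y}$ are bounded as

$$\sigma_k \le 1.$$

The operator norm [6] $\|\mathbf{\Gamma}_{x,y}\|_O = \sup_{\|g\|_2 = 1} \|\mathbf{\Gamma}_{x,y}\, g\|_2$ (with $\|g\|_2 = [\int |g(t)|^2\, dt]^{1/2}$) is similarly bounded as

$$\|\mathbf{\Gamma}_{x,y}\|_O \le 1.$$

Finally, we have the following bounds on the (non-negative) quadratic forms induced by the positive semi-definite [6] “squared” coherence operators $\mathbf{\Gamma}_{x,y}\mathbf{\Gamma}^+_{x,y}$ or $\mathbf{\Gamma}^+_{x,y}\mathbf{\Gamma}_{x,y}$ (with $\mathbf{\Gamma}^+_{x,y}$ the adjoint [6] of $\mathbf{\Gamma}_{x,y}$): for any $g(t)$ with $\|g\|_2 = 1$,

$$\langle \mathbf{\Gamma}_{x,y}\mathbf{\Gamma}^+_{x,y}\, g, g\rangle \le 1, \qquad \langle \mathbf{\Gamma}^+_{x,y}\mathbf{\Gamma}_{x,y}\, g, g\rangle \le 1,$$

with the inner product defined as $\langle x, y\rangle = \int x(t)\, y^*(t)\, dt$. Note that $\mathbf{\Gamma}_{x,y} = \mathbf{0}$ iff $x(t)$ and $y(t)$ are uncorrelated.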
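
A finite-dimensional analogue makes the singular-value bound easy to verify. In the sketch below, which is an illustration under assumptions and not taken from the paper, random vectors replace the processes, covariance matrices replace the correlation operators, and the matrices are assumed strictly positive definite so that the inverse square roots exist; the function names are hypothetical.

```python
import numpy as np

def inv_sqrt_psd(R):
    """Inverse of the Hermitian positive definite square root of R."""
    vals, vecs = np.linalg.eigh(R)
    return (vecs / np.sqrt(vals)) @ vecs.conj().T

def coherence_matrix(Rx, Rxy, Ry):
    """Matrix analogue of the coherence operator: Rx^{-1/2} Rxy Ry^{-1/2}."""
    return inv_sqrt_psd(Rx) @ Rxy @ inv_sqrt_psd(Ry)
```

If `Rx`, `Rxy`, `Ry` are the blocks of a valid joint covariance matrix, `np.linalg.svd(coherence_matrix(Rx, Rxy, Ry), compute_uv=False)` returns values no larger than one; for `y = K x` with invertible `K` (so that `Rxy = Rx @ K.T` and `Ry = K @ Rx @ K.T` in the real case) all singular values equal one.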

Completely coherent processes. The “squared” coherence operators equal the identity operator,

$$\mathbf{\Gamma}_{x,y}\mathbf{\Gamma}^+_{x,y} = \mathbf{\Gamma}^+_{x,y}\mathbf{\Gamma}_{x,y} = \mathbf{I}$$

(equivalently, $\mathbf{\Gamma}_{x,y}$ is a unitary operator [6]), iff $y(t) = (\mathbf{K}x)(t)$ with some invertible linear (generally time-varying) system $\mathbf{K}$. This extends property (2) to the nonstationary case.

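Linearly distorted processes. An extension of (3) is possible only under rather restrictive assumptions. Let $a(t) = (\mathbf{H}_1 x)(t)$ and $b(t) = (\mathbf{H}_2 y)(t)$ with $\mathbf{H}_1$ and $\mathbf{H}_2$ invertible. Then $\mathbf{\Gamma}_{a,b}\mathbf{\Gamma}^+_{a,b} = \mathbf{\Gamma}_{x,y}\mathbf{\Gamma}^+_{x,y}$ if $\mathbf{H}_1$ is positive definite and commutes with $\mathbf{R}_x$. Similarly, $\mathbf{\Gamma}^+_{a,b}\mathbf{\Gamma}_{a,b} = \mathbf{\Gamma}^+_{x,y}\mathbf{\Gamma}_{x,y}$ if $\mathbf{H}_2$ is positive definite and commutes with $\mathbf{R}_y$.

¹ $\mathbf{R}_{x,y}$ is the linear operator [6] with kernel $r_{x,y}(t_1, t_2) = E\{x(t_1)\, y^*(t_2)\}$; furthermore, $\mathbf{R}_x = \mathbf{R}_{x,x}$.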

3.  TIME-FREQUENCY  FUNDAMENTALS 

Next,  we  briefly  review  some  TF  representations  and  con¬ 
cepts  that  will  be  used  in  subsequent  sections. 

•  The  Weyl  symbol  (WS)  [8-11]  of  a  linear  operator  (lin¬ 
ear  time-varying  system)  H  with  kernel  (impulse  response) 
h(t\,  t2)  is  defined  as 

LH(t,f)  =  j  h(t  +  -,t  -  -)  e  j2*fTdr . 

For  an  underspread  system  H  (see  below),  Ln(t,  f)  can  be 
viewed  as  a  time- varying  transfer  function  [10,11]. 

•  The  spreading  function  (SF)  [8,10,11]  of  a  linear  operator 
(linear  time-varying  system)  H, 

Sh(t,u)  4  J"  h{t+T-,t-T-)e-^dt, 

describes  the  distribution  of  time  shifts  by  r  and  frequency 
shifts  by  v  effected  by  H. 

•  The  Wigner-Ville  spectrum  (WVS)  [12-15]  of  nonstation¬ 
ary  random  processes  x(t),  y(t)  is  defined  as 

Wx,y(tJ)  =  LK.Jt,f),  Wa{t,f)±W.0(t,f)€R. 

For  jointly  underspread  processes  x(t),  y(t)  (see  below),  it 
can  be  interpreted  as  a  time- varying  (cross)  power  spectrum. 

•  Another  time- varying  power  spectrum  is  the  physical  spec¬ 
trum  [12-16] 

&.,(*,/)  =  Wx,y(t,f)  **Wg(-t,~f) 

Sx(t,f )  4  Sx<x(t,f)>  0, 

where  **  denotes  2-D  convolution,  g(t)  is  an  analysis  window 
(normalized  such  that  ||3||2  =  1),  and  Wg{t,  f )  is  the  Wigner 
distribution  [12,17]  of  g(t). 

•  The  expected  ambiguity  function  [14, 18] 

AXiV(t,v)  =  Srx<v(t,u),  Ax{t,v)  =  Ax,x(t,v) 

describes  the  statistical  correlation  of  process  components 
separated  in  time  by  r  and  in  frequency  by  v. 

• A system/operator $\mathbf{H}$ is underspread if its SF $S_{\mathbf{H}}(\tau,\nu)$ is supported within a rectangular region $\mathcal{G} = [-\tau_{\mathcal{G}}, \tau_{\mathcal{G}}] \times [-\nu_{\mathcal{G}}, \nu_{\mathcal{G}}]$ of area $\sigma_{\mathcal{G}} = 4\tau_{\mathcal{G}}\nu_{\mathcal{G}} \ll 1$ [5,10,11]. This means that $\mathbf{H}$ introduces only small TF shifts. Similarly, a process $x(t)$ is underspread if its expected ambiguity function $A_x(\tau,\nu)$ is supported within a rectangular region $\mathcal{G}$ of area $\sigma_{\mathcal{G}} \ll 1$ [14,18]. This means that $x(t)$ features only limited TF correlations. Two processes $x(t)$, $y(t)$ are jointly underspread if $A_x(\tau,\nu)$, $A_y(\tau,\nu)$, and $A_{x,y}(\tau,\nu)$ are supported within the same rectangular region of area $\sigma_{\mathcal{G}} \ll 1$.
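The underspread property can be illustrated numerically. The following is a minimal sketch, assuming a discrete, circular analogue of the spreading function (the names `spreading_function`, `tf_shift_operator` and `spread_area` are hypothetical, and the asymmetric lag convention used here differs from the symmetric definition above only by a phase factor, so the magnitude is unaffected). A pure TF shift operator yields a single peak at its time and frequency shift, and the normalized rectangle area plays the role of the spread parameter.

```python
import numpy as np

def spreading_function(H):
    """Discrete spreading function of an N x N kernel H[n, n'].

    Assumed convention (not from the paper):
    S[m, k] = sum_n H[n, (n - m) mod N] * exp(-j*2*pi*k*n/N),
    with m the circular time lag and k the frequency-shift bin.
    """
    N = H.shape[0]
    n = np.arange(N)
    S = np.empty((N, N), dtype=complex)
    for m in range(N):
        S[m, :] = np.fft.fft(H[n, (n - m) % N])
    return S

def tf_shift_operator(N, m0, k0):
    """Kernel of (Hx)[n] = exp(j*2*pi*k0*n/N) * x[(n - m0) mod N]."""
    n = np.arange(N)
    H = np.zeros((N, N), dtype=complex)
    H[n, (n - m0) % N] = np.exp(2j * np.pi * k0 * n / N)
    return H

def spread_area(S, tol=1e-9):
    """Area 4*tau*nu of the smallest centred rectangle containing the
    support of S, with tau in samples and nu in cycles per sample."""
    N = S.shape[0]
    m, k = np.nonzero(np.abs(S) > tol * np.abs(S).max())
    # map circular indices to signed shifts in [-N/2, N/2)
    m = (m + N // 2) % N - N // 2
    k = (k + N // 2) % N - N // 2
    return float(4 * np.abs(m).max() * np.abs(k).max() / N)

S_large = spreading_function(tf_shift_operator(16, 3, 5))   # large TF shift
S_small = spreading_function(tf_shift_operator(64, 1, 1))   # small TF shift
peak = np.unravel_index(np.argmax(np.abs(S_large)), S_large.shape)
area_large = spread_area(S_large)
area_small = spread_area(S_small)
```

In this toy setting the operator with the small shift has an area well below 1, whereas the one with the large shift does not.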

4.  A  TIME-FREQUENCY  COHERENCE 
FUNCTION 

We are now ready to study a simple and intuitively appealing TF formulation of the coherence operator $\boldsymbol{\Gamma}_{x,y}$ that avoids operator inversions. A TF coherence function based on the WVS was defined in [4] as

$$\Gamma_{x,y}(t,f) \triangleq \frac{W_{x,y}(t,f)}{\sqrt{W_x(t,f)\,W_y(t,f)}}, \qquad (t,f) \in \mathcal{R}, \qquad (4)$$

where $\mathcal{R}$ is the TF region on which $W_x(t,f) > 0$ and $W_y(t,f) > 0$. $\Gamma_{x,y}(t,f)$ is a complex-valued function that is covariant to TF shifts (see Section 5) as well as to TF scalings and other metaplectic transformations of $x(t)$, $y(t)$. For $x(t)$, $y(t)$ uncorrelated, there is $\Gamma_{x,y}(t,f) = 0$ on $\mathcal{R}$.
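As a minimal sketch of how (4) might be evaluated on sampled spectra, the following helper (the name `tf_coherence` and the array layout are assumptions, not part of the paper) restricts the ratio to the region where both auto-spectra are positive and marks all other TF points as undefined.

```python
import numpy as np

def tf_coherence(W_xy, W_x, W_y):
    """Evaluate W_xy / sqrt(W_x * W_y) on the region where W_x > 0 and W_y > 0.

    Returns the complex coherence array (NaN outside the region) and the
    boolean mask of the region itself.
    """
    W_xy = np.asarray(W_xy, dtype=complex)
    W_x = np.asarray(W_x, dtype=float)
    W_y = np.asarray(W_y, dtype=float)
    region = (W_x > 0) & (W_y > 0)
    gamma = np.full(W_xy.shape, np.nan + 0j)
    gamma[region] = W_xy[region] / np.sqrt(W_x[region] * W_y[region])
    return gamma, region

# Toy 2 x 2 TF grid with one negative and one zero auto-spectrum value
W_x = np.array([[4.0, -1.0], [1.0, 0.0]])
W_y = np.array([[1.0, 1.0], [4.0, 2.0]])
W_xy = np.array([[1.0 + 1.0j, 3.0], [0.0, 5.0]])
gamma, region = tf_coherence(W_xy, W_x, W_y)
```

Because the WVS may take negative values, the explicit mask matters in practice: points outside the region carry no coherence value at all.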

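$\Gamma_{x,y}(t,f)$ as approximate TF formulation of $\boldsymbol{\Gamma}_{x,y}$. We now show that for $x(t)$, $y(t)$ jointly underspread, $\Gamma_{x,y}(t,f)$ approximates the WS of the coherence operator $\boldsymbol{\Gamma}_{x,y}$. We start by noting that $\boldsymbol{\Gamma}_{x,y}$ can be alternatively defined by

$$\mathbf{H}_x \boldsymbol{\Gamma}_{x,y} \mathbf{H}_y = \mathbf{R}_{x,y} \qquad (5)$$

with $\mathbf{H}_x \triangleq \mathbf{R}_x^{1/2}$ and $\mathbf{H}_y \triangleq \mathbf{R}_y^{1/2}$. Our central assumption will be that $S_{\mathbf{H}_x}(\tau,\nu)$, $S_{\mathbf{H}_y}(\tau,\nu)$, and $A_{x,y}(\tau,\nu)$ are supported within the same rectangular region $\mathcal{G} = [-\tau_{\mathcal{G}}, \tau_{\mathcal{G}}] \times [-\nu_{\mathcal{G}}, \nu_{\mathcal{G}}]$ of area $\sigma_{\mathcal{G}} = 4\tau_{\mathcal{G}}\nu_{\mathcal{G}}$.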

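We can split the coherence operator $\boldsymbol{\Gamma}_{x,y}$ into a part $\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}$ whose SF is supported within $\mathcal{G}$ and a part $\boldsymbol{\Gamma}^{\bar{\mathcal{G}}}_{x,y}$ whose SF is supported outside $\mathcal{G}$. This is motivated by the desire of approximating $\boldsymbol{\Gamma}_{x,y}$ by $\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}$ in the sense that replacing $\boldsymbol{\Gamma}_{x,y}$ by $\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}$ does not greatly affect the validity of (5):

$$\mathbf{H}_x \boldsymbol{\Gamma}_{x,y} \mathbf{H}_y = \mathbf{R}_{x,y} \;\;\Longrightarrow\;\; \mathbf{H}_x \boldsymbol{\Gamma}^{\mathcal{G}}_{x,y} \mathbf{H}_y \approx \mathbf{R}_{x,y} . \qquad (6)$$

Indeed, we have the following result.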

Theorem 1 [5]. Under the assumption stated above, the difference $\mathbf{H}_x \boldsymbol{\Gamma}^{\mathcal{G}}_{x,y} \mathbf{H}_y - \mathbf{R}_{x,y}$ is bounded as (see footnote 2)


l|H»r,fltfH,  R-x,y||2  o  , — - 
l|Hx||2||r,,y||2||Hy||2  -  ^ 

Hence, if $\sigma_{\mathcal{G}} \ll 1$, i.e., if $x(t)$ and $y(t)$ are jointly underspread, the approximation in (6) is indeed valid.

We now pass to the TF domain using the WS.
Theorem 2 [5]. Under the assumption stated above, the difference $\Delta_1(t,f) \triangleq L_{\mathbf{H}_x}(t,f)\, L_{\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}}(t,f)\, L_{\mathbf{H}_y}(t,f) - W_{x,y}(t,f)$ is bounded as (see footnote 3)

|Ai(f,/)|  <  3tt  2  q 

II-5h.II!  j|5r»,y||00  IISHyll!  2  e  Q 

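Hence, for $\sigma_{\mathcal{G}} \ll 1$ one has

$$L_{\mathbf{H}_x}(t,f)\, L_{\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}}(t,f)\, L_{\mathbf{H}_y}(t,f) \approx W_{x,y}(t,f) . \qquad (7)$$

We now insert the approximations $L_{\mathbf{H}_x}(t,f) \approx \sqrt{W_x(t,f)}$ and $L_{\mathbf{H}_y}(t,f) \approx \sqrt{W_y(t,f)}$, valid for underspread $x(t)$ and for underspread $y(t)$ [5], and divide by $\sqrt{W_x(t,f)\,W_y(t,f)}$ on $\mathcal{R}$. Equation (7) thus becomes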


<  ~2  as  +  • 


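$$L_{\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}}(t,f) \approx \Gamma_{x,y}(t,f), \qquad (t,f) \in \mathcal{R}, \qquad (8)$$

where $\Gamma_{x,y}(t,f)$ is the TF coherence function in (4). Furthermore, it can be shown [5] that $L_{\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}}(t,f)$ equals $L_{\boldsymbol{\Gamma}_{x,y}}(t,f)$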

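2 Here, $\|\mathbf{H}\|_2 \triangleq \left[\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |h(t_1,t_2)|^2\, dt_1\, dt_2\right]^{1/2}$.

3 We note that $\|S_{\mathbf{H}}\|_1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |S_{\mathbf{H}}(\tau,\nu)|\, d\tau\, d\nu$ and $\|S_{\mathbf{H}}\|_\infty = \sup_{\tau,\nu} |S_{\mathbf{H}}(\tau,\nu)|$.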
convolved by a function $\psi(t,f)$ whose 2-D Fourier transform is 1 in $\mathcal{G}$ and 0 outside $\mathcal{G}$. For $\sigma_{\mathcal{G}}$ small, $\psi(t,f)$ is a smooth function and thus $L_{\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}}(t,f)$ is a smoothed version of $L_{\boldsymbol{\Gamma}_{x,y}}(t,f)$. Hence, (8) states that for jointly underspread $x(t)$, $y(t)$, the TF coherence function $\Gamma_{x,y}(t,f)$ is approximately equal to a smoothed version of the WS of the coherence operator $\boldsymbol{\Gamma}_{x,y}$. In this sense, $\Gamma_{x,y}(t,f)$ provides an approximate TF formulation of the coherence operator $\boldsymbol{\Gamma}_{x,y}$.

Bounds. In Section 5, it will be shown that the alternative TF coherence function

$$\tilde{\Gamma}_{x,y}(t,f) \triangleq \frac{S_{x,y}(t,f)}{\sqrt{S_x(t,f)\,S_y(t,f)}} \qquad (9)$$

satisfies the bound $|\tilde{\Gamma}_{x,y}(t,f)|^2 \le 1$. For $x(t)$ and $y(t)$ jointly underspread, the WVS are approximately equal to the corresponding physical spectra [5,14,18], and thus $\Gamma_{x,y}(t,f) \approx \tilde{\Gamma}_{x,y}(t,f)$ or, in terms of squared magnitudes, $S_x(t,f)\,S_y(t,f)\,|W_{x,y}(t,f)|^2 \approx W_x(t,f)\,W_y(t,f)\,|S_{x,y}(t,f)|^2$. The last approximation is supported by the following result.

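Theorem 3 [5]. Let $A_x(\tau,\nu)$, $A_y(\tau,\nu)$, and $A_{x,y}(\tau,\nu)$ be supported within the same rectangle $\mathcal{G} = [-\tau_{\mathcal{G}}, \tau_{\mathcal{G}}] \times [-\nu_{\mathcal{G}}, \nu_{\mathcal{G}}]$. Then, the difference $\Delta_2(t,f) \triangleq S_x(t,f)\,S_y(t,f)\,|W_{x,y}(t,f)|^2 - W_x(t,f)\,W_y(t,f)\,|S_{x,y}(t,f)|^2$ is bounded as

$$\frac{|\Delta_2(t,f)|}{\|A_x\|_1\, \|A_y\|_1\, \|A_{x,y}\|_1^2} \le 4\epsilon ,$$

where $\epsilon \triangleq \max_{(\tau,\nu) \in \mathcal{G}} |1 - A_g(\tau,\nu)|$ with $A_g(\tau,\nu)$ the ambiguity function [12,17] of the analysis window $g(t)$ used in the physical spectrum.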

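Since $A_g(\tau,\nu) \approx 1$ for small $(\tau,\nu)$, $\epsilon$ will be small for small $\mathcal{G}$ and thus, still for small $\mathcal{G}$,

$$|\Gamma_{x,y}(t,f)|^2 \approx |\tilde{\Gamma}_{x,y}(t,f)|^2 . \qquad (10)$$

With $|\tilde{\Gamma}_{x,y}(t,f)|^2 \le 1$, (10) implies that for $x(t)$, $y(t)$ jointly underspread, $|\Gamma_{x,y}(t,f)|^2$ is approximately bounded by 1.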

However,  for  x(t),y(t)  not  underspread,  |rXl!,(t,  /)|2  may 
be  arbitrarily  large.  Consider  for  example  the  two  corre¬ 
lated  random  processes  x(t)  =  ft  u(t+to)  e~j2*fot  and  y(t)  = 

f)  u(t  -  to)  ej2lT !ot  where  u(t)  =  e~*t2/T2/\/W,  to  and  f0  are 
fixed,  and  0  is  random  with  E{|/3|2  }  =  7  >  0.  One  obtains 

$$W_{x,y}(t,f) = 2\gamma\, e^{-2\pi[t^2/T^2 + f^2T^2]} \cos\!\big(4\pi(f_0 t - t_0 f + t_0 f_0)\big)$$

$$W_x(t,f) = \gamma\, e^{-2\pi[(t+t_0)^2/T^2 + (f+f_0)^2T^2]}$$

$$W_y(t,f) = \gamma\, e^{-2\pi[(t-t_0)^2/T^2 + (f-f_0)^2T^2]}$$


cross terms” in $W_{x,y}(t,f)$ [14]. We note that for large $t_0$, $f_0$, the processes $x(t)$ and $y(t)$ are not jointly underspread.

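Completely coherent processes. We next consider the case of linearly related processes $y(t) = (\mathbf{K}x)(t)$, where we would like to have

$$|\Gamma_{x,y}(t,f)|^2 \approx 1, \qquad (t,f) \in \mathcal{R} \qquad (11)$$

or, equivalently, $|W_{x,y}(t,f)|^2 \approx W_x(t,f)\,W_y(t,f)$.

Theorem 4 [5]. Let $S_{\mathbf{K}}(\tau,\nu)$ and $A_x(\tau,\nu)$ be supported within the same rectangle $\mathcal{G} = [-\tau_{\mathcal{G}}, \tau_{\mathcal{G}}] \times [-\nu_{\mathcal{G}}, \nu_{\mathcal{G}}]$ of area $\sigma_{\mathcal{G}} = 4\tau_{\mathcal{G}}\nu_{\mathcal{G}}$. Then, the difference $\Delta_3(t,f) \triangleq |W_{x,y}(t,f)|^2 - W_x(t,f)\,W_y(t,f)$ is bounded as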

|^3(t,/)|  <  1 1 7T 

||Az||2||5k||2  ~2a°- 

Hence, for small $\sigma_{\mathcal{G}}$, $|W_{x,y}(t,f)|^2 \approx W_x(t,f)\,W_y(t,f)$ and the approximation (11) is indeed valid. Small $\sigma_{\mathcal{G}}$ implies that $x(t)$ and $\mathbf{K}$ are jointly underspread; in this case, $y(t) = (\mathbf{K}x)(t)$ will be underspread as well. An example where $\mathbf{K}$ is not underspread and thus (11) is not valid was given further above. Indeed, the processes $x(t) = \beta\, u(t+t_0)\, e^{-j2\pi f_0 t}$ and $y(t) = \beta\, u(t-t_0)\, e^{j2\pi f_0 t}$ defined above are related as $y(t) = (\mathbf{K}x)(t)$, where $\mathbf{K}$ is a TF shift operator which for large $t_0$, $f_0$ is not underspread.

Linearly distorted processes. For $a(t) = (\mathbf{H}_1 x)(t)$ and $b(t) = (\mathbf{H}_2 y)(t)$, we would like to have the (approximate) invariance

$$|\Gamma_{a,b}(t,f)|^2 \approx |\Gamma_{x,y}(t,f)|^2, \qquad (t,f) \in \mathcal{R}, \qquad (12)$$

which equivalently requires $|W_{a,b}(t,f)|^2\, W_x(t,f)\, W_y(t,f) \approx |W_{x,y}(t,f)|^2\, W_a(t,f)\, W_b(t,f)$.

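Theorem 5 [5]. Let $S_{\mathbf{H}_1}(\tau,\nu)$, $S_{\mathbf{H}_2}(\tau,\nu)$, $A_x(\tau,\nu)$, $A_y(\tau,\nu)$, and $A_{x,y}(\tau,\nu)$ be supported within the same rectangle $\mathcal{G} = [-\tau_{\mathcal{G}}, \tau_{\mathcal{G}}] \times [-\nu_{\mathcal{G}}, \nu_{\mathcal{G}}]$ of area $\sigma_{\mathcal{G}} = 4\tau_{\mathcal{G}}\nu_{\mathcal{G}}$. Then, the difference $\Delta_4(t,f) \triangleq |W_{a,b}(t,f)|^2\, W_x(t,f)\, W_y(t,f) - |W_{x,y}(t,f)|^2\, W_a(t,f)\, W_b(t,f)$ is bounded as

$$\frac{|\Delta_4(t,f)|}{\|A_x\|_1\, \|A_y\|_1\, \|A_{x,y}\|_1^2\, \|S_{\mathbf{H}_1}\|_1^2\, \|S_{\mathbf{H}_2}\|_1^2} \le \frac{9\pi}{2}\, \sigma_{\mathcal{G}} .$$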

Hence, for small $\sigma_{\mathcal{G}}$, (12) is indeed valid, which means that $|\Gamma_{x,y}(t,f)|^2$ is approximately invariant to linear process transformations. Small $\sigma_{\mathcal{G}}$ implies that the processes $x(t)$ and $y(t)$ and the operators $\mathbf{H}_1$ and $\mathbf{H}_2$ are all jointly underspread; this implies in turn that $a(t) = (\mathbf{H}_1 x)(t)$ and $b(t) = (\mathbf{H}_2 y)(t)$ are jointly underspread processes as well.


It is seen that $W_x(t,f)$ and $W_y(t,f)$ are localized about $(-t_0,-f_0)$ and $(t_0,f_0)$, respectively. However, $W_{x,y}(t,f)$ is localized (and oscillatory) about $(0,0)$, corresponding to a “statistical cross term” [14]. It follows that

$$|\Gamma_{x,y}(0,0)| = 2\, e^{2\pi[t_0^2/T^2 + f_0^2T^2]}\, |\cos(4\pi t_0 f_0)| ,$$

which for increasing $t_0$, $f_0$ can become arbitrarily large. This refutes a previous incorrect claim that $|\Gamma_{x,y}(t,f)|$ is bounded by 1 [4]. Furthermore, we see that the large values of $|\Gamma_{x,y}(t,f)|$ are due to TF correlations [5,14,18], i.e., correlations between components of $x(t)$ and $y(t)$ located in different parts of the TF plane, which give rise to “statistical


5.  SHIFT-COVARIANT  TIME-FREQUENCY 
COHERENCE  FUNCTIONS 


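A generalization of $\Gamma_{x,y}(t,f)$ is given by

$$\Gamma^{(C)}_{x,y}(t,f) \triangleq \frac{P^{(C)}_{x,y}(t,f)}{\sqrt{P^{(C)}_{x}(t,f)\, P^{(C)}_{y}(t,f)}}, \qquad (t,f) \in \mathcal{R},$$

where

$$P^{(C)}_{x,y}(t,f) \triangleq \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} r_{x,y}(t_1 + t,\, t_2 + t)\, c^*(t_1,t_2)\, e^{-j2\pi f(t_1 - t_2)}\, dt_1\, dt_2 \qquad (13)$$

is a TF shift covariant time-varying power spectrum [5,12-15] and $\mathcal{R}$ is the TF region on which $P^{(C)}_{x}(t,f) \triangleq P^{(C)}_{x,x}(t,f) > 0$ and $P^{(C)}_{y}(t,f) > 0$. We assume that the kernel function $c(t_1,t_2)$ in (13) satisfies $c^*(t_2,t_1) = c(t_1,t_2)$ so that $P^{(C)}_{x}(t,f)$ is real-valued. Two important special cases of $\Gamma^{(C)}_{x,y}(t,f)$ are $\Gamma_{x,y}(t,f)$ in (4) (obtained with $P^{(C)}_{x,y}(t,f) = W_{x,y}(t,f)$ or $c(t_1,t_2) = \delta\!\left(\frac{t_1+t_2}{2}\right)$) and $\tilde{\Gamma}_{x,y}(t,f)$ in (9) (obtained with $P^{(C)}_{x,y}(t,f) = S_{x,y}(t,f)$ or $c(t_1,t_2) = g(t_1)\,g^*(t_2)$).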

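The TF coherence function $\Gamma^{(C)}_{x,y}(t,f)$ is a complex-valued function that is parameterized by $c(t_1,t_2)$ or, equivalently, by the (self-adjoint) linear operator $\mathbf{C}$ with kernel $c(t_1,t_2)$. For fixed $c(t_1,t_2)$, $\Gamma^{(C)}_{x,y}(t,f)$ is TF shift covariant, i.e.,

$$\Gamma^{(C)}_{\tilde{x},\tilde{y}}(t,f) = \Gamma^{(C)}_{x,y}(t-\tau,\, f-\nu)$$

with $\tilde{x}(t) = x(t-\tau)\,e^{j2\pi\nu t}$, $\tilde{y}(t) = y(t-\tau)\,e^{j2\pi\nu t}$. For $x(t)$, $y(t)$ uncorrelated, there is $\Gamma^{(C)}_{x,y}(t,f) = 0$ on $\mathcal{R}$.
Theorem 6 [5]. There is

$$|\Gamma^{(C)}_{x,y}(t,f)|^2 \le 1, \qquad (t,f) \in \mathcal{R}$$

for all $x(t)$, $y(t)$ and with nonempty $\mathcal{R}$ iff the operator $\mathbf{C}$ underlying $\Gamma^{(C)}_{x,y}(t,f)$ is positive semidefinite (see footnote 4).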

Indeed, if $\mathbf{C}$ is positive semidefinite, then $P^{(C)}_{x,y}(t,f)$ is a smoothed version of $W_{x,y}(t,f)$ [14]; this smoothing suppresses the statistical cross terms of $W_{x,y}(t,f)$ present in the overspread case and thus allows $\Gamma^{(C)}_{x,y}(t,f)$ to be properly bounded even for overspread processes.

The operator $\mathbf{C}$ underlying $\tilde{\Gamma}_{x,y}(t,f)$ in (9) is positive semidefinite, and thus $|\tilde{\Gamma}_{x,y}(t,f)|^2 \le 1$. On the other hand, the operator $\mathbf{C}$ underlying $\Gamma_{x,y}(t,f)$ is not positive semidefinite, and indeed we have observed in Section 4 that $|\Gamma_{x,y}(t,f)|^2$ can be arbitrarily large (however, we recall that it is approximately bounded by 1 in the underspread case).
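The role of positive semidefiniteness can be illustrated with a finite-dimensional toy model. The sketch below is an assumption-laden analogue, not the construction of [5]: at a fixed TF point the spectra are modelled as quadratic forms $b^H \mathbf{C} a$, $a^H \mathbf{C} a$, $b^H \mathbf{C} b$ for the processes $x = \beta a$, $y = \beta b$ with $E\{|\beta|^2\} = 1$ (the function name `coherence_sq` is hypothetical). For positive semidefinite $\mathbf{C}$ the Cauchy-Schwarz inequality bounds the squared coherence by 1, while an indefinite $\mathbf{C}$ permits arbitrarily large values.

```python
import numpy as np

def coherence_sq(C, a, b):
    """Squared coherence for the toy processes x = beta*a, y = beta*b.

    Spectra are modelled as quadratic forms (an assumption of this sketch):
    P_xy = b^H C a, P_x = a^H C a, P_y = b^H C b.
    Returns NaN when P_x or P_y is not positive (outside the region R).
    """
    p_xy = np.vdot(b, C @ a)
    p_x = np.vdot(a, C @ a).real
    p_y = np.vdot(b, C @ b).real
    if p_x <= 0 or p_y <= 0:
        return float("nan")
    return float(abs(p_xy) ** 2 / (p_x * p_y))

rng = np.random.default_rng(0)

# Positive semidefinite C = B^H B: the bound holds for any a, b
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
C_psd = B.conj().T @ B
a = rng.standard_normal(4) + 1j * rng.standard_normal(4)
b = rng.standard_normal(4) + 1j * rng.standard_normal(4)
bounded = coherence_sq(C_psd, a, b)

# Indefinite C (eigenvalues +1 and -1): the bound can fail badly
C_indef = np.array([[0.0, 1.0], [1.0, 0.0]])
eps = 0.1
unbounded = coherence_sq(C_indef, np.array([1.0, eps]), np.array([eps, 1.0]))
```

Shrinking `eps` makes the second value grow without limit, mirroring the behaviour of the WVS-based coherence function for overspread processes.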

6.  SIMULATION  RESULTS 

Experiment 1. We analyze the coherence of the input $x(t)$ and the noise-contaminated output $y(t) = (\mathbf{K}x)(t) + n(t)$ of a time-varying linear system $\mathbf{K}$. The input $x(t)$ is stationary and white with correlation $\mathbf{R}_x = \mathbf{I}$ (corresponding to constant WVS $W_x(t,f) = 1$). The noise $n(t)$ is stationary and white with correlation $\mathbf{R}_n = \eta\,\mathbf{I}$ (corresponding to constant WVS $W_n(t,f) = \eta$) and uncorrelated with $x(t)$. The WS and SF of $\mathbf{K}$ are depicted in Figs. 1(a) and (b), respectively. The SF of $\mathbf{K}$ shows that $\mathbf{K}$ is underspread.

In the noise-free case ($\eta = 0$), $x(t)$ and $y(t)$ are completely coherent, i.e., $\boldsymbol{\Gamma}_{x,y}\boldsymbol{\Gamma}^{+}_{x,y} = \boldsymbol{\Gamma}^{+}_{x,y}\boldsymbol{\Gamma}_{x,y} = \mathbf{I}$. Since $x(t)$ and $\mathbf{K}$ are underspread, Theorem 4 applies and we can expect that $|\Gamma_{x,y}(t,f)|^2 \approx 1$. Indeed, we found that the maximum deviation of $|\Gamma_{x,y}(t,f)|^2$ from 1 was 0.028.

For $\eta > 0$, the noise causes a reduction of coherence that depends on the output SNR. Since the output SNR is TF-dependent (due to the TF weighting characteristic of $\mathbf{K}$ as shown in Fig. 1(a)), the coherence reduction is TF-dependent as well. This is clearly indicated by the WS of

4 We recall that a positive semidefinite operator $\mathbf{C}$ is defined by the condition $\langle \mathbf{C}x, x \rangle \ge 0$ for all $x(t)$ [6]. For $\mathbf{C}$ positive semidefinite, there is $P^{(C)}_{x}(t,f) \ge 0$ for all $(t,f)$ and for all $x(t)$.


Figure 1: Simulation results for Experiment 1: (a) WS of $\mathbf{K}$; (b) magnitude of SF of $\mathbf{K}$; (c) magnitude of WS of $\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}$; (d) magnitude of $\Gamma_{x,y}(t,f)$. The rectangle in (b) has area 1 and allows the underspread property of $\mathbf{K}$ to be assessed. Time duration is 256 samples; normalized frequency ranges from -1/4 to 1/4.


$\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}$ and the TF coherence function $\Gamma_{x,y}(t,f)$ shown in Figs. 1(c) and (d), respectively. Moreover, the similarity of these two results confirms the validity of the approximation (8).

Experiment 2. Again, $y(t) = (\mathbf{K}x)(t) + n(t)$ with $x(t)$ and $\mathbf{K}$ as in the previous example. However, $n(t)$ now is nonstationary narrowband noise with WVS as shown in Fig. 2(a). From the expected ambiguity function of $n(t)$ shown in Fig. 2(b), it is seen that $n(t)$ is reasonably underspread. The Weyl symbol of $\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}$ and the TF coherence function $\Gamma_{x,y}(t,f)$, shown respectively in Figs. 2(c) and (d), are again seen to be practically identical. In this example, significant coherence reduction occurs only in the TF support region of the noise; in the remainder of the TF plane there is complete coherence, thus indicating a pure linear relation between those components of $x(t)$ and $y(t)$ that are located in this “noise-free” TF region. Again, both $L_{\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}}(t,f)$ and $\Gamma_{x,y}(t,f)$ clearly indicate the TF dependence of coherence.

Experiment 3. We finally analyze the coherence of pressure signals $x(t)$ measured inside the cylinder of a combustion engine and vibration signals $y(t)$ measured on the engine block (see footnote 5). The goal is to see whether the pressure and vibration processes are linearly related (as assumed in [19]). Both $x(t)$ and $y(t)$ consist of several resonances with decreasing resonance frequencies. Estimates $\hat{\Gamma}_{x,y}(t,f)$ of the TF coherence function $\Gamma_{x,y}(t,f)$ are shown in Fig. 3 for two different engine speeds. (These estimates were computed using estimated Wigner-Ville spectra [20] obtained from multiple realizations.) For both engine speeds, $|\hat{\Gamma}_{x,y}(t,f)|$ is seen to be significantly larger than zero in the TF support regions of the resonances. Specifically, in the TF region of

5 We are grateful to S. Carstens-Behrens, M. Wagner, and J. F. Böhme for providing us with the car engine data (courtesy of Aral-Forschung, Bochum).
Figure 2: Simulation results for Experiment 2: (a) WVS of $n(t)$; (b) magnitude of expected ambiguity function of $n(t)$; (c) magnitude of WS of $\boldsymbol{\Gamma}^{\mathcal{G}}_{x,y}$; (d) magnitude of $\Gamma_{x,y}(t,f)$. The rectangle in (b) has area 1 and allows the underspread property of $n(t)$ to be assessed. Time duration is 256 samples; normalized frequency ranges from -1/4 to 1/4.


the first resonance the maximum of $|\hat{\Gamma}_{x,y}(t,f)|$ is about 0.9, which clearly indicates a linear relationship. For the second and third resonance, the maximum of $|\hat{\Gamma}_{x,y}(t,f)|$ is about 0.7 and 0.4, respectively. This still suggests a linear relationship, though apparently contaminated by measurement noise and extraneous interference.

7.  CONCLUSIONS 

We introduced and studied a coherence operator and time-frequency (TF) coherence functions for nonstationary coherence analysis. We showed that for jointly underspread nonstationary processes, TF coherence functions are meaningful tools for nonstationary coherence analysis. However, if the processes are not jointly underspread, meaningful results can only be obtained with TF coherence functions based on smoothed time-varying spectra. We note that TF coherence functions can be estimated based on estimates of the time-varying spectra involved [4,12,13,20]. Furthermore, many of the theorems presented can be extended to a generalized underspread concept that does not require exact compact support of spreading functions and expected ambiguity functions [5,10,14].
ABSTRACT

An  adaptive  approach  to  the  estimation  of  the  instan¬ 
taneous  frequency  (IF)  of  non-stationary  mono-  and  multi- 
component  FM  signals  with  additive  Gaussian  noise  is  pre¬ 
sented.  It  is  shown  that  the  bias  and  variance  of  the  IF 
estimate  are  functions  of  the  lag  window  length.  If  there 
is  a  bias-variance  tradeoff,  then  the  optimal  window  length 
for  this  tradeoff  depends  on  the  unknown  IF  law.  Hence 
an  adaptive  algorithm  with  a  time-varying  and  data-driven 
window  length  is  needed.  The  adaptive  algorithm  can  uti¬ 
lize  any  quadratic  time-frequency  distribution  that  satisfies 
certain conditions. A quadratic distribution that is most suitable for this approach is proposed. The algorithm estimates multiple IF laws by using a tracking algorithm for the signal components and utilizing the property that the proposed distribution enables non-parametric estimation of the component amplitudes. An extension of the proposed TFD consisting in the use of time-only kernels for adaptive IF estimation is also proposed.

1.  INTRODUCTION 

The  problem  of  non-parametric  instantaneous  frequency  (IF) 
estimation  for  multi-component  non-stationary  signals  is  an 
important  and  unresolved  issue  in  signal  processing.  Time- 
frequency  analysis  techniques  are  generally  used  as  they 
reveal  the  multi-component  nature  of  such  signals. 

The  concept  of  instantaneous  frequency  and  methods 
of  IF  estimation  were  reviewed  in  [1]  and  [2].  An  efficient 
adaptive  algorithm  for  IF  estimation  using  the  Wigner-Ville 
distribution  (WVD)  was  presented  in  [3]  and  [4].  This  pa¬ 
per  aims  to  develop  a  general  adaptive  method  for  IF  esti¬ 
mation  of  mono-  and  multi-component  signals  in  additive 
Gaussian  noise  that  is  suitable  for  quadratic  time-frequency 
distributions.  We  found  that,  to  be  used  for  this  purpose, 
the  quadratic  TFDs  must  satisfy  three  conditions: 

• first, the variance of the IF estimate using a TFD $\rho(t,f)$ should be a continuously decreasing function of the lag window length while the bias is continuously increasing, so that the algorithm will converge at the optimal lag window length that resolves this bias-variance tradeoff.

• second, since we introduce an adaptive window length in the lag direction, the kernel of $\rho(t,f)$ should not have a narrow passband in the lag direction which would limit the effective length of the adaptive lag window.

• third, $\rho(t,f)$ should have a high time-frequency resolution while suppressing cross-terms efficiently so as to give a robust IF estimate for mono- and multi-component FM signals.

In  this  analysis  we  propose  a  distribution  d(t,  f )  that  is 
most  suitable  for  the  adaptive  IF  algorithm  in  the  sense  that 
it  has  high  resolution,  effective  cross-terms  reduction,  and  a 
kernel  that  does  not  perform  filtering  in  the  lag  direction;  in 
addition,  it  enables  non-parametric  amplitude  estimation. 


2.  A  HIGH-RESOLUTION  TFD 
2.1.  The  Time-Lag  Kernel 

Recently  a  time-frequency  distribution  B(t,  f )  was  proposed 
and  shown  to  be  superior  to  other  fixed-kernel  TFDs  in 
terms  of  cross-terms  reduction  and  resolution  enhancement 
[5].  We  have  used  this  distribution  for  IF  estimation  for 
mono-  and  multi-component  FM  signals  [6].  However,  no 
direct  component  amplitudes  estimation  is  possible  from 
B(t,  f)  or  other  quadratic  TFDs,  a  difficulty  that  appears 
in  the  case  of  adaptive  IF  estimation  of  multi-component 
signals. Based on $B(t,f)$ and the conditions of the adaptive algorithm, the kernel of the proposed distribution $d(t,f)$ in the time-lag domain is given by

$$G(t,\tau) = G_\alpha(t) = \frac{k_\alpha}{\cosh^{2\alpha}(t)} \qquad (1)$$

where $\alpha$ is a real positive number and $k_\alpha = \Gamma(2\alpha)/(2^{2\alpha-1}\,\Gamma^2(\alpha))$; $\Gamma$ stands for the gamma function. Filtering in the $\tau$ direction is performed by introducing a window function (see Section 3).
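A short numerical check of the kernel in (1) may be helpful. The sketch below (function names are hypothetical) evaluates $k_\alpha$ and integrates $G_\alpha(t)$ with a simple Riemann sum. That the kernel has unit area is not stated in the text; it follows from the Legendre duplication formula for the gamma function and is verified here only for moderate $\alpha$, since for very small $\alpha$ the slowly decaying tails would require a much longer integration interval.

```python
import math
import numpy as np

def k_alpha(alpha):
    """Normalization constant Gamma(2a) / (2^(2a-1) * Gamma(a)^2) of eq. (1)."""
    return math.gamma(2 * alpha) / (2 ** (2 * alpha - 1) * math.gamma(alpha) ** 2)

def time_kernel(t, alpha):
    """Time-only kernel G_alpha(t) = k_alpha / cosh(t)^(2*alpha)."""
    return k_alpha(alpha) / np.cosh(t) ** (2 * alpha)

def kernel_area(alpha, t_max=40.0, dt=1e-3):
    """Riemann-sum approximation of the integral of G_alpha over [-t_max, t_max]."""
    t = np.arange(-t_max, t_max + dt, dt)
    return float(np.sum(time_kernel(t, alpha)) * dt)

area_half = kernel_area(0.5)   # k_alpha = 1/pi, kernel is sech(t)/pi
area_one = kernel_area(1.0)    # k_alpha = 1/2, kernel is sech(t)^2 / 2
```

Unit area means that the time smoothing performed by the kernel preserves the overall level of the signal terms, whatever value of $\alpha$ is chosen.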


2.2.  Properties  of  the  Proposed  Distribution 

Most  of  the  desirable  properties  of  time-frequency  distribu¬ 
tions  explained  in  [1]  and  [2]  are  satisfied  by  this  kernel  as 
stated  below. 

•  Realness,  time-shift  and  frequency  shift  invariance,  fre¬ 
quency  marginal  and  group  delay,  and  the  frequency  sup¬ 
port  properties  are  satisfied.  The  time  support  property  is 
not  strictly  satisfied,  but  it  is  approximately  true. 

• Reduced interference and resolution: This property is satisfied by $d(t,f)$. First we consider the sum of two complex sinusoidal signals $z(t) = z_1(t) + z_2(t) = a_1 e^{j(2\pi f_1 t + \theta_1)} + a_2 e^{j(2\pi f_2 t + \theta_2)}$ where $a_1$, $a_2$, $\theta_1$ and $\theta_2$ are constants. The time-frequency distribution $d(t,f)$ of the signal $z(t)$ is obtained as [1]

$$d(t,f) = a_1^2\, \delta(f - f_1) + a_2^2\, \delta(f - f_2) + 2 a_1 a_2\, \gamma_\alpha(t)\, \delta\!\left(f - \frac{f_1 + f_2}{2}\right) \qquad (2)$$

where

$$\gamma_\alpha(t) = \frac{|\Gamma(\alpha + j\pi(f_1 - f_2))|^2}{\Gamma^2(\alpha)}\, \cos\!\big(2\pi(f_1 - f_2)t + \theta_1 - \theta_2\big) .$$

It is clear that the cross-terms are oscillatory in time and depend on the frequency separation between signal components. If $f_1$ and $f_2$ are well separated then the term $|\Gamma(\alpha + j\pi(f_1 - f_2))|^2$ can be substantially reduced, while $\Gamma^2(\alpha)$ can be made high if $\alpha$ is small. As $\alpha \to \infty$, we have $G(t,\tau) \to \delta(t)$ and $d(t,f)$ would approach the Wigner-Ville distribution $W(t,f)$.

For FM signals, $d(t,f)$ performs well in reducing interference (cross-terms) while keeping high resolution. Figure 1 shows a comparison between the discrete versions of $d(t,f)$ and the Choi-Williams distribution $CW(t,f)$ using a two-component linear FM signal. Changing the parameters $\alpha$ and $\sigma$ will improve one of the above two requirements at the expense of the other.
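The dependence of the cross-term level on the frequency separation can be checked numerically. The sketch below (the name `cross_term_gain` is hypothetical) computes the Fourier transform of the time-only kernel $k_\alpha/\cosh^{2\alpha}(t)$ at the difference frequency $f_1 - f_2$ by direct summation; this transform equals the gamma-function ratio that scales the cross-term. For $\alpha = 1/2$ the ratio has the closed form $1/\cosh(\pi^2 (f_1 - f_2))$, which is used as a reference. The integration interval is an assumption that is adequate for moderate $\alpha$ only.

```python
import math
import numpy as np

def cross_term_gain(delta_f, alpha, t_max=60.0, dt=1e-3):
    """Magnitude of the Fourier transform of k_alpha / cosh(t)^(2*alpha)
    at frequency delta_f, i.e. the cross-term attenuation factor."""
    t = np.arange(-t_max, t_max + dt, dt)
    k = math.gamma(2 * alpha) / (2 ** (2 * alpha - 1) * math.gamma(alpha) ** 2)
    g = k / np.cosh(t) ** (2 * alpha)
    return float(np.abs(np.sum(g * np.exp(-2j * np.pi * delta_f * t)) * dt))

gain_zero = cross_term_gain(0.0, 0.5)   # coincident components: no attenuation
gain_near = cross_term_gain(0.2, 0.5)   # small frequency separation
gain_far = cross_term_gain(0.5, 0.5)    # larger frequency separation
```

The gain falls off roughly exponentially with the separation, so well-separated components produce strongly attenuated cross-terms.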

• Time marginal and instantaneous frequency: Now we consider the important property of the instantaneous frequency of a time-varying signal $s(t)$. The instantaneous frequency $f_i(t)$ is defined as $f_i(t) = \frac{1}{2\pi}\frac{d\phi(t)}{dt}$ where $z(t) = a(t)\,e^{j\phi(t)}$ is the analytic signal associated with $s(t)$ [1]. Traditionally a time-frequency distribution $\rho(t,f)$ is looked upon as analogous to a probability distribution, hence it was imposed that the first moment of $\rho(t,f)$ with respect to $f$ must equal the instantaneous frequency $f_i(t)$, leading to the conditions $g(\nu,0) = 1\ \forall \nu$ and $\partial g(\nu,\tau)/\partial\tau\,|_{\tau=0} = 0\ \forall \nu$, where $g(\nu,\tau) = \mathcal{F}_{t\to\nu}\{G(t,\tau)\}$ [1]. But the spectrogram, Page, and Rihaczek distributions, for example, do not satisfy these conditions [1]. If the time-frequency distribution does not satisfy one of the marginals, the analogy with a probability distribution breaks down and the traditional reasoning for the IF property is no longer valid. Hence we postulate the following general IF property: at any time $t$, the time-frequency distribution $\rho(t,f)$ should have an absolute maximum at $f = \frac{1}{2\pi}\frac{d\phi(t)}{dt}$, which is the actual important characteristic needed for IF estimation.

In fact, $d(t,f)$ does not satisfy the time marginal, hence does not satisfy the traditional condition for the instantaneous frequency. But we shall prove that at any $t$, $d(t,f)$ has an absolute maximum at $f = \frac{1}{2\pi}\frac{d\phi(t)}{dt}$ for linear FM. This is the basis for our IF estimate. For non-linear FM signals this estimate is biased. This bias would be the basis of the adaptive IF estimation developed in Section 3.

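□ Proof: For an FM signal of the form $z(t) = a\,e^{j\phi(t)}$ we can express $d(t,f)$ as [1]

$$\begin{aligned}
d(t,f) &= |a|^2 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-j2\pi f\tau}\, G(t-u,\tau)\, e^{j[\phi(u+\tau/2) - \phi(u-\tau/2)]}\, du\, d\tau \\
&= |a|^2 \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-j2\pi f\tau}\, G(t-u,\tau)\, \exp\!\Big(j\Big[\phi'(u)\,\tau + \sum_{k=3,\ k\ \mathrm{odd}}^{\infty} \frac{\tau^k}{2^{k-1}\,k!}\, \phi^{(k)}(u)\Big]\Big)\, du\, d\tau
\end{aligned}$$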

Figure 1: Performance comparison between $d(t,f)$ for $\alpha = 0.1$ and $CW(t,f)$ for $\sigma = 11$ using a two-component linear FM signal at the discrete time instant $n = 30$. Total signal length is $N = 64$ and the sampling interval is $T = 1$.


using Taylor series expansion. Assuming a relatively small effect from higher-order derivatives $\phi^{(k)}(t)$, $k \ge 3$, we have

$$d(t,f) \approx |a|^2 \int_{-\infty}^{\infty} G_\alpha(t-u)\, \delta\!\Big[\frac{1}{2\pi}\phi'(u) - f\Big]\, du = |a|^2\, G_\alpha(t - \psi(f))\, \psi'(f) \qquad (3)$$

where $\psi$ is the inverse of $\frac{1}{2\pi}\phi'$, i.e., $\frac{1}{2\pi}\phi'(\psi(f)) = f$. Assuming that $\psi'(f)$ is not a highly peaked function of $f$ and knowing that $G_\alpha(t - \psi(f))$ is peaked at $t = \psi(f)$, the absolute maximum of $d(t,f)$ for any time $t$ would be at $\psi(f) = t$, or $f = \frac{1}{2\pi}\phi'(t)$, which is the instantaneous frequency of the FM signal $z(t)$. For non-linear FM signals, the energy peak of $d(t,f)$ is actually biased from the instantaneous frequency because of the extra term $\sum_{k=3,\ k\ \mathrm{odd}}^{\infty} \frac{\tau^k}{2^{k-1}k!}\phi^{(k)}(u)$. The major contribution in this term is due to $\phi^{(3)}(u)$ (see Section 3). Therefore at the instants of rapid change in the IF law the bias is not negligible and eq.(3) would not be an accurate approximation to $d(t,f)$ unless suitable windowing in the lag direction is used.

For linear FM signals we have $\phi^{(k)}(t) = 0$ for $k \ge 3$. Assuming $z(t) = a\,e^{j2\pi(f_0 t + \beta_0 t^2/2)}$, where $f_0$ and $\beta_0$ are constants, we have

$$d(t,f) = \frac{1}{\beta_0}\, |a|^2\, G_\alpha\!\Big(t - \frac{1}{\beta_0}(f - f_0)\Big) \qquad (4)$$

which has an absolute maximum at $f = f_0 + \beta_0 t$, the instantaneous frequency of the linear FM signal $z(t)$. As $\beta_0 \to 0$, i.e., $z(t)$ approaches a sinusoid, we have $d(t,f) \to |a|^2\, \delta(f - f_0)$, in accordance with eq.(2).

In practical implementation a window $w(\tau)$ is used in the $\tau$ direction and the results in eqs.(3) and (4) are convolved with the Fourier transform of $w(\tau)$. □
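The peak property for linear FM can be checked directly from the closed form in eq.(4). The following sketch (function names and parameter values are illustrative assumptions, with $\beta_0 > 0$ and no lag window) evaluates that expression on a frequency grid at one time instant and locates its maximum, which should fall on $f_0 + \beta_0 t$ up to the grid spacing.

```python
import math
import numpy as np

def d_linear_fm(t, f, f0, beta0, alpha, a=1.0):
    """Closed-form distribution of eq. (4) for a linear FM signal, beta0 > 0."""
    k = math.gamma(2 * alpha) / (2 ** (2 * alpha - 1) * math.gamma(alpha) ** 2)
    u = t - (f - f0) / beta0            # argument of the time-only kernel
    return abs(a) ** 2 / beta0 * k / np.cosh(u) ** (2 * alpha)

def if_from_peak(t, f_grid, f0, beta0, alpha):
    """IF estimate at time t: frequency at which eq. (4) is maximal."""
    return float(f_grid[np.argmax(d_linear_fm(t, f_grid, f0, beta0, alpha))])

# Illustrative values: start frequency 0.1, sweep rate 0.002 per sample
f_grid = np.linspace(0.0, 0.5, 5001)    # grid spacing 1e-4
f0, beta0, alpha, t_obs = 0.1, 0.002, 0.1, 30.0
f_est = if_from_peak(t_obs, f_grid, f0, beta0, alpha)
f_true = f0 + beta0 * t_obs
```

For non-linear FM laws no such closed form is available and the peak is biased, which is the motivation for the adaptive window length discussed in the following section.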

In  the  next  section  we  will  present  an  adaptive  approach 
to  the  IF  estimation  for  FM  signals  using  quadratic  time- 
frequency  distributions. 

3.  IF  ESTIMATION  USING  QUADRATIC  TFDs 
3.1.  Introduction  to  IF  Estimation 
We consider an analytic signal $z(t)$ of the form

$$z(t) = a\,e^{j\phi(t)} + \epsilon(t)$$

where the amplitude $a$ is constant, and $\epsilon(t)$ is a complex-valued white Gaussian noise with independent identically distributed (i.i.d.) real and imaginary parts with total variance $\sigma_\epsilon^2$. The instantaneous frequency of $z(t)$ is given by [1]

$$f_i(t) = \frac{1}{2\pi}\frac{d\phi(t)}{dt} \qquad (5)$$

We assume in this analysis that $f_i(t)$ is an arbitrary, smooth and differentiable function of time with bounded derivatives of all orders. The general equation for quadratic time-frequency representation of a signal $z(t)$ is given by [1]

$$\rho(t,f) = \mathcal{F}_{\tau \to f}\big[G(t,\tau) \underset{(t)}{\ast} K_z(t,\tau)\big]$$


where $m$ is an integer and $T$ is the sampling interval. If $\rho_h(t,f)$ is discretized over time and frequency then we have

N  j-1  N,-l 

ph(n,k)  =  Y1  Kz(lT,2mT)GeB(nT-lT,2mT) 

l=—Ns  m=—Ns 

x  (9) 

where $2N_s$ is the number of samples.

The IF estimate is a solution of the following optimization

$$\hat{f}_h(t) = \arg\big[\max_f\, \rho_h(t,f)\big]; \qquad 0 \le f \le f_s/2 \qquad (10)$$

where $f_s = 1/T$ is the sampling frequency.
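The peak search in (10) can be sketched as follows, assuming the distribution is available as an array of frequency bins by time instants (the name `if_estimate` and the synthetic test surface are illustrative assumptions). The search is restricted to the band from 0 to half the sampling frequency, so that a stronger peak at negative frequencies does not capture the estimate.

```python
import numpy as np

def if_estimate(rho, freqs, fs):
    """Frequency of the maximum of rho over 0 <= f <= fs/2, per time instant.

    rho   : array of shape (num_freqs, num_times)
    freqs : frequency axis of length num_freqs
    fs    : sampling frequency
    """
    rho = np.asarray(rho, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    band = (freqs >= 0) & (freqs <= fs / 2)
    idx = np.argmax(rho[band, :], axis=0)
    return freqs[band][idx]

# Synthetic surface: a ridge along a linear IF law plus a stronger
# out-of-band peak at f = -0.3 that the search must ignore.
fs = 1.0
freqs = np.linspace(-0.5, 0.5, 101)
ridge = 0.1 + 0.02 * np.arange(10)
rho = (np.exp(-((freqs[:, None] - ridge[None, :]) / 0.01) ** 2)
       + 2.0 * np.exp(-((freqs[:, None] + 0.3) / 0.01) ** 2))
f_hat = if_estimate(rho, freqs, fs)
```

In the adaptive algorithm this search would be repeated for each candidate window length, with the window length itself selected per time instant.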


where $G(t,\tau)$ is the time-lag kernel, $K_z(t,\tau) = z(t+\frac{\tau}{2})\,z^*(t-\frac{\tau}{2})$ and $\underset{(t)}{\ast}$ denotes time convolution. For smoothing and localization we introduce a window function $w_h(\tau) = \frac{T}{h}\,w(\frac{\tau}{h})$ where $w(t)$ is a real-valued symmetric window with unity length, i.e., $w(t) = 0$ for $|t| > \frac{1}{2}$; hence the window length is $h$.

As the time-frequency representation is now dependent on $h$, we denote it by $\rho_h(t,f)$ which is given by

$$\rho_h(t,f) = \mathcal{F}_{\tau \to f}\big[G_{\mathrm{eff}}(t,\tau) \underset{(t)}{\ast} K_z(t,\tau)\big] \qquad (6)$$


3.2.  Bias  and  Variance  of  the  IF  Estimate 

Following  the  same  analysis  as  in  [6],  the  estimation  bias  is 
found  to  be 

E[Afih  (t)]  =  ^  (U) 

and  the  variance  is 

v»(i/«(0)  =  ^p[l  +  ^p]|  (12) 


where $G_{\mathrm{eff}}$ is the effective time-lag kernel given by

$$G_{\mathrm{eff}}(t,\tau) = w_h\!\big(\tfrac{\tau}{2}\big)\, G(t,\tau) \qquad (7)$$

The lag window $w_h(\frac{\tau}{2})$ will restrict the lag function of the kernel $G(t,\tau)$ to the interval $|\tau| < h$. If the lag function of the kernel has a passband narrower than that of the lag window, it will dominate over the function of this window. In Section 3 we shall prove that for a robust IF estimation, $w_h(\frac{\tau}{2})$ should be adaptive with variable length $h$. If the kernel $G(t,\tau)$ already has a factor controlling the lag passband independently of time, it may be better to consider adapting this factor instead of introducing an adaptive window in the lag direction. However, this would require different analysis for different TFDs. In addition, there is the problem of component amplitude estimation in the case of multi-component signals. On the other hand, designing new time-lag kernels that are functions of time only could result in very efficient TFDs like $d(t,f)$. Such TFDs are more suitable for adaptive IF estimation as they enable non-parametric amplitude estimation. Further studies on these TFDs will be attempted in future work. In this paper we assume that the parameters of the TFD are arranged such that the lag passband of the kernel $G(t,\tau)$ is larger than the largest lag window length necessary for the adaptive IF estimation.

In the discrete lag domain $\rho_h(t,f)$ can be expressed as follows

$$\rho_h(t,f) = \sum_{m=-\infty}^{\infty} \int_{-\infty}^{\infty} K_z(u, 2mT)\, G_{\mathrm{eff}}(t-u, 2mT)\, e^{-j4\pi f mT}\, du \qquad (8)$$


where 

Afih(t)  =  ~  fih(t) 

/oo  °° 

y;  w h  {mT)  (2nmT)  2G(u,  2mT)du 

■°°  m——oo 

Lh{t)  =  Wh (mT)Acj>(u, mT) {2nmT)  (13) 

m=-oo 

x  G{t  —  u,  2mT)du 

/oo  °° 

Wh{mT)2(2irmT)2G{u,2mT)du 

■°°  m=— oo 


Equations (11)-(13) indicate that the bias and the variance of the estimate depend on the lag window length $h$ for any kernel $G(t,\tau)$. To see how the bias and the variance vary with $h$, asymptotic analysis as $T \to 0$ is necessary.


3.3.  Asymptotic  Formulas  Using  d(t,  f) 

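Following the same analysis as in [6], we have the following asymptotic formulae for the variance and the bias as $T \to 0$ using a rectangular lag window

$$\mathrm{var}\big(\Delta\hat{f}_h(t)\big) = \frac{3\sigma_\epsilon^2\, T}{2\pi^2 |a|^2} \Big(1 + \frac{\sigma_\epsilon^2}{2|a|^2}\Big) \frac{1}{h^3} \qquad (14)$$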

and 


A (u)du 
cosh2“  (t  —  u ) 


(15) 


E{Afih(t))  <  ^ h 2  (16) 
where $\Delta(t) = f_i^{(2)}(t + \tau_1) + f_i^{(2)}(t - \tau_1)$ and $\sup_t |f_i^{(2)}(t)| \le M_2$.

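For small h, the optimal window length that minimizes the
mean squared error is obtained as

$$h_{\mathrm{opt}}(t) = \left[ \frac{1800\, \sigma^2 T \left(1 + \frac{\sigma^2}{2|a|^2}\right)}{\pi^2 |a|^2 \left( f_i^{(2)}(t) * \frac{1}{\cosh^{2\alpha}(t)} \right)^2} \right]^{1/7} \qquad (17)$$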


Hence the optimal window length depends on the second
derivative of the instantaneous frequency, f_i^{(2)}(t), which is
time and signal dependent. From eqs.(14), (15) and (16) it
is clear that the variance and bias of the IF estimate using
d(t,f) have the same rates of change with respect to the
window length h as those using the WVD [3, 4].


3.4.  The  Adaptive  Algorithm  and  Its  Conditions  of 
Applicability 

For d(t,f), eqs.(14) and (15) show that when h increases
the bias increases and the variance decreases. From eq.(17)
it can be seen that the optimal window length is a function
of time and depends on the second derivative of the IF law:
it decreases when the IF law f_i(t) varies rapidly.
Hence a time-varying window length is needed to optimize
the estimation. The Stankovic-Katkovnik adaptive
algorithm developed in [3] and [4] can be used for d(t,f). In
fact, this adaptive algorithm is applicable to any quadratic
time-frequency distribution whose IF estimation variance
is a continuously decreasing function of h while its bias
is continuously increasing. These conditions are necessary
for the bias-variance tradeoff, so that the algorithm converges
to the optimum window length that resolves this tradeoff.
Also  the  time-lag  kernel  of  the  distribution  should  not  per¬ 
form  narrowband  filtering  in  the  lag  direction  so  as  not  to 
interfere  with  the  adaptive  windowing  in  that  direction. 
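
The tradeoff described above can be illustrated numerically. The following is a minimal sketch, assuming only what the asymptotic formulas suggest: a variance of the form V/h^3 and a bias of the form B h^2. The constants V and B and all function names are illustrative and not part of the paper. The adaptive algorithm of [3], [4] does not require B to be known; the sketch only shows why an optimum exists and why it shrinks when the bias constant grows.

```python
def mse(h, V, B):
    """Modelled mean squared error: variance V/h^3 plus squared bias (B*h^2)^2."""
    return V / h**3 + (B * h**2) ** 2


def optimal_window(V, B):
    """Closed-form minimiser of mse(h).

    Setting d/dh [V h^-3 + B^2 h^4] = -3V h^-4 + 4 B^2 h^3 = 0
    gives h^7 = 3V / (4 B^2).
    """
    return (3.0 * V / (4.0 * B**2)) ** (1.0 / 7.0)


def pick_from_candidates(candidates, V, B):
    """Choose the candidate window length with the smallest modelled MSE."""
    return min(candidates, key=lambda h: mse(h, V, B))


# Hypothetical constants, chosen only for illustration.
V, B = 1e-3, 0.5
h_star = optimal_window(V, B)
grid = [0.05 * k for k in range(1, 41)]
h_grid = pick_from_candidates(grid, V, B)
```

A larger bias constant (a faster-varying IF law) gives a smaller `optimal_window`, which is the behaviour the time-varying window is meant to reproduce.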

The following estimates for the amplitude of the signal
a and the variance of the noise σ² are used in this algorithm [4]:

$$|\tilde{a}|^2 + \tilde{\sigma}^2 = \frac{1}{N} \sum_{n=1}^{N} |z(nT)|^2 \qquad (18)$$


^  =  I'  (19) 

n—2 

where  N  =  2N,  is  the  number  of  samples.  We  consider 
an increasing sequence of window lengths {h_r | r = 1, ...,
J}. Since the optimal window length is time-dependent,
the optimal IF estimate (as given by eq.(11)) is also
time-dependent. For details see [4, 6].

It  should  be  emphasized  that  the  above  amplitude  esti¬ 
mation  works  only  for  mono-component  FM  signals. 

4.  IF  ESTIMATION  OF  MULTI-COMPONENT 
SIGNALS 

In this section we consider a multi-component analytic signal
of the form

$$z(t) = \sum_{q=1}^{M} \left( a_q e^{j\phi_q(t)} + \epsilon_q(t) \right) = \sum_{q=1}^{M} a_q e^{j\phi_q(t)} + \epsilon(t) \qquad (20)$$

where the amplitudes {a_q} are constant, and ε_q(t) and ε(t) are
complex-valued white Gaussian noise processes with i.i.d.
real and imaginary parts with total variance σ_q² and σ² =
Mσ_q², respectively. The signal-to-noise ratio SNR is defined
using the overall average amplitude and the overall noise.
The individual IF laws for each component are given by [1]

$$f_{i,q}(t) = \frac{1}{2\pi} \frac{d\phi_q(t)}{dt}, \qquad q = 1, \ldots, M \qquad (21)$$

The adaptive algorithm that tracks component maxima
in the time-frequency plane requires a threshold ρ_TH(t) so
as to ignore the local maxima caused by the cross-terms and
windowing. In fact, ρ_TH(t) is application and distribution
dependent.

The algorithm also requires the knowledge of the confidence
intervals D_{r,q} for each component. The calculation of
D_{r,q} depends on the estimation of the individual amplitudes
a_q of the components. First we have [6]

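$$\sum_{q=1}^{M} |\tilde{a}_q|^2 + \tilde{\sigma}^2 = \frac{1}{N} \sum_{n=1}^{N} |z(nT)|^2 \qquad (22)$$

where N is the number of samples and the estimate of σ² is
given by eq.(19). Hence $\tilde{\sigma}_q^2 = \tilde{\sigma}^2 / M$ and $s_a = \sum_{q=1}^{M} |\tilde{a}_q|^2$
can be calculated.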

Now if the ratios between the component amplitudes can
be estimated, the actual amplitudes can be estimated. If
we denote the ratio of the qth amplitude to the first
amplitude by r_q = |a_q| / |a_1|, then we have the following
estimate for the first amplitude, from which the others
follow as $|\tilde{a}_q| = \tilde{r}_q |\tilde{a}_1|$:

$$|\tilde{a}_1|^2 = s_a \Big/ \Big(1 + \sum_{q=2}^{M} \tilde{r}_q^2\Big) \qquad (23)$$

where ~ indicates the estimated values. Using the proposed
TFD we can estimate the ratio r_q directly by eq.(3) at the
peaks around the qth and the first components, ρ_q(t,f) and
ρ_1(t,f), as follows

7*  =  mean{|  |}/meaxi{|  |}  (24) 

where ψ_q(t) and ψ_1(t) can be estimated after using the
peak trajectory to estimate φ_q(t) and φ_1(t), respectively.

Since in discrete implementation the TFD builds up at the
beginning and decays at the end due to the lack of correlation
information, it is better not to include the start and
the end parts of the TFD in the estimation using eq.(24).
Also the regions of rapid change in the IF law should be
excluded, as eq.(3) would not be an accurate approximation
to d(t,f) there (unless lag windowing is used). The best
estimate is obtained when there is a linear part in the IF
law; in this case the mean in eq.(24) is taken over this linear
part, using eq.(4). Further study of this amplitude
estimate is left for future work.
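
The amplitude recovery described in this section can be sketched as follows. The sketch assumes that the average power, the noise variance and the amplitude ratios r_q = |a_q| / |a_1| have already been estimated by other means; the function name and argument names are ours and purely illustrative. It uses only the relation s_a = |a_1|^2 (1 + sum of r_q^2), where s_a is the average power minus the noise variance.

```python
import math


def component_amplitudes(mean_power, noise_var, ratios):
    """Estimate |a_1|, ..., |a_M| for an M-component signal.

    mean_power : (1/N) * sum |z(nT)|^2 over the record
    noise_var  : estimate of the total noise variance
    ratios     : [r_2, ..., r_M], with r_q = |a_q| / |a_1|
    """
    s_a = mean_power - noise_var          # sum over q of |a_q|^2
    if s_a <= 0:
        raise ValueError("noise estimate exceeds the total power")
    a1 = math.sqrt(s_a / (1.0 + sum(r * r for r in ratios)))
    return [a1] + [r * a1 for r in ratios]


# Two components with |a_1| = 2 and |a_2| = 1 (ratio 0.5) plus noise of variance 0.5.
amplitudes = component_amplitudes(mean_power=5.5, noise_var=0.5, ratios=[0.5])
```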

Using $|\tilde{a}_q|^2$ and $\tilde{\sigma}_q^2$ to calculate $\mathrm{var}(\Delta \hat{f}_{h_r}(t))$ (given
by eq.(14) for d(t,f)), we can define the confidence intervals
{D_{r,q}} for all components as in [6]. The IF f_{i,q}(t) is
contained in at least one of the confidence intervals {D_{r,q}}
if h_r is sufficiently small, with a Gaussian probability P(κ).

Figure 2: IF estimation of a mono-component non-linear
FM signal with total signal length N = 128 and T = 1/128
using d(t,f). Above: IF estimation using a constant window
length h = 128. Below: Adaptive IF estimation. The
estimated IF law is compared to the exact IF law (dashed
line).

5.  SIMULATION  RESULTS 

Example 1: The discrete version of d(t,f) as in eq.(9), using
the time-lag kernel in eq.(1), is used to implement the
adaptive algorithm for mono- and multi-component signals.
For the mono-component case we consider a non-linear
frequency-modulated signal with IF given by

f_i(nT) = 32 + 5 sinh^{-1}(100(nT - 0.5))

with a = 1, SNR = 15 dB, α = 0.1, κ = 2, 0 < nT < 1,
and T = 1/128. Figure 2 shows the result of the adaptive IF
estimation compared with the IF estimation using
a constant window length. The non-adaptive approach
cannot estimate the IF law accurately
at the instants of rapid change, where the second derivative
of the IF law is large and the optimal window length of
eq.(17) is needed.
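
For readers who wish to reproduce a signal of this type, the following is a minimal sketch of a generator for the mono-component test signal. Several choices are assumptions on our part and are not stated in the text: the samples are taken at n = 0, ..., 127, the phase is obtained by rectangle-rule integration of the IF law, and the SNR is interpreted as a^2 divided by the total complex noise variance. The function name and the seed argument are illustrative.

```python
import cmath
import math
import random


def example1_signal(n_samples=128, snr_db=15.0, amplitude=1.0, seed=0):
    """Return (IF law, noisy analytic FM signal) for the mono-component example."""
    T = 1.0 / n_samples
    rng = random.Random(seed)
    # IF law: 32 + 5 asinh(100 (nT - 0.5)), in Hz.
    f_inst = [32.0 + 5.0 * math.asinh(100.0 * (n * T - 0.5))
              for n in range(n_samples)]
    noise_var = amplitude**2 / 10 ** (snr_db / 10.0)   # assumed SNR definition
    sigma = math.sqrt(noise_var / 2.0)                 # per real/imaginary part
    z, phase = [], 0.0
    for f in f_inst:
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        z.append(amplitude * cmath.exp(1j * phase) + noise)
        phase += 2.0 * math.pi * f * T                 # accumulate the phase
    return f_inst, z


f_inst, z = example1_signal()
```

The IF stays between roughly 9 Hz and 55 Hz, below the 64 Hz Nyquist limit implied by T = 1/128, and changes most rapidly around nT = 0.5.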

Example 2: We consider a two-component signal with
non-linear frequencies given by

f_1(nT) = 40 + 5 sinh^{-1}(20(nT - 0.4)), and

f_2(nT) = 20 + 2.5 sinh^{-1}(50(nT - 0.8))

with a_1 = a_2 = 1, SNR = 15 dB, α = 0.1, κ = 2,
0 < nT < 1, and T = 1/128. Figure 3 shows the result of
the adaptive tracking algorithm along with the
adaptive window length for the first component. The
adaptive window clearly takes shorter lengths at
the instants of rapid change in the component IF law, in
accordance with eq.(17).

Figure 3: Above: adaptive IF estimation of a two-component
FM signal using d(t,f) as compared to the exact
IF laws (dashed lines). Below: adaptive window length as
a function of time for the first component.

6. CONCLUSIONS

This paper has presented an adaptive method to estimate
the IF law of mono- and multi-component FM signals using
quadratic time-frequency distributions. We proved that
an IF estimation algorithm with adaptive window length
is applicable to any quadratic time-frequency distribution
that satisfies certain conditions. A time-frequency distribution
d(t,f) that satisfies these conditions and enables
non-parametric amplitude estimation is proposed. A comparison
with a constant-window tracking algorithm shows that
using a constant window length cannot give a robust IF
estimate if the IF changes rapidly with time. A suggestion
to adopt time-only kernels for the purpose of adaptive IF
estimation is also presented.

ABSTRACT

This paper presents a time-frequency approach for electroencephalographic
(EEG) seizure detection. The proposed method
uses the high-resolution reduced interference B time-frequency
distribution. An in-depth analysis of the seizure detection techniques
of Gotman (frequency domain) and Liu (time domain) has
been performed in order to compare them with the detection criteria
used in the time-frequency domain. Both synthetic and real neonatal
EEG signals have been used for testing.

1.  INTRODUCTION 

Approximately  one  in  every  200  newborn  babies  experiences 
some  form  of  seizure,  indicating  cerebral  abnormalities,  or  dam¬ 
age  to  the  brain.  Unlike  adult  seizure,  the  effects  of  newborn 
seizure  are  subtle  and  hence  require  the  constant  attention  of  a 
medical  specialist  for  diagnosis. 

Monitoring  brain  activity  through  electroencephalographic 
(EEG)  data  has  become  a  successful  means  for  detecting  seizure 
in  adults.  This  involves  identifying  sharp,  repetitive  waveforms 
in  the  EEG  data  that  indicate  the  onset  of  seizure.  In  adults,  these 
signals  are  easily  recognisable  against  a  low  amplitude,  random 
background  characteristic  of  normal  brain  activity.  The  problem 
of  detecting  seizure  in  newborn  babies,  however,  is  complicated 
by many factors [1]. Firstly, healthy newborn EEG signals representing
normal brain activity often contain patterns such as spurious
waveforms and sharp spikes. These characteristics, which
would  otherwise  be  detected  as  seizure  in  adults,  are  simply  the 
result  of  extra  electrical  activity  produced  by  the  immature  brain 
as  it  continues  to  form.  Seizures,  however,  still  appear  in  the 
EEG  data  as  repetitive  waveforms  and  the  problem  lies  in  dis¬ 
cerning  the  healthy  spikes  from  those  formed  from  seizures.  Sec¬ 
ondly,  visual  symptoms  of  seizure,  such  as  muscle  spasms,  rapid 
eye  movement,  and  drooling,  are  much  more  subtle  in  newborns 
and  may  be  easily  missed.  These  visual  indicators  are  also  natural 
movements  common  to  all  newborn  babies.  Thirdly,  physical 
activity  of  babies  in  the  intensive  care  environment  is  often  sub¬ 
dued  by  medication  to  prevent  injuries  caused  by  unpredictable 
movements.  This  also  eliminates  the  chance  of  seizure  detection 
using  visual  signs  altogether. 

Currently  there  are  three  published  methods  for  EEG  seizure 
detection in newborns. The SPRC technique of Roessgen et al.
[1] is a parametric approach based on a nonlinear estimation of
11 model parameters for detection. The two other methods are
non-parametric. The technique of Gotman [2] uses frequency
analysis to determine the changes in the dominant peak of the
frequency spectrum of short epochs of EEG data. The technique
of Liu [3] performs analysis in the time domain and is based on
the auto-correlation function of short epochs of EEG data.

All  three  techniques  are  based  on  the  assumption  that  the 
EEG  signals  are  stationary  or  at  least  locally  stationary.  However, 
a  closer  examination  of  these  signals  often  shows  that  EEG  sig¬ 
nals  exhibit  significant  non-stationary  and  multi-component  fea¬ 
tures  (see  figure  1). 

Figure 1 Time-frequency representation of a newborn
EEG seizure signal using the B-distribution

To  take  these  characteristics  into  account,  we  propose  in  this 
paper  a  time-frequency  (TF)  domain  approach.  A  prerequisite  is 
the  selection  of  an  appropriate  time-frequency  distribution  (TFD) 
that  is  capable  of  handling  multicomponent  signals.  Once  the 
suitable TFD is chosen, a calibration process is undertaken. This
involves initially reproducing the seizure detection criteria and
characteristics used previously in other methods, such as Gotman’s
and Liu’s, and mapping them into the joint time-frequency domain.
Features  in  the  t-f  domain  indicating  seizure  are  then  identified 
and  a  detection  process  constructed  and  tested  in  the  time- 
frequency  domain.  The  proposed  process  is  shown  in  figure  2. 

2.  DATA  ACQUISITION 

Electrical  signals  produced  in  the  brain  can  be  monitored  in  a 
non-invasive  manner  by  measuring  variations  in  potential  on  the 
scalp.  This  EEG  measurement  is  achieved  by  strategically  plac¬ 
ing  several  small  electrodes  on  the  scalp,  and  forming  a  contact 
using  conductive  gel.  One  electrode,  usually  at  the  base  of  the 
skull,  acts  as  a  reference  (ground)  signal,  and  various  channels  of 
data  are  created  by  measuring  the  voltage  differences  between 
neighbouring  electrodes. 

Data used in our study has been collected at the Royal
Women’s  Hospital  Perinatal  Intensive  Care  Unit  in  Brisbane, 
Australia*.  Due  to  the  size  of  most  newborn  babies’  heads,  only 
five  channels  of  EEG  have  been  recorded  in  each  session  using 
the  10-20  International  System  of  Electrode  Placement.  The  EEG 
data  has  been  recorded  using  a  sampling  frequency  of  256  Hz. 
For  artefact  detection,  three  auxiliary  signals  representing  elec¬ 
tro-oculogram  (EOG),  electrocardiogram  (ECG),  and  respiration 
are  also  recorded  simultaneously  with  the  EEG. 

Figure 2 Time-frequency-based seizure detection process

3. TIME-FREQUENCY DISTRIBUTION SELECTION

In  order  to  develop  seizure  detection  methods  in  the  time- 
frequency  domain,  it  is  necessary  to  select  a  suitable  TFD  to 
represent  EEG  data.  Since  neonatal  EEG  signals  are  non¬ 
stationary  and  occasionally  multi-component,  a  desirable  time- 
frequency  distribution  should  have  a  good  spectral  resolution  and 
reduced cross-terms. The performance and characteristics of
several distributions have been compared to find an optimal
representation of real neonatal EEG data in the time-frequency
domain. The scope of this comparison study has encompassed
seven distributions: the Spectrogram, Wigner-Ville,
Choi-Williams, B-Distribution, Zhao-Atlas-Marks, Born-Jordan,
and Rihaczek-Margenau distributions [4-5]. Each time-frequency
distribution has been applied to epochs of real neonatal EEG for
various data window lengths and individual TFD parameter values.
The performances of the resulting time-frequency representations
have been compared using an objective quantitative
measure criterion [5]. Based on this criterion, the B-distribution
with the smoothing parameter equal to 0.01 has been selected as
the most suitable representation of the EEG signals in the
time-frequency domain [5]. Figure 3 illustrates the time-frequency
representations of a 30-second sample of real newborn EEG
data using the B and the Choi-Williams (CW) distributions.

4.  FROM  TIME  DOMAIN  TO  TIME- 
FREQUENCY  DOMAIN 

4.1  Review  of  Liu’s  Method 

In his method, Liu relied on the assumption that the essential
characteristic in newborn seizure EEG is periodicity. The amount
of periodicity in the autocorrelation of short epochs of EEG data
is scored and used in a rule-based algorithm to perform classification.
In this technique, an epoch consisting of 30 seconds of
data is divided into 5 windows (see figure 4).

Figure 3 The B-distribution with β = 0.01 (above) and
the CW distribution with σ = 10 (below) of a real epoch
of newborn EEG.

Figure 4 Epoch and window definitions according to Liu

Depending on the autocorrelation function of each window,
up to four primary periods (T_1, ..., T_4) are calculated for each
window in an epoch as shown in figure 5. These times correspond to
the moment centres of the first, second, third and fourth peaks in
the autocorrelation function. The windows are then scored,
with more evenly spaced primary periods allocated larger
scores. After each window in an epoch is scored, a rule-based
detection scheme is applied to classify each epoch as positive or
negative. If two or more channels of EEG data in the same epoch
are classified as positive, the epoch is then classified as containing seizure.

The  above  procedure  can  be  summarised  as  follows: 

•  Calculate the autocorrelation function for each window
within each epoch.

•  Locate the first four moment centres between zero-crossings
(if they exist).

•  Calculate the ratios between the first and subsequent
moment locations.

•  Find the differences between each of these three ratios and
the nearest integer.

•  Assign scores to each difference according to a non-linear
scoring system.

•  Sum these three scores to give a total for each window in
each epoch.

•  Label the different windows within each epoch and across
all channels as seizure positive or negative, depending upon
their scores.
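
The first four steps of this procedure can be sketched as follows. This is an illustration under our own assumptions, not Liu's implementation: the autocorrelation is the biased sample estimate after mean removal, a moment centre is taken to be the centroid of a positive lobe between zero-crossings (the lobe around zero lag is skipped), and the non-linear scoring table is omitted because it is not reproduced in this paper. All function names are illustrative.

```python
import math


def autocorrelation(x):
    """Biased sample autocorrelation of x (mean removed), normalised to r[0] = 1."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    r = [sum(c[i] * c[i + k] for i in range(n - k)) for k in range(n)]
    return [v / r[0] for v in r]


def moment_centres(r, fs, max_peaks=4):
    """Centroids (in seconds) of the first positive lobes after the zero-lag lobe."""
    centres, k, n = [], 0, len(r)
    while k < n and r[k] > 0:            # skip the lobe around zero lag
        k += 1
    while k < n and len(centres) < max_peaks:
        if r[k] > 0:
            start = k
            while k < n and r[k] > 0:
                k += 1
            if k == n:
                break                    # lobe cut off by the end of the record
            lobe = r[start:k]
            centre = sum((start + i) * v for i, v in enumerate(lobe)) / sum(lobe)
            centres.append(centre / fs)
        else:
            k += 1
    return centres


def ratio_differences(centres):
    """Distance of each ratio T_k / T_1 (k > 1) from the nearest integer."""
    ratios = [c / centres[0] for c in centres[1:]]
    return [abs(q - round(q)) for q in ratios]


# A 2 Hz sinusoid in a 6.4-second window (sampling rate reduced to keep the demo fast).
fs = 64.0
x = [math.sin(2.0 * math.pi * 2.0 * i / fs) for i in range(409)]
centres = moment_centres(autocorrelation(x), fs)
differences = ratio_differences(centres)
```

For a strictly periodic input the moment centres fall close to integer multiples of the period, so the differences are close to zero and well below the 0.150 tolerance quoted in section 4.2.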


Figure  5  Autocorrelation  function  for  one  window 


4.2  Seizure  criteria  in  time  domain 

A  single  window  is  seizure  positive  by  the  method  of  Liu  if  the 
following  criteria  are  met: 

•  At  least  four  periods  exist  within  the  positive  half  of  the 
Autocorrelation  function. 

•  The  differences  between  the  ratios  of  each  moment  centre  to 
the  first  and  the  nearest  integer  are  less  than  0.150. 

•  The  total  score  obtained  by  summing  all  moment  centre 
scores  is  greater  than  or  equal  to  12  (out  of  a  maximum  of 
15). 

These criteria have been determined by closely analysing
the scoring system to find definite constraints separating seizure
from non-seizure signals in the time domain. This has essentially
been achieved by first identifying all scoring scenarios leading
to a score greater than or equal to 12, the simplest way of achieving
seizure detection as defined by Liu. Secondly, these scores may
be broken down into the underlying ratios and moment centre
calculations necessary to achieve each score. Finally, inherent
signal characteristics and constraints necessary to achieve these
moment centres in the lag domain can be identified.

4.3  Seizure  criteria  in  time-frequency  domain 

An  EEG  signal  within  a  given  window  is  considered  seizure  if  a 
continuous  spectral  peak  exists  within  the  window  and  meets  the 
following  criteria: 

•  All  frequencies  within  the  spectral  line  are  greater  than 
0.625  Hz  within  6.4-second  windows  or  greater  than  0.909 
Hz  within  4.4-second  windows. 

•  The  length  of  continuous  dominant  spectral  peak  within  the 
window  is  greater  than  3  seconds. 


where  a  continuous  spectral  peak  is  defined  as  adjoining  peaks 
within  the  time-frequency  array  above  a  threshold  of  one  fifth  the 
maximum  array  value. 

The  first  criterion  is  a  direct  translation  to  the  frequency  do¬ 
main  of  the  first  criterion  described  in  section  4.2.  That  is,  in 
order  for  three  moment  centres  to  exist,  four  periods  must  occur 
in  the  lag  domain  over  a  single  window.  Applying  the  property  of 
the  Autocorrelation  function  that  states  that  the  Autocorrelation 
function  of  a  periodic  signal  is  also  periodic  with  the  same  pe¬ 
riod,  this  is  interpreted  by  assuming  four  periods  of  the  signal 
must exist in a single window. This translates directly into the first
time-frequency criterion stated above.
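
The two frequency limits quoted in the first criterion follow directly from requiring four full periods inside the window. A short check, with a function name of our own choosing:

```python
def min_detectable_frequency(window_seconds, periods_required=4):
    """Lowest frequency (Hz) for which the required number of periods fits in the window."""
    return periods_required / window_seconds


f_long = min_detectable_frequency(6.4)    # 6.4-second windows
f_short = min_detectable_frequency(4.4)   # 4.4-second window
```

Four periods in 6.4 seconds correspond to 0.625 Hz, and four periods in 4.4 seconds correspond to approximately 0.909 Hz.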

The  second  criterion  has  been  deduced  by  observation  of 
several  time-frequency  representations  of  seizure  positive  win¬ 
dows  as  defined  by  Liu.  The  scoring  system  focuses  on  identify¬ 
ing  periodic  regions  of  data  using  the  lag  domain.  Periodic  re¬ 
gions  are  clearly  identified  in  time-frequency  representations  by  a 
dominant  spectral  peak  occurring  for  a  certain  time  interval. 
Therefore, this is a less stringent translation of the criterion, but one
based upon identifying a common characteristic in each domain.
Further statistical analysis is required to determine the exact
duration necessary to identify seizure by this method. The key
factor in the success of this method has been the discovery of
frequency restrictions inherent in the scoring system
designed by Liu.

4.4  Implementation 

Extraction  of  the  seizure  criteria  listed  above  in  the  TF  domain 
has  been  successfully  calibrated  for  the  method  of  Liu.  Peak 
detection  techniques  from  image  processing  have  been  employed 
to  simplify  the  extraction  process,  resulting  in  a  detection  array 
illustrating  positions  and  lengths  of  continuous  spectral  lines 
within  each  epoch.  Figure  6  shows  the  algorithm  flow  chart  used 
in  the  implementation  of  this  method. 
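
The two time-frequency criteria of section 4.3 can be sketched as a simple ridge-length measurement. This is a minimal illustration and not the image-processing peak detector used in our implementation: the time-frequency array is assumed to be a list of time slices, the dominant peak of each slice is taken as its maximum, and "adjoining" is interpreted as a change of at most one frequency bin between consecutive slices. The function and parameter names are illustrative.

```python
def longest_spectral_line(tfd, freqs, dt, f_min, rel_threshold=0.2, max_jump_bins=1):
    """Duration (seconds) of the longest continuous dominant spectral peak.

    tfd[t][k] : TFD value at time slice t and frequency bin k
    freqs[k]  : frequency of bin k in Hz
    dt        : time step between slices in seconds
    f_min     : lowest admissible frequency (e.g. 0.625 Hz for 6.4 s windows)
    """
    threshold = rel_threshold * max(max(row) for row in tfd)   # one fifth of the maximum
    best = run = 0
    prev_bin = None
    for row in tfd:
        k = max(range(len(row)), key=row.__getitem__)          # dominant peak of the slice
        valid = row[k] > threshold and freqs[k] > f_min
        if valid and prev_bin is not None and abs(k - prev_bin) <= max_jump_bins:
            run += 1
        elif valid:
            run = 1
        else:
            run = 0
        prev_bin = k if valid else None
        best = max(best, run)
    return best * dt


# Synthetic 6.4-second window: a 2 Hz line lasting 4 seconds on a weak background.
freqs = [0.25 * k for k in range(32)]
tfd = [[0.01] * 32 for _ in range(64)]
for t in range(10, 50):
    tfd[t][8] = 1.0
length = longest_spectral_line(tfd, freqs, dt=0.1, f_min=0.625)
window_is_seizure = length > 3.0
```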

4.5  Results  and  discussion 

Very  promising  results  have  been  obtained  using  time-frequency 
algorithms  to  detect  individual  seizure  windows  of  real  neonatal 
EEG  in  the  time-frequency  domain.  Approximately  75%  of  win¬ 
dows  detected  as  seizure  positive  by  Liu  are  detected  by  applying 
TF  criteria  listed  above.  The  result  of  applying  the  proposed 
time-frequency-based  detection  method  is  illustrated  in  figure  7. 
Original  epoch  refers  to  the  raw  TF  array  produced  from  pre- 
processed  EEG  data.  Images  of  these  arrays  appear  on  the  left 
side  of  the  figure.  These  arrays  are  also  divided  into  four  distinct 
6.4-second  windows  and  one  4.4-second  window  as  defined  by 
Liu.  Window  scores  attained  for  these  epochs  are  displayed  at  the 
end  of  each  window  division.  This  makes  for  an  easy  compari¬ 
son  between  the  TF  information  contained  within  each  window, 
and  the  corresponding  score  allocated  by  Liu. 

That  only  75%  of  the  seizures  predicted  by  Liu’s  method  are 
detected  is  mainly  due  to  the  fact  that  our  method  uses  scores  of 
single  windows  while  Liu  used  the  combined  scores  of  up  to 
three  consecutive  windows  per  epoch.  Future  implementations  of 
our  method  will  include  the  different  possible  window  combina¬ 
tions. 

Window Scores over Time (0-30 seconds)


Figure 6 Implementation and calibration of the time-frequency
extension of time-domain method

Figure 7 The mapping of Liu’s method to time-frequency


5.  FROM  FREQUENCY  DOMAIN  TO 
TIME-FREQUENCY  DOMAIN 


5.1  Review  of  Gotman’s  method 

The  method  proposed  by  Gotman  is  based  on  spectral  analysis 
and  is  used  to  detect  periodic  discharges.  A  background  epoch  is 
defined  as  a  20-second  segment  of  EEG  finishing  60  seconds 
before  the  start  of  the  current  10-second  epoch  being  investigated 
(see  figure  8).  The  main  advantage  of  a  moving  background  ep¬ 
och  is  that  results  are  not  dependent  on  the  specific  features  of  a 
fixed  epoch. 

Figure 8 Epoch and window definitions according to Gotman
(labelled segments: Background Epoch, Current Epoch)


The  frequency  spectrum  of  each  10-second  epoch  is  calcu¬ 
lated  and  the  following  features  are  extracted: 

•  The  frequency  of  the  dominant  spectral  peak. 

•  The  width  of  the  dominant  spectral  peak. 

•  The  ratio  of  the  power  in  the  dominant  spectral  peak  to  that 
of  the  background  spectrum  in  the  same  frequency  band. 

The  10-second  epoch  of  data  is  considered  seizure  positive  if 
any  of  the  following  criteria  are  met: 


      Dominant         Half-Maximum     Power
      Frequency        Bandwidth        Ratio

1.    0.5 - 1.5 Hz     < 0.6 Hz         3 - 4

2.    1.5 - 10 Hz      < 0.6 Hz         2 - 4

3.    1.5 - 10 Hz      < 1 Hz           4 - 80
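
The decision rule in the table above can be written compactly as follows. This is a sketch of the three listed criteria only; whether the range limits are inclusive is not stated, so inclusive limits are assumed here, and the false-alarm checks applied afterwards are not included. The names are illustrative.

```python
# (f_low Hz, f_high Hz, maximum bandwidth Hz, ratio_low, ratio_high)
GOTMAN_CRITERIA = [
    (0.5, 1.5, 0.6, 3.0, 4.0),
    (1.5, 10.0, 0.6, 2.0, 4.0),
    (1.5, 10.0, 1.0, 4.0, 80.0),
]


def epoch_is_seizure_positive(dominant_freq, bandwidth, power_ratio):
    """True if the epoch features satisfy any one of the three criteria."""
    return any(
        f_lo <= dominant_freq <= f_hi
        and bandwidth < max_width
        and r_lo <= power_ratio <= r_hi
        for f_lo, f_hi, max_width, r_lo, r_hi in GOTMAN_CRITERIA
    )
```

For example, a dominant peak at 5 Hz with a 0.8 Hz bandwidth is accepted only if the power ratio lies between 4 and 80, since the narrower bandwidth required by the second criterion is not met.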

If  an  epoch  is  classified  as  containing  seizure  based  on  the  above 
criteria,  a  further  three  criteria  are  used  to  limit  the  number  of 
false  alarms.  Seizure  detection  is  discounted  if  the  epoch  is 
largely  non-stationary,  if  there  is  a  large  amount  of  AC  power 
noise  present  or  if  it  appears  that  an  EEG  lead  has  been  discon¬ 
nected. 

The  aim  of  this  method  is  to  determine  if  a  dominant  peak 
exists  in  the  power  spectral  density  (PSD).  This  is  equivalent  to 
detecting  if  an  EEG  waveform  has  a  dominant  periodic  shape  in 
the  time  domain.  The  feature  space  used  to  classify  an  epoch  as 
seizure  ensures  that  the  dominant  peak  of  the  spectrum  is  signifi¬ 
cant  compared  to  the  background  spectrum. 

5.2  Seizure  criteria  in  time-frequency 


Since a time-frequency representation is comprised of the
instantaneous spectra of a signal over time, the criteria above
pertaining to frequency and bandwidth are clearly discernible in the
time-frequency array. That is, each spectrum containing a dominant
peak that meets either of the criteria:

•  Frequency in the range 0.5 - 1.5 Hz and width < 0.6 Hz.

•  Frequency in the range 1.5 - 10 Hz and width < 1 Hz.

may be considered for further seizure detection pertaining to
power ratio. Disregarding the power ratio criteria defined in
section 5.1, the second criterion becomes a subset of the third
criterion. Due to the instantaneous nature of the time-frequency
array, this power ratio is not obvious, and requires
further investigation for explicit definition.

5.3  Implementation 


The algorithm implementing the two time-frequency criteria identified
above is illustrated in figure 9. It essentially extracts frequency
and width information, the results of which are visible in
the plots shown in figure 10.


Figure  9  Implementation  and  calibration  of  time-frequency  ex¬ 
tension  of  frequency-based  method 

5.4  Results  and  discussion 

The  result  of  applying  the  above  proposed  algorithm  is  illustrated 
in  figure  10.  Data  is  presented  by  highlighting  the  position  of  the 
dominant  frequency  with  a  colour  indicating  the  width  of  the 
spectral  peak.  Boxed  sections  of  the  array  indicate  regions  de¬ 
tected  as  containing  seizure  by  the  conventional  method  of  Got- 
man.  This  has  been  included  to  aid  visual  recognition  of  any 
predominant  features  that  may  stand  out  in  the  processed  time- 
frequency array of seizure epochs. The main limitations of this
method lie in the difficulty of accurately assessing differences in power
between current and reference epochs, due to the instantaneous
nature of the time-frequency array under analysis. Further
research into this matter, and its incorporation into the detection
algorithm defined in the above sections, should result in clearly
recognisable features defining seizures in the TF array.

Figure 10 The mapping of Gotman’s method to time-frequency

6.  CONCLUSIONS 

The  initial  results  obtained  show  that  the  time-frequency  do¬ 
main  is  a  suitable  basis  from  which  to  develop  a  complete  seizure 
detection scheme. Successful mapping of Liu’s detection criteria
into the time-frequency domain has been completed, and mapping
of two out of three criteria detailed by Gotman has also
been implemented and tested in the time-frequency domain.
These two mappings allowed us to calibrate the time-frequency-based
method. The next step will be to develop a fully integrated
time-frequency  detection  method  by  combining  the  different 
time-frequency  seizure  features  identified  in  this  paper. 

Essentially,  this  paper  provides  proof  of  concept  of  seizure 
detection  in  the  time-frequency  domain.  Further  results  will  ap¬ 
pear  elsewhere. 

ABSTRACT

Time-frequency  distributions  (TFDs)  belonging  to  Cohen’s 
class  are  usually  designed  to  satisfy  the  frequency  marginal. 
The  conventional  frequency  marginal  is  equivalent  to  the 
classical  periodogram.  It  is  known  that  the  periodogram  is 
not  a  good  spectral  estimator.  For  this  reason,  Thomson 
[3]  introduced  a  multitaper  spectral  estimator  which  treats 
both the bias and the variance problems inherent to non-parametric
spectral estimators. In this paper, we introduce
a new kernel design method which achieves Thomson’s
spectral estimate as the frequency marginal. This
new  method  results  in  a  signal-dependent  kernel  design. 
The  resulting  kernel  belongs  to  the  class  of  reduced  interfer¬ 
ence  distribution  (RID)  kernels  and  therefore  this  new  time- 
frequency  distribution  will  be  called  multitaper  reduced  in¬ 
terference  distribution  (MT-RID).  The  performance  of  this 
method  is  compared  to  the  previously  introduced  multiwin¬ 
dow  time-frequency  distribution  (MW-TFD)  [6]  through 
simulations. 

1.  INTRODUCTION 

Time-frequency  distributions  belonging  to  Cohen’s  class  [1] 
are  usually  designed  to  satisfy  the  frequency  marginal.  The 
time-frequency  kernel  is  designed  such  that  it  yields  the 
classical  periodogram  as  the  frequency  marginal.  It  is  known 
that  the  classical  periodogram  is  not  a  good  spectral  esti¬ 
mator  due  to  its  nonzero  variance  even  when  the  number  of 
data  samples  goes  to  infinity  [2].  For  this  reason,  several 
modified  periodogram  methods  have  been  introduced  which 
reduce  the  variance  at  the  expense  of  increasing  its  bias. 
For  short,  time-limited  signals,  Thomson  [3]  suggested  us¬ 
ing  a  set  of  orthogonal  windows  to  compute  several  direct 
spectrum  estimates  of  the  entire  signal  and  then  average 
the  resulting  spectrums  to  construct  a  spectral  estimate. 
The orthogonal windows used are the eigenfunctions, the discrete
prolate spheroidal wave functions [4], of the spectral
estimation kernel. Since the windows are orthogonal and
optimally  concentrated  in  the  frequency  domain,  the  result¬ 
ing  spectral  estimate  treats  both  the  bias  and  the  variance 
problems. 

Several authors have extended Thomson’s method to
nonstationary spectrum estimation [5, 6, 7, 8]. In [6]
and [7], the authors applied prolate spheroidal sequences
or windows which are optimally concentrated in the time-frequency
plane, i.e. Hermite functions, to compute several
spectrograms and then combined them to obtain a time-varying
spectrum estimate. In [5], the spectral representation
theorem for stationary processes is extended to the
nonstationary case and the eigenfunctions are found to construct
the time-varying spectrum estimator.

This research was supported in part by grants from the Rackham
School of Graduate Studies and the Office of Naval Research,
ONR grant no. N000014-97-1-0072.

In  this  paper,  we  approach  the  problem  from  the  per¬ 
spective  of  the  frequency  marginal  and  solve  for  a  time- 
frequency  kernel  which  will  give  us  the  desired  marginal. 
We  derive  the  conditions  on  the  time-frequency  kernel  such 
that  it  yields  Thomson’s  spectrum  as  the  frequency  marginal. 
It  is  shown  that  the  corresponding  time-frequency  kernels 
are  signal-dependent.  The  time-frequency  kernels  designed 
in  this  manner  belong  to  the  class  of  reduced  interference 
distribution  (RID)  kernels.  Therefore,  we  are  going  to  refer 
to  this  new  class  of  time-frequency  distributions  as  Mul- 
titaper-RID  (MT-RID).  This  approach  provides  smoother 
time-frequency  distributions  which  are  less  prone  to  noise. 
The  performance  of  this  method  is  then  compared  to  the 
multiple  window  spectrogram  method  [6]  for  example  sig¬ 
nals  in  additive  noise  through  simulations. 

2.  MULTITAPER  REDUCED  INTERFERENCE 
DISTRIBUTION  (MT-RID) 

For a time-frequency distribution belonging to Cohen’s class,
it is desirable to have the following two properties, the time
and the frequency marginals, satisfied:

$$\int C(t,\omega)\,d\omega = |s(t)|^2$$

$$\int C(t,\omega)\,dt = |S(\omega)|^2 \qquad (1)$$

When the frequency marginal is satisfied, the energy distribution
of the signal is represented by the classical periodogram
$|S(\omega)|^2$. It is well known that the classical periodogram
is not a good spectral estimator due to its inconsistency.
Thomson’s method overcomes the bias-variance
tradeoff inherent in nonparametric spectral estimation methods.
This method is equivalent to using the weighted average
of a series of direct spectrum estimates based on orthogonal
data windows. The high resolution spectrum estimate
around a point $f_0$ is
$$\bar{S}(f_0) = \frac{1}{2NW} \sum_{k=0}^{K-1} \frac{1}{\lambda_k(N,W)}\,|y_k(f_0)|^2 \qquad (2)$$

where $\lambda_k$ is the eigenvalue corresponding to the kth orthogonal
window, NW is the time-bandwidth product and each
$|y_k(f_0)|^2$ is the spectral estimate computed using the kth
discrete prolate spheroidal sequence, expressed as:

$$|y_k(f_0)|^2 = \Big|\sum_{n} x(n)\,\frac{v_n^{(k)}(N,W)}{\epsilon_k}\,e^{-j2\pi f_0 n}\Big|^2 \qquad (3)$$

where the $v_n^{(k)}(N,W)$ are the discrete prolate spheroidal sequences
and $\epsilon_k$ is a normalization factor.

We take this spectrum estimate as a basis for our improved
frequency marginals, and solve for the corresponding
time-frequency kernel.

Any time-frequency distribution with an alias-free kernel
[10], $\psi(n,m)$, in the time-lag domain can be written
as (2):

$$\sum_m \sum_l \psi(n-l,m)\,x(l+\tfrac{m}{2})\,x^*(l-\tfrac{m}{2})\,e^{-j\omega m} \qquad (4)$$

(2) All summations are from $-\infty$ to $\infty$ unless otherwise specified.

First, let us equate the frequency marginal of this distribution
to a smoothed periodogram, i.e. the data is smoothed by
a single window, and then extend the results to Thomson’s
multitaper spectrum estimate. The frequency marginal is
equal to

$$\sum_n \sum_m \sum_l \psi(n-l,m)\,x(l+\tfrac{m}{2})\,x^*(l-\tfrac{m}{2})\,e^{-j\omega m} = \Big|\sum_k h(k)\,x(k)\,e^{-j\omega k}\Big|^2 \qquad (5)$$

where h(k) is the time domain window. After some algebraic
manipulations, it is found that

$$\sum_m \sum_k h(k+\tfrac{m}{2})\,x(k+\tfrac{m}{2})\,h^*(k-\tfrac{m}{2})\,x^*(k-\tfrac{m}{2})\,e^{-j\omega m} = \sum_m \sum_n \sum_l \psi(n-l,m)\,x^*(l-\tfrac{m}{2})\,x(l+\tfrac{m}{2})\,e^{-j\omega m} \qquad (6)$$

This equality implies that

$$\sum_l h(l+\tfrac{m}{2})\,x(l+\tfrac{m}{2})\,h^*(l-\tfrac{m}{2})\,x^*(l-\tfrac{m}{2}) = \sum_l \Big[\sum_n \psi(n-l,m)\Big]\,x^*(l-\tfrac{m}{2})\,x(l+\tfrac{m}{2}) \quad \forall m \qquad (7)$$

If we express the kernel in terms of its ambiguity domain
function, we will end up with the following expression:

$$\sum_l \Big[\frac{1}{2\pi}\int \Big(\sum_n e^{j\theta n}\Big)\,\phi(\theta,m)\,e^{-j\theta l}\,d\theta\Big]\,x(l+\tfrac{m}{2})\,x^*(l-\tfrac{m}{2}) \qquad (8)$$

If we further assume that the number of signal samples goes
to infinity then the above equation will reduce to:

$$\sum_l \phi(0,m)\,x(l+\tfrac{m}{2})\,x^*(l-\tfrac{m}{2}) = \sum_l h(l+\tfrac{m}{2})\,x(l+\tfrac{m}{2})\,h^*(l-\tfrac{m}{2})\,x^*(l-\tfrac{m}{2}) \quad \forall m \qquad (9)$$

which will in turn give an expression for $\phi(0,m)$ in terms of
the signal and the data window used in spectral estimation.

$$\phi(0,m) = \frac{\sum_l h(l+\tfrac{m}{2})\,x(l+\tfrac{m}{2})\,h^*(l-\tfrac{m}{2})\,x^*(l-\tfrac{m}{2})}{\sum_l x(l+\tfrac{m}{2})\,x^*(l-\tfrac{m}{2})} \qquad (10)$$

The above equation can be interpreted to be the ratio between
the biased autocorrelation estimate for the windowed
signal and the biased autocorrelation estimate for the original
signal. It is important to keep in mind that the above
equation is only an asymptotic result, since it is only true
when the number of signal samples goes to infinity. Since
in real cases we are going to have a finite number of samples
of a given signal, the equality given above will only be
an estimate of the actual result. This result also implies
that when we have a rectangular window for h(t), $\phi(0,m)$
becomes equal to 1, which is the well-known constraint on
time-frequency kernels for satisfying the conventional frequency
marginal. For anything other than the rectangular
window, it is not possible to have general constraints on the
kernel. Therefore, the kernel designed to satisfy a specific
frequency marginal will be signal dependent.

We can easily extend these results to Thomson’s spectrum
estimate by combining the constraints imposed by
each window.

If we apply the previous results for the time-frequency
kernel to achieve a frequency marginal equal to $|y_k(f_0)|^2$
given in equation 3, we will obtain:

$$\phi_k(0,m) = \frac{\sum_l v^{(k)}(l+\tfrac{m}{2})\,v^{(k)*}(l-\tfrac{m}{2})\,x(l+\tfrac{m}{2})\,x^*(l-\tfrac{m}{2})}{\sum_l x(l+\tfrac{m}{2})\,x^*(l-\tfrac{m}{2})} \qquad (11)$$

where each data window produces its own corresponding
kernel. The final kernel is a weighted summation of individual
kernels.

$$\phi(0,m) = \frac{1}{2NW} \sum_{k=0}^{K-1} \frac{1}{\lambda_k(N,W)}\,\phi_k(0,m) \qquad (12)$$


It  is  apparent  from  this  equation  that  there  is  no  unique 
way  of  designing  the  kernel  given  this  one  dimensional  con¬ 
straint.  For  this  reason,  we  consider  a  construction  algo¬ 
rithm  which  will  require  the  least  amount  of  computation. 
The  kernel  is  built  in  an  iterative  manner  in  the  time-lag  do¬ 
main  such  that  the  summation  of  the  kernel  elements  along 
the  time  direction  at  any  given  time-lag  gives  the  value  of 
$\phi(0,m)$ at the particular lag value [9]. This construction
guarantees  that  the  desired  frequency  marginal  is  achieved 
along  with  RID  characteristics  and  provides  minimal  com¬ 
putational  complexity  since  the  kernel  is  constructed  as  an 
outer  product  of  orthogonal  vectors. 
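
The one-dimensional constraint of equation 10 can be illustrated with a short sketch. It is only a finite-sample approximation of the asymptotic result, and it is not the construction algorithm of [9]. The function name `kernel_constraint`, the restriction to even lags m = 2p (so that l + m/2 and l - m/2 are integer indices), and the Hann window used in the usage lines are assumptions made for illustration.

```python
import cmath
import math


def kernel_constraint(x, h, max_half_lag):
    """Finite-sample estimate of phi(0, m) from eq. (10) at even lags m = 2p.

    Returns a list indexed by p = 0..max_half_lag. Even lags are an
    assumption of this sketch so that all sample indices are integers.
    """
    num = len(x)
    phi = []
    for p in range(max_half_lag + 1):
        windowed = 0j  # autocorrelation sum of the windowed signal h(n) x(n)
        plain = 0j     # autocorrelation sum of the signal itself
        for l in range(p, num - p):
            a, b = l + p, l - p
            pair = x[a] * x[b].conjugate()
            windowed += h[a] * h[b].conjugate() * pair
            plain += pair
        phi.append(windowed / plain)  # fails if the plain sum vanishes
    return phi


# Usage on a complex exponential (hypothetical frequency 0.8 rad/sample).
n_samples = 65
x = [cmath.exp(1j * 0.8 * n) for n in range(n_samples)]
rect = [1.0] * n_samples
hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_samples - 1))
        for n in range(n_samples)]
phi_rect = kernel_constraint(x, rect, 8)
phi_hann = kernel_constraint(x, hann, 8)
```

With the rectangular window the ratio is 1 at every lag, which is the conventional marginal constraint mentioned above. For the complex exponential the signal terms cancel and only the window autocorrelation remains; for other signals the result depends on x, which is the signal dependence noted in the text.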
3.  REVIEW  OF  MULTIWINDOW
TIME-FREQUENCY  DISTRIBUTION 
(MW-TFD) 

Thomson’s  multiwindow  spectrum  estimation  for  stationary 
signals  is  extended  to  the  nonstationary  case  in  [6].  The 
MW-TFD  is  applied  to  a  signal  in  a  similar  manner  as  the 
spectrogram.  However,  instead  of  applying  a  single  sliding 
window  along  the  signal,  the  MW-TFD  applies  a  set  of 
orthogonal  sliding  windows  and  then  takes  the  average  as 
follows. 

$$X_{MW}(n,\omega) = \frac{1}{K} \sum_{k=0}^{K-1} |X_k(n,\omega)|^2 \qquad (13)$$

where each $X_k(n,\omega)$ is expressed as a short-time Fourier
transform computed using the kth window function.

$$X_k(n,\omega) = \sum_m x(m)\,h_k(m-n)\,e^{-j\omega m} \qquad (14)$$
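
As a rough illustration of equations 13 and 14, the sketch below averages K spectrograms at a single time index. Orthonormal sine tapers are used here only as a stand-in for the orthogonal windows of [6] (prolate spheroidal or Hermite windows), because they have a closed form; the function names, the window centring and the frequency grid are likewise assumptions of this sketch.

```python
import cmath
import math


def sine_tapers(length, count):
    """Orthonormal sine tapers (a stand-in for the windows used in [6])."""
    scale = math.sqrt(2.0 / (length + 1))
    return [[scale * math.sin(math.pi * (k + 1) * (i + 1) / (length + 1))
             for i in range(length)] for k in range(count)]


def mw_tfd(x, tapers, n, num_bins):
    """Eq. (13) at time index n: the average over k of |X_k(n, w)|^2.

    Frequencies are w = 2*pi*b/num_bins for b = 0..num_bins-1, and each
    window is centred on n.
    """
    length = len(tapers[0])
    half = length // 2
    row = [0.0] * num_bins
    for h in tapers:
        for b in range(num_bins):
            w = 2.0 * math.pi * b / num_bins
            acc = 0j
            for i in range(length):
                m = n - half + i  # signal index under window sample i
                if 0 <= m < len(x):
                    acc += x[m] * h[i] * cmath.exp(-1j * w * m)  # eq. (14)
            row[b] += abs(acc) ** 2
    return [v / len(tapers) for v in row]


# Usage: a complex exponential at bin 8 of 64, three tapers of length 17.
x = [cmath.exp(2j * math.pi * 8 * n / 64) for n in range(64)]
tapers = sine_tapers(17, 3)
row = mw_tfd(x, tapers, 32, 64)
```

Because every taper has unit energy, the average of `row` over the frequency bins equals the local signal energy under the window, which is one way to sanity-check the normalization.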


4.  EXAMPLES 

In  this  section,  we  are  going  to  give  some  preliminary  re¬ 
sults  for  the  bias  and  the  variance  analysis  of  the  two  time- 
varying  spectral  estimators  discussed  above.  The  main  per¬ 
formance  analysis  will  be  based  on  simulations. 

For  the  MW-TFD,  the  expected  value  of  the  estimate 
is  given  by: 

$$E[X_{MW}(n,\omega)] = \frac{1}{K} \sum_{k=0}^{K-1} \sum_l \sum_m r(l,m)\,h_k(l-n)\,h_k^*(m-n)\,e^{-j\omega(l-m)} \qquad (15)$$

When the windows are normalized, this results in an unbiased
estimator for a stationary white process. Similarly, for
MT-RID, the expected value of the distribution is given as
follows:

$$E[X_{MT\text{-}RID}(n,\omega)] = \sum_l \sum_m \psi\big(n-\tfrac{l+m}{2},\,l-m\big)\,r(l,m)\,e^{-j\omega(l-m)} \qquad (16)$$


When the kernel is properly normalized, we get an unbiased
estimator  for  white  spectra.  At  this  point,  we  don’t  have 
closed  form  expressions  for  the  variances  of  the  two  time- 
frequency  distributions.  Amin  has  introduced  formulations 
for  average  variance  of  time-frequency  distributions  of  sig¬ 
nals  in  noise  in  [11].  Since  we  are  interested  in  analyzing 
local  phenomena,  the  variance  of  the  time-frequency  distri¬ 
butions  will  be  compared  through  simulations.  The  first  ex¬ 
ample  that  we  will  consider  is  a  complex  exponential  with 
additive  white  Gaussian  complex  noise.  The  signal  plus 
noise  model  can  be  expressed  as: 

$$x(n) = s(n) + \eta(n), \quad n = 0, \ldots, 64$$

$$s(n) = \exp(j\omega_0 n)$$

$$\mathrm{Var}[\eta(n)] = 0.1 \qquad (17)$$

In this case, the kernel designed to achieve Thomson’s spectrum
as the frequency marginal, equation 12, has to satisfy
the following condition:

$$\phi_k(0,m) = \sum_l h_k(l+\tfrac{m}{2})\,h_k^*(l-\tfrac{m}{2})$$

$$\phi(0,m) = \frac{1}{K} \sum_{k=0}^{K-1} \phi_k(0,m) \qquad (18)$$

This is equivalent to averaging the autocorrelation functions
of individual windows and imposing that as the kernel at
$\theta = 0$. Similarly, the kernel for MW-TFD is the average of
the ambiguity functions, $A_{h_k}(\theta,m)$, for each window. It is
expressed as:

$$\phi_{MW}(\theta,m) = \frac{1}{K} \sum_{k=0}^{K-1} A_{h_k}(-\theta,m)$$

$$A_{h_k}(-\theta,m) = \sum_l h_k(l+\tfrac{m}{2})\,h_k^*(l-\tfrac{m}{2})\,e^{-j\theta l} \qquad (19)$$

It is apparent from the above two equations that these two
kernels agree for $\theta = 0$, and thus will have similar frequency
marginals. The kernel for MW-TFD in the ambiguity domain
is concentrated along the $\theta = 0$ axis, whereas the MT-RID
kernel will have RID structure due to the design procedure
described in Section 2 (Figure 1). The structure of
the kernel for MW-TFD suggests that it is good at extracting
impulses along the frequency dimension, i.e. complex
exponentials, and not so good at tracking time-varying phenomena.


Figure  1:  The  time-frequency  kernels  in  the  ambiguity  do¬ 
main  for  example  1  a)  The  kernel  for  MW-TFD,  b)  The 
kernel  for  MT-RID 
The complex exponential signal buried in white Gaussian
complex noise is simulated 500 times at SNR = 10 dB.
The standard deviation is computed across all time for all
simulations. It is seen that the deviation is highest at the
fundamental frequency of the complex exponential (Figure 2).
MT-RID produces a time-frequency distribution with a
higher variance compared to the MW-TFD since the latter
is a spectrogram-based method which only takes on positive
values. In the second example, we consider a linear chirp
signal plus noise.

Figure 2: The standard deviations of the time-frequency
distributions for the complex exponential signal plus noise
based on 500 simulations.

$$x(n) = s(n) + \eta(n), \quad n = 0, \ldots, 64$$

$$s(n) = \exp\big(j(\omega_0 n + \beta n^2/2)\big)$$

$$\mathrm{Var}[\eta(n)] = 0.1 \qquad (20)$$

In  this  case,  the  kernels  for  the  two  methods  are  not  the 
same  and  a  closed  form  expression  for  the  constraint  on  the 
kernel  of  MT-RID  is  complicated  to  formulate.  Still,  we  can 
build  the  kernel  using  equation  12.  An  analysis  similar  to 
above  can  be  done  to  see  the  standard  deviation  of  the  time- 
frequency  distribution  based  on  500  simulations.  In  this 
case,  the  MT-RID  method  exhibits  a  more  stable  spectrum. 
Due  to  the  shape  of  its  kernel  in  the  ambiguity  domain,  the 
MW-TFD  method  does  not  offer  good  resolution  properties. 
This  induces  a  large  variance  around  the  true  chirp  rate. 
(Figure  3) 
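
The signal models of equations 17 and 20 and the Monte Carlo standard deviation can be sketched as follows. This is only an illustration under stated assumptions: the noise is taken to be circular (the variance 0.1 is split equally between real and imaginary parts), the function names and the frequency and chirp-rate values are hypothetical, and `pointwise_std` is applied here to the raw samples, whereas in the examples above it would be applied to the time-frequency values of each realization.

```python
import cmath
import math
import random


def noisy_signal(w0, beta=0.0, noise_var=0.1, num=65, rng=None):
    """x(n) = exp(j(w0*n + beta*n^2/2)) + eta(n) for n = 0..num-1.

    beta = 0 gives the complex exponential of eq. (17); a nonzero beta
    gives the linear chirp of eq. (20).
    """
    rng = rng or random.Random()
    sigma = math.sqrt(noise_var / 2.0)  # per real and imaginary part (assumed)
    out = []
    for n in range(num):
        s = cmath.exp(1j * (w0 * n + beta * n * n / 2.0))
        eta = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        out.append(s + eta)
    return out


def pointwise_std(realizations):
    """Sample standard deviation across realizations, at each index."""
    count = len(realizations)
    out = []
    for idx in range(len(realizations[0])):
        vals = [r[idx] for r in realizations]
        mean = sum(vals) / count
        var = sum(abs(v - mean) ** 2 for v in vals) / (count - 1)
        out.append(math.sqrt(var))
    return out


# Usage: 500 realizations of a noisy chirp, as in the second example.
rng = random.Random(0)
runs = [noisy_signal(0.5, beta=0.01, rng=rng) for _ in range(500)]
std = pointwise_std(runs)
```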


5.  CONCLUSIONS 

In  this  paper,  we  have  introduced  an  alternative  approach 
to  extending  Thomson’s  multitaper  spectrum  estimation 
method  to  the  time-varying  case.  The  necessary  condi¬ 
tion  on  the  kernel  function  to  obtain  a  frequency  marginal 
equal  to  Thomson’s  spectrum  estimator  is  derived  and  this 
leads  to  a  new  time-frequency  analysis  method,  multitaper 
reduced  interference  distribution  (MT-RID).  This  method 
is then compared to the MW-TFD method, which is a direct
extension of Thomson’s method to the nonstationary case
[6]. The statistical performance of the two methods is
compared for noisy test signals through simulations. The
results show that MT-RID gives a better resolution for
time-varying components whereas the MW-TFD is better
for monochromatic signals. The MW-TFD also offers a
smoother distribution due to extensive averaging inherent
to its mechanism. The results can be generalized for different
classes of signals by obtaining an expression for the
variance of the estimators.

Figure 3: The standard deviation of the frequency
marginals for the noisy chirp signal and the variance of
time-frequency distributions for 500 simulations: a) MT-RID
method, b) MW-TFD method
ABSTRACT

We  extend  the  ideas  of  the  instantaneous  frequency  and 
the  instantaneous  bandwidth  of  a  signal  by  defining 
the  instantaneous  skew  and  kurtosis,  as  well  as  higher 
instantaneous  moments,  of  a  signal.  Expressions  are 
derived  in  terms  of  the  signal  amplitude  and  phase, 
analogous  to  the  situation  for  instantaneous  frequency 
and  bandwidth.  As  is  the  case  in  time-frequency  anal¬ 
ysis  with  instantaneous  frequency  and  bandwidth,  the 
instantaneous  moments  we  derive  may  be  viewed  as 
conditional  moments  of  the  time-varying  spectral  den¬ 
sity  of  the  signal. 

1.  INTRODUCTION 

Instantaneous  frequency  is  a  fundamental  concept  of 
signals  arising  in  many  areas,  including  communica¬ 
tions,  seismology,  sonar  and  radar,  among  others  [1, 2, 
7,14,18].  As  Ville  showed,  it  is  intimately  connected 
to  the  time-varying  spectrum  of  a  signal;  in  particular, 
Ville  showed  that  the  first  conditional  spectral  moment 
of the Wigner distribution of a signal $s(t) = A(t)e^{j\varphi(t)}$
is equal to its instantaneous frequency, $\varphi'(t)$ [18]. This
property  was  later  found  to  hold  for  many  other  time- 
frequency  distributions  of  a  signal  [4,7]. 

The conditional spectral moments of a time-frequency
distribution $P(t,\omega)$ are obtained, as with any joint density,
via

$$\langle\omega^n\rangle_t = \frac{1}{P(t)}\int \omega^n P(t,\omega)\,d\omega = \int \omega^n P(t,\omega)\,d\omega \Big/ \int P(t,\omega)\,d\omega. \qquad (1)$$

Central conditional moments can also be obtained in
the usual way, by subtracting the mean frequency at
each time, according to

$$\mu_n(t) = \frac{1}{P(t)}\int (\omega - \langle\omega\rangle_t)^n P(t,\omega)\,d\omega. \qquad (2)$$

* This work was supported by the Office of Naval Research
(grant no. N00014-98-1-0680).


Cohen  has  extensively  considered  the  second  central 
conditional  spectral  moment,  which  is  the  conditional 
spectral  variance,  and  introduced  the  notion  of  the  in¬ 
stantaneous  bandwidth  of  a  signal  [6-9].  In  particular, 
Cohen  has  argued  that  the  second  conditional  spectral 
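moments are

$$\langle\omega^2\rangle_t = \left(\frac{A'(t)}{A(t)}\right)^2 + \varphi'^2(t) \qquad (3)$$

$$\mu_2(t) = \sigma^2_{\omega|t} = \langle\omega^2\rangle_t - \langle\omega\rangle_t^2 = \left(\frac{A'(t)}{A(t)}\right)^2 \qquad (4)$$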

where  the  square  root  of  the  latter  quantity  is  defined 
as  the  instantaneous  bandwidth  of  the  signal.  (Poletti 
presents  an  alternative  interesting  viewpoint  of  instan¬ 
taneous  bandwidth,  and  has  shown  how  to  derive  it 
in  terms  of  a  local  Taylor  series  expansion  of  the  sig¬ 
nal  [17].  In  particular,  a  first-order  expansion  yields 
Cohen’s  definition.)  Instantaneous  bandwidth,  like  in¬ 
stantaneous  frequency,  is  an  important  physical  quan¬ 
tity  that  characterizes  time- varying  spectral  properties 
of  the  signal,  and  has  found  application  in  a  variety  of 
areas,  including  acoustics,  Doppler  flow  measurements, 
and  seismology  [1,3,15]. 

In  this  paper,  we  extend  these  ideas  to  higher  in¬ 
stantaneous  moments,  such  as  the  instantaneous  skew 
and  the  instantaneous  kurtosis,  which  are  third  and 
fourth  order  moments,  respectively  [11,16].  Building 
on  the  foundations  laid  by  Ville  and  Cohen,  we  derive 
expressions  for  all  of  the  instantaneous  moments  of  a 
signal,  with  particular  attention  to  the  skew  and  kur¬ 
tosis. 


2.  BACKGROUND 

The  procedure  we  use  is  to  apply  operator  methods 
to  derive  the  instantaneous  moments,  analogous  to  the 
approach  used  by  Cohen  [6].  In  the  context  of  time- 
frequency  analysis,  Ville  was  the  first  to  use  the  oper¬ 
ator  method,  which  was  subsequently  significantly  ex¬ 
tended  by  Cohen;  in  particular,  Cohen  has  shown  how 
to obtain local expectation values (mean and variance)
using  operator  methods  [5,6]. 

The fundamental idea is that the physical quantity
of interest (in our case, frequency, $\omega$) is represented
by a Hermitian operator (for frequency the operator
is $\mathcal{W} = \frac{1}{j}\frac{d}{dt}$ in the time domain). A key aspect
of the operator method is that one can obtain
frequency averages directly from the signal, without
having to first obtain the Fourier transform of the signal,
$S(\omega) = \int s(t)e^{-j\omega t}\,dt$. For example, the average
value of $\omega$ and of $\omega^2$ (the first two global spectral moments)
are given by [6, 7] (see footnote 1)

$$\langle\omega\rangle = \int \omega\,|S(\omega)|^2\,d\omega = \int s^*(t)\,\mathcal{W}s(t)\,dt = \int \frac{\mathcal{W}s(t)}{s(t)}\,|s(t)|^2\,dt, \qquad (5)$$

$$\langle\omega^2\rangle = \int \omega^2\,|S(\omega)|^2\,d\omega = \int s^*(t)\,\mathcal{W}^2 s(t)\,dt = \int \left|\frac{\mathcal{W}s(t)}{s(t)}\right|^2 |s(t)|^2\,dt, \qquad (6)$$

where the rules of operator algebra were used in arranging
the latter term of eq. (6) to show the inherent
positivity of the second moment (equivalently, one may
substitute in the operator and use integration by parts
to obtain a positive integrand [14]).

Since  one  is  averaging  over  time  in  the  expressions 
above  to  obtain  the  global  average,  Cohen  and  others 
have  reasoned  that  the  integrand  represents  the  instan¬ 
taneous  value  of  the  quantity  [6,7,18].  In  particular, 
for the signal $s(t) = A(t)e^{j\varphi(t)}$, we have

$$\frac{\mathcal{W}s(t)}{s(t)} = \varphi'(t) - j\,\frac{A'(t)}{A(t)}. \qquad (7)$$

In the method of Cohen [6,7], the average value
of $\omega$ at a given time is obtained from the real part
(recall that the operator is Hermitian, and therefore
the imaginary term integrates, per eq. (5), to zero),

$$\langle\omega\rangle_t = \mathrm{Re}\left\{\frac{\mathcal{W}s(t)}{s(t)}\right\} = \varphi'(t), \qquad (8)$$

which is the instantaneous frequency. Substituting this
result into (5) gives the well-known result derived by
Ville [18], namely that the time-average of the instantaneous
frequency equals the global average frequency.
Hence the interpretation of the (real part of the) integrand
in (5) as the instantaneous (first) moment of the
signal (times the magnitude-square of the signal).

1 Throughout the paper, the signal is normalized to unit-energy.

Analogously, Cohen has derived the second moment
[6-9]; following from eqs. (6) and (7), we have that the
instantaneous second spectral moment is given by,

$$\langle\omega^2\rangle_t = \left|\frac{\mathcal{W}s(t)}{s(t)}\right|^2 = \left(\frac{A'(t)}{A(t)}\right)^2 + \varphi'^2(t). \qquad (9)$$

From this, Cohen has defined the instantaneous bandwidth
of a signal, which is given by the square-root of
the conditional variance,

$$\sigma^2_{\omega|t} = \langle\omega^2\rangle_t - \langle\omega\rangle_t^2 = \left(\frac{A'(t)}{A(t)}\right)^2. \qquad (10)$$

Note that Cohen’s method gives an instantaneous variance
(and bandwidth) that is positive, which is necessary
for a proper interpretation. Also we re-write Cohen’s
instantaneous bandwidth expression equivalently
as

$$\sigma^2_{\omega|t} = \mu_2(t) = \left|\frac{(\mathcal{W} - \langle\omega\rangle_t)\,s(t)}{s(t)}\right|^2 \qquad (11)$$

to highlight the fact that it is a central moment. This
form will be convenient in the next section where we
derive the instantaneous spectral kurtosis, which is a
fourth order central moment, of the signal.

3.  HIGHER  INSTANTANEOUS  SPECTRAL
MOMENTS

We begin by presenting an identity that will be fundamental
to our derivations, and which is a generalization
of the central moment expression above for instantaneous
bandwidth. Specifically, it can be shown
that [12]

$$(\mathcal{W} - \varphi'(t))^n\,s(t) = (-j)^n A^{(n)}(t)\,e^{j\varphi(t)}, \qquad (12)$$

where $A^{(n)}(t)$ denotes $\left(\frac{d}{dt}\right)^n A(t)$.

The proof follows by induction. First, we show that
the above relation holds for n = 1,

$$(\mathcal{W} - \varphi'(t))\,s(t) = \mathcal{W}s(t) - \varphi'(t)s(t) = -jA'(t)\,e^{j\varphi(t)}. \qquad (13)$$

Given that the identity holds for n = 1, we proceed
with the induction proof by assuming it holds for some
n > 1, and then show that it holds for n+1, as follows:

$$(\mathcal{W} - \varphi'(t))^{n+1} s(t) = (\mathcal{W} - \varphi'(t))\big[(-j)^n A^{(n)}(t)\,e^{j\varphi(t)}\big]$$

$$= (-j)^{n+1}\frac{d}{dt}\big(A^{(n)}(t)\,e^{j\varphi(t)}\big) - (-j)^n \varphi'(t)\,A^{(n)}(t)\,e^{j\varphi(t)}$$

$$= (-j)^{n+1} A^{(n+1)}(t)\,e^{j\varphi(t)}, \qquad (14)$$

which  completes  the  proof.  We  now  derive  expressions 
for  the  instantaneous  moments  of  a  signal. 
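
The identity of eq. (12) can be checked numerically for n = 1 and n = 2. The sketch below is only a sanity check under stated assumptions: the test signal (Gaussian amplitude, linear-chirp phase), its parameter values and the central-difference approximation of the operator are all choices made for illustration, and the function names are hypothetical.

```python
import cmath
import math

ALPHA, BETA, W0 = 2.0, 3.0, 5.0  # hypothetical test-signal parameters


def amplitude(t):
    """A(t) = exp(-ALPHA t^2 / 2)."""
    return math.exp(-ALPHA * t * t / 2.0)


def phase_rate(t):
    """phi'(t) for the phase phi(t) = W0*t + BETA*t^2/2."""
    return W0 + BETA * t


def signal(t):
    return amplitude(t) * cmath.exp(1j * (W0 * t + BETA * t * t / 2.0))


def shifted_operator(f, step=1e-4):
    """Return g(t) = (W - phi'(t)) f(t), with W = -j d/dt taken by central difference."""
    def g(t):
        derivative = (f(t + step) - f(t - step)) / (2.0 * step)
        return -1j * derivative - phase_rate(t) * f(t)
    return g


def identity_rhs(n, t):
    """(-j)^n A^(n)(t) e^{j phi(t)} for n = 1 or 2, from closed-form derivatives."""
    a = amplitude(t)
    derivs = {1: -ALPHA * t * a, 2: (ALPHA * ALPHA * t * t - ALPHA) * a}
    return (-1j) ** n * derivs[n] * signal(t) / a


# Apply the shifted operator once and twice and compare with eq. (12).
g1 = shifted_operator(signal)
g2 = shifted_operator(g1)
err1 = abs(g1(0.3) - identity_rhs(1, 0.3))
err2 = abs(g2(0.3) - identity_rhs(2, 0.3))
```

Both errors should be limited by the finite-difference step only, since the phase derivative cancels exactly at each application of the operator.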

3.1.  Non-Central  Moments

Like Cohen’s instantaneous bandwidth above, higher-order
even moments must be positive for a proper interpretation.
By the operator procedure, the fourth
order global moment may be written as

$$\langle\omega^4\rangle = \int s^*(t)\,\mathcal{W}^4 s(t)\,dt = \int (\mathcal{W}s(t))^*\,\mathcal{W}^3 s(t)\,dt = \int (\mathcal{W}^2 s(t))^*\,\mathcal{W}^2 s(t)\,dt, \qquad (15)$$

where we have used operator algebra to manipulate
the integrands. We note that the three integrals are
all equivalent. However, because the frequency operator
does not commute with time, the three integrands
are different, giving us (at least) three possible expressions
for the fourth-order instantaneous moment, the
averages of which all give the correct global moment,
as is required. Only the integrand of the latter expression
is positive, and thus we define the fourth-order
instantaneous spectral moment as the integrand of the
equation,

$$\langle\omega^4\rangle = \int (\mathcal{W}^2 s(t))^*\,\mathcal{W}^2 s(t)\,dt = \int \left|\frac{\mathcal{W}^2 s(t)}{s(t)}\right|^2 |s(t)|^2\,dt. \qquad (16)$$

Specifically, we have that the instantaneous fourth order
spectral moment is given by,

$$\langle\omega^4\rangle_t = \left|\frac{\mathcal{W}^2 s(t)}{s(t)}\right|^2. \qquad (17)$$

This approach generalizes to higher-order even moments
as [11,16],

$$\langle\omega^{2m}\rangle_t = \left|\frac{\mathcal{W}^m s(t)}{s(t)}\right|^2. \qquad (18)$$

Higher-order odd moments present an added challenge
because, as with the even-order moments, there
are many different operator expressions that are possible,
all of which integrate to give the correct global
moment, as is necessary; however, there is no constraint
like positivity that we can use to specify a unique instantaneous
moment. So for example, the possible third-order
moments are given by the integrands of the following
expression:

$$\langle\omega^3\rangle = \int s^*(t)\,\mathcal{W}^3 s(t)\,dt = \int (\mathcal{W}s(t))^*\,\mathcal{W}^2 s(t)\,dt. \qquad (19)$$

However, this ambiguity is resolved when we consider
the central instantaneous moments, which we do next.

3.2.  Central  Moments

The instantaneous central moments are obtained by
replacing the operator $\mathcal{W}$ by $\mathcal{W} - \langle\omega\rangle_t$ in the expressions
for the non-central moments above [11,16]. Doing so,
it follows directly from eq. (18) that the even-order
2m-th central instantaneous moment may be written
as,

$$\mu_{2m}(t) = \left|\frac{(\mathcal{W} - \langle\omega\rangle_t)^m\,s(t)}{s(t)}\right|^2. \qquad (20)$$

For  the  odd-order  moments,  we  make  use  of  the 
identity  given  in  eq.  (12),  and  the  fact  that  the  odd 
moments  are  obtained  from  the  real  part  of  the  expres¬ 
sion  (analogous  to  the  case  for  instantaneous  frequency, 
which  is  a  first  order  moment).  It  therefore  follows 
immediately  that  the  odd-order  central  instantaneous 
moments are zero, since for odd n, eq. (12) is purely
imaginary.  For  example,  the  third-order  instantaneous 
central  moment  is  given  by, 


which follows from the integrands of eq. (19) by substituting
$\mathcal{W} - \langle\omega\rangle_t$ for $\mathcal{W}$ and writing them in the form
of (21) times $|s(t)|^2$. Since we take the real part, all of
these expressions evaluate to zero.

The fact that odd order central moments are identically
zero by this method fixes the odd order non-central
moments. In particular, we may use the binomial
expansion, $(a+b)^n = \sum_{k=0}^{n}\binom{n}{k}a^k b^{n-k}$, to express
the central moments of the time-frequency distribution
(eq. (2)) in terms of the non-central moments. For
example, for n = 3, we have that the third-order non-central
instantaneous moment is given by,

$$\langle\omega^3\rangle_t = \mu_3(t) + 3\langle\omega^2\rangle_t\langle\omega\rangle_t - 2\langle\omega\rangle_t^3. \qquad (23)$$

Since $\mu_3(t) = 0$ and $\langle\omega^2\rangle_t$ and $\langle\omega\rangle_t$ are unique, we
obtain a unique third-order instantaneous non-central
moment,

$$\langle\omega^3\rangle_t = 3\left(\frac{A'(t)}{A(t)}\right)^2\varphi'(t) + \varphi'^3(t). \qquad (24)$$

Similarly,  the  higher-order  non-central  odd  moments 
can  be  uniquely  expressed  in  terms  of  the  central  mo¬ 
ment  and  lower-order  moments. 

From  these  results,  we  have  immediately  that  the 
“instantaneous  spectral  skew,”  which  we  define  anal¬ 
ogously  to  its  definition  in  ordinary  densities,  namely 
as  a  ratio  of  the  third  central  moment  to  the  second 
central  moment,  is  zero, 


This result can be viewed as a generalization of the
(global) spectral skew of the signal $A(t)e^{j\omega_0 t}$, which
is zero since the spectrum is symmetric about the frequency
$\omega_0$. Zero instantaneous spectral skew would occur
if the instantaneous spectrum was symmetric about
the instantaneous frequency $\varphi'(t)$.

The  instantaneous  kurtosis,  which  we  again  define 
analogously  to  its  definition  in  ordinary  probability, 
namely  as  a  ratio  of  the  fourth  central  moment  to  the 
second  central  moment,  is  given  by 


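$$Ku(t) = \frac{\mu_4(t)}{\big(\sigma^2_{\omega|t}\big)^2} = \left|\frac{(\mathcal{W} - \langle\omega\rangle_t)^2 s(t)}{s(t)}\right|^2 \Big/ \left|\frac{(\mathcal{W} - \langle\omega\rangle_t)\,s(t)}{s(t)}\right|^4$$

$$= \frac{\big|s(t)\,(\mathcal{W} - \langle\omega\rangle_t)^2 s(t)\big|^2}{\big|(\mathcal{W} - \langle\omega\rangle_t)\,s(t)\big|^4} = \frac{\big(A(t)\,A''(t)\big)^2}{A'^4(t)} \qquad (26)$$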


which  follows  from  eqs.  (11)  and  (20). 
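
As a small worked illustration of eqs. (10) and (26), the sketch below evaluates the instantaneous bandwidth and kurtosis from an amplitude and its first two derivatives. The Gaussian amplitude and the function names are hypothetical choices made for the example; for that amplitude the kurtosis reduces to $(1 - 1/(\alpha t^2))^2$, which follows from eq. (26) by direct differentiation.

```python
import math


def inst_bandwidth_sq(a, da):
    """Eq. (10): squared instantaneous bandwidth (A'/A)^2."""
    return (da / a) ** 2


def inst_kurtosis(a, da, d2a):
    """Eq. (26): (A A'')^2 / A'^4, undefined where A'(t) = 0."""
    if da == 0.0:
        raise ZeroDivisionError("kurtosis is undefined where A'(t) = 0")
    return (a * d2a) ** 2 / da ** 4


def gaussian_amplitude(t, alpha):
    """Hypothetical amplitude A(t) = exp(-alpha t^2 / 2) and its first two derivatives."""
    a = math.exp(-alpha * t * t / 2.0)
    return a, -alpha * t * a, (alpha * alpha * t * t - alpha) * a


# Usage at t = 1 with alpha = 2.
a, da, d2a = gaussian_amplitude(1.0, 2.0)
bandwidth_sq = inst_bandwidth_sq(a, da)
kurtosis = inst_kurtosis(a, da, d2a)
```

Note that the phase does not enter either quantity, in agreement with eq. (12): all central moments depend only on the amplitude.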

4.  EXAMPLES 

4.1.  Identical  Low-Order  Moments  for  Differ¬ 
ent  Signals 

Just  as  different  densities  can  have  identical  mean  and 
variance,  it  is  possible  that  two  different  signals  can 


have the same instantaneous frequency and bandwidth,
but will have different instantaneous kurtosis. For example,
consider the signals $s_1(t) = A_1(t)e^{j\varphi(t)}$ and $s_2(t) = A_2(t)e^{j\varphi(t)}$,
where $A_2(t) =$ [illegible in source] and $\varphi(t)$ is an arbitrary
phase function. The signals $s_1(t)$ and $s_2(t)$ obviously
have identical instantaneous frequency. They also have
identical instantaneous bandwidth since


(27) 


The  instantaneous  kurtosis  (eq.  (26))  is,  however,  dif¬ 
ferent  for  each  signal;  in  particular,  the  fourth  central 
moment  for  s2(t)  is,  by  eqs.  (20)  and  (12), 


Thus,  the  time- varying  spectral  differences  between  these 
two  signals  are  reflected  in  the  higher  instantaneous 
spectral  moments. 


4.2.  Positive  Time-Frequency  Density  with  Pre¬ 
scribed  Conditional  Moments 

It  is  possible  to  construct  TFDs  that  yield  these  mo¬ 
ments.  As  an  example,  consider  the  sinusoidal  FM  sig¬ 
nal  with  Gaussian  amplitude, 

$$s(t) = \sqrt[4]{16/\pi}\;e^{-8(t-0.5)^2 + j(30\pi t^2 + 28\pi t - 3\sin(6\pi t))} \qquad (29)$$

We  employ  a  moment  constrained  weighted  least-squares 
(WLS)  algorithm  [13]  to  construct  a  positive  time-frequency 
density  (TFD)  [10]  which  gives  eqs.  (8),  (9),  (24)  and 
(17)  as  its  first-  through  fourth-order  conditional  mo¬ 
ments.  The  resulting  TFD  is  shown  in  figure  1.  Figure 
2  shows  the  conditional  moments  of  the  TFD  (solid) 
plotted  against  the  proposed  moments  (dashed). 
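
For reference, the proposed moments of the signal in (29) can be evaluated in closed form from its amplitude and phase. The sketch below is an illustration, not the WLS construction of [13]. The expanded form used for the fourth moment, $(A''/A - \varphi'^2)^2 + (2\varphi' A'/A + \varphi'')^2$, is derived here from the definition in eq. (17) with $\mathcal{W} = -j\,d/dt$ and is not quoted from the text; the function name is hypothetical. The amplitude normalization constant cancels in all ratios and is therefore not needed.

```python
import math


def fm_signal_moments(t):
    """First- to fourth-order instantaneous moments of the signal in (29) at time t."""
    # Amplitude A(t) proportional to exp(-8 (t - 0.5)^2).
    r1 = -16.0 * (t - 0.5)                # A'(t)/A(t)
    r2 = 256.0 * (t - 0.5) ** 2 - 16.0    # A''(t)/A(t)
    # Phase phi(t) = 30 pi t^2 + 28 pi t - 3 sin(6 pi t).
    dphi = (60.0 * math.pi * t + 28.0 * math.pi
            - 18.0 * math.pi * math.cos(6.0 * math.pi * t))
    d2phi = 60.0 * math.pi + 108.0 * math.pi ** 2 * math.sin(6.0 * math.pi * t)
    m1 = dphi                                   # eq. (8)
    m2 = r1 ** 2 + dphi ** 2                    # eq. (9)
    m3 = 3.0 * r1 ** 2 * dphi + dphi ** 3       # eq. (24)
    m4 = (r2 - dphi ** 2) ** 2 + (2.0 * r1 * dphi + d2phi) ** 2  # eq. (17), expanded
    return m1, m2, m3, m4


# Usage: moments at the centre of the Gaussian envelope and at t = 0.3.
centre = fm_signal_moments(0.5)
early = fm_signal_moments(0.3)
```

At t = 0.5 the envelope derivative vanishes, so the second moment reduces to the squared instantaneous frequency and the instantaneous bandwidth is zero.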

5.  CONCLUSION 

We  have  given  expressions  for  the  instantaneous  spec¬ 
tral  moments  of  a  signal,  which  are  generalizations  of 
the  ideas  of  instantaneous  frequency  and  instantaneous 
bandwidth.  A  simple,  fundamental  relationship  be¬ 
tween  the  central  instantaneous  moments  and  the  am¬ 
plitude  of  a  signal  was  given,  from  which  one  can  then 
obtain  specific  moments,  such  as  the  instantaneous  skew 
and  the  instantaneous  kurtosis.  As  with  instantaneous 
frequency  and  instantaneous  bandwidth,  the  instanta¬ 
neous  skew  and  kurtosis  may  be  viewed  as  conditional 
spectral moments of the time-varying spectral density
of  the  signal.  Having  an  expression  for  the  higher  con¬ 
ditional  moments  allows  us  to  construct  the  density, 
and  to  differentiate  between  signals  with  the  same  in¬ 
stantaneous  frequency  and  bandwidth.  We  speculate 
that,  as  in  other  areas  where  higher  moments  have  been 
found  to  be  useful,  the  same  may  hold  true  for  the  in¬ 
stantaneous  kurtosis  and  higher  moments  introduced 
here  for  time- varying,  or  nonstationary,  signals. 


[Figure 1 plot. Panel title: Positive Time-Frequency Density. Horizontal axis: Time [sec].]

Figure 1: Positive TFD constrained to yield marginals
and operator-derived moments for sinusoidal FM signal
in (29). Side panel: frequency marginal. Bottom panel:
time marginal.

[Figure 2 plot. Panels labeled by conditional moment.]

Figure 2: First- through fourth-order conditional moments
of TFD in figure 1 (solid) constrained to yield
the operator-derived moments (dashed).
ABSTRACT

We  propose  an  adaptive  quadratic  time-frequency  represen¬ 
tation  (QTFR)  based  on  a  matching  pursuit  signal  decom¬ 
position  that  uses  a  dictionary  with  elements  matched  to 
the  instantaneous  frequency  of  the  analysis  signal  compo¬ 
nents.  We  form  the  QTFR  as  a  weighted  linear  superposi¬ 
tion  of  QTFRs  chosen  by  the  algorithm  to  provide  a  highly 
localized  representation  for  each  of  the  adaptively  selected 
dictionary  elements.  This  is  advantageous  as  the  resulting 
representations  are  parsimonious  and  reduce  the  effect  of 
cross  terms.  Also,  they  exhibit  maximum  time-frequency  lo¬ 
calization  for  the  difficult  analysis  case  of  signals  with  mul¬ 
tiple  components  that  have  different  time-frequency  char¬ 
acteristics.  Thus,  the  new  technique  can  be  used  to  analyze 
and  classify  multi-structure  signal  components  as  demon¬ 
strated  by  our  synthetic  and  real  data  simulation  examples. 

1.  INTRODUCTION 

Nonstationary  signals  have  been  successfully  analyzed  using 
quadratic  time-frequency  representations  (QTFRs)  as  they 
provide  important  information  on  the  signals’  time-varying 
characteristics  [1-3].  Many  QTFRs  are  ideally  matched  to 
one  or  two  time-frequency  (TF)  structures  based  on  the 
properties  they  satisfy.  For  example,  Cohen’s  class  QTFRs 
with  signal-independent  kernels  [1,3]  are  matched  to  signals 
with  linear  TF  characteristics  as  they  preserve  the  signal’s 
constant  TF  shifts.  Hyperbolic  or  power  QTFRs  [4, 5]  are 
matched  to  signals  with  non-linear  (dispersive)  structures 
as  they  preserve  dispersive  time  shifts.  For  successful  TF 
analysis,  it  is  important  to  match  a  QTFR  with  the  TF 
structure  of  a  signal.  However,  it  is  possible  that  a  signal 
(for example, biological or sonar data) has multiple components
with distinctively different TF structures. This complicates
TF analysis due to the presence of cross terms or
the effect of smoothing [3] that may impede interpretation.

Some  QTFRs  used  for  the  analysis  of  signals  with  mul¬ 
tiple  TF  structures  include  the  spectrogram  [3],  reassigned 
QTFRs  [6],  and  various  adaptive  QTFRs  [7].  Although  they 
work  well  in  many  applications,  these  QTFRs  are  not  de¬ 
signed  to  yield  the  exact  instantaneous  frequency  (IF)  of  a 
signal  for  classification.  Also,  they  may  not  provide  a  well- 
localized  representation  without  cross  terms  for  analyzing 
non-linear signal structures. Thus, it is very advantageous
to  design  an  adaptive  QTFR  to  exactly  match  various  sig¬ 
nal  components  since  many  natural  or  synthetic  signals  may 
have  different  linear  or  non-linear  TF  structures.  In  this  pa¬ 
per,  we  propose  a  new  QTFR  that  adapts  to  different  TF 
signal  structures  based  on  a  matching  pursuit  algorithm. 

2.  BACKGROUND  AND  OBJECTIVES 

The  matching  pursuit  iterative  algorithm  of  Mallat  and 
Zhang  decomposes  a  signal  into  a  linear  expansion  of  wave¬ 
forms  selected  from  a  redundant  and  complete  dictionary 
[8].  It  uses  successive  approximations  of  the  signal  with 
orthogonal  projections  on  dictionary  elements.  The  dictio¬ 
nary  consists  of  a  basic  Gaussian  atom  that  is  TF  shifted 
and  scaled.  A  QTFR  (called  the  modified  Wigner  distribu¬ 
tion  in  [8])  is  obtained  as  a  weighted  superposition  of  the 
Wigner distribution (WD) [1-3] of each selected element.
This  QTFR  is  free  of  cross  terms,  and  preserves  signal  en¬ 
ergy,  TF  shifts,  and  scale  changes  on  the  analysis  signal. 
It  is  also  similar  to  a  QTFR  obtained  in  [9].  When  a  sig¬ 
nal  has  multiple  components  with  different  TF  structures, 
the  QTFR  uses  many  Gaussian  elements  to  approximate 
the  IF  of  each  signal  component.  In  order  to  analyze  lin¬ 
ear  frequency-modulated  (FM)  chirps  more  efficiently  with 
fewer  waveforms,  rotated  Gaussian  atoms  were  included  in 
the  dictionary  in  [10].  On  the  other  hand,  a  wave-based 
dictionary  consisting  of  wavefronts,  resonances,  and  linear 
FM  chirps  was  used  to  process  scattering  data  in  [11]. 

We  propose  to  use  a  matching  pursuit  with  dictionary 
elements that are matched to the constant, linear, or non-linear
TF structure of a signal. These waveforms include
complex  sinusoids  with  linear  or  non-linear  phase  function 
such  as  logarithmic  and  power.  Our  aim  is  to  analyze  sig¬ 
nals  that  have  multiple  IF  structures  such  as  the  differ¬ 
ent  characteristic  signature  whistles  from  a  group  of  dol¬ 
phins  [12],  and  various  biomedical  signals  measured  simul¬ 
taneously.  The  advantage  of  using  a  dictionary  that  is 
matched  to  the  analysis  data  is  that  only  a  small  number 
of  elements  will  be  used  to  decompose  the  signal,  and  the 
algorithm  is  expected  to  give  fast  and  parsimonious  results. 
At  each  iteration  of  the  matching  pursuit,  we  will  adaptively 
choose  the  best  dictionary  element,  identify  its  TF  struc¬ 
ture,  and  compute  its  corresponding  QTFR.  The  resulting 
proposed QTFR is formed as a weighted linear superposition
of QTFRs that were each chosen to appropriately
match  a  selected  dictionary  element.  Such  an  algorithm 
not  only  designs  a  QTFR  that  is  adapted  to  multiple  signal 
structures,  but  can  also  be  used  in  classification,  detection, 
and  identification  applications  for  signals  with  specific  lin¬ 
ear  and/or  non-linear  (dispersive)  IF  components. 

3. ADAPTIVE ANALYSIS OF SIGNALS WITH
MULTIPLE TF STRUCTURES

3.1.  Adaptive  representation 

Although  the  matching  pursuit  algorithm  in  [8]  works  well 
for  many  signals,  we  will  show  that  it  uses  many  Gaussian 
atoms  to  represent  a  signal  component  with  non-linear  TF 
characteristics.  In  addition,  the  modified  WD  is  not  very 
localized  along  the  non-linear  IF  of  the  signal  components. 

In  this  paper,  we  modify  the  original  matching  pursuit 
in  order  to  decrease  the  number  of  algorithm  iterations, 
to  improve  the  TF  localization  in  the  analysis  of  different 
multiple  non-linear  FMs,  and  to  correctly  classify  the  IF  of 
each  signal  component.  We  follow  the  same  basic  concept  of 
the  matching  pursuit  algorithm  in  [8],  but  with  some  major 
differences.  The  first  major  difference  is  that  we  use  more 
than  one  type  of  basic  atom  in  our  dictionary.  Particularly, 
the  dictionary  consists  of  a  large  class  of  different  basic 
atoms  each  of  which  has  the  form  of  a  non-linear  FM  chirp 

$$ g(t; \xi, \lambda) = \sqrt{|\nu(t)|}\; e^{j 2\pi \lambda\, \xi(t/t_r)} \qquad (1) $$

which is uniquely specified by its FM rate $\lambda$ and its monotonic
phase function $\xi(b)$. Note that $\nu(t) = \frac{d}{dt}\xi(t/t_r)$ is the
IF of the signal in (1), and $t_r > 0$ is a reference time. The
dictionary may contain one type of FM chirp with fixed $\xi(b)$
in (1) or a linear combination of them including sinusoids
with $\xi(b) = b$, linear FM chirps with $\xi(b) = \mathrm{sgn}(b)|b|^2$,
hyperbolic FM chirps with $\xi(b) = \ln b$, power FM chirps
with $\xi(b) = \mathrm{sgn}(b)|b|^{\kappa}$, and exponential FM chirps with
$\xi(b) = e^{b}$. The dictionary is formed by transforming the
non-linear FM chirp¹ in (1) as

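$$ g(t; \xi, \lambda, \theta) = (\mathcal{S}_\tau\, \mathcal{C}_a\, \mathcal{G}_c\, g(\xi, \lambda))(t)
   = \sqrt{|a\, \nu(a(t-\tau))|}\; e^{j 2\pi (c+\lambda)\, \xi(a(t-\tau)/t_r)} , \qquad (2) $$

with the parameter vector $\theta = [c, a, \tau] \in \Theta = \mathbb{R}^3$. The unitary
operators $\mathcal{G}_c$, $\mathcal{C}_a$, and $\mathcal{S}_\tau$ result in a constant FM rate
shift $c$, scale change $a$, and constant time shift $\tau$, respectively,
of the FM chirp. Specifically, the operators transform
a signal $x(t)$ as $(\mathcal{G}_c x)(t) = x(t)\, e^{j 2\pi c\, \xi(t/t_r)}$,
$(\mathcal{C}_a x)(t) = \sqrt{|a|}\, x(at)$, and $(\mathcal{S}_\tau x)(t) = x(t - \tau)$. Note that in (2),
we use a transformation that results in a constant shift
(from $\lambda$ to $c + \lambda$) of the FM rate of the non-linear FM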
chirp  instead  of  a  constant  frequency  shift  as  in  [8].  This 
is  because  we  are  considering  signals  that  may  be  wide¬ 
band  as  well  as  dispersive,  thus  a  shift  of  the  IF  is  a  better 
matched  transformation  to  cover  the  entire  TF  plane  [4]. 
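
As an illustration of how a dictionary element of the form (2) could be sampled numerically, the following is a minimal sketch rather than the authors' implementation. The function name `fm_chirp_atom`, its argument names, and the choice of time grid are assumptions made here; the phase function and its derivative are passed in as callables, and the unit-energy constraint is imposed by a discrete normalisation.

```python
import numpy as np

def fm_chirp_atom(t, xi, nu, lam=1.0, c=0.0, a=1.0, tau=0.0, t_r=1.0):
    """Sample the transformed atom of Eq. (2) on the time grid t.

    xi : phase function xi(b) (hypothetical callable argument)
    nu : nu(t) = d/dt xi(t / t_r), supplied analytically by the caller
    Returns the atom scaled to unit discrete energy.
    """
    s = a * (t - tau)                                   # scaled, shifted time
    envelope = np.sqrt(np.abs(a * nu(s)))               # sqrt(|a nu(a(t - tau))|)
    phase = 2 * np.pi * (c + lam) * xi(s / t_r)         # FM rate shifted to c + lam
    g = envelope * np.exp(1j * phase)
    return g / np.linalg.norm(g)                        # numerical normalisation

# Hyperbolic chirp: xi(b) = ln b, so nu(t) = 1/t for t > 0 (with t_r = 1).
t = np.linspace(1.0, 2.0, 512)
atom = fm_chirp_atom(t, xi=np.log, nu=lambda s: 1.0 / s, lam=1.0, c=20.0)
```

In this example the instantaneous frequency of the sampled atom follows $(c + \lambda)/t$, which is the hyperbolic law expected from $\xi(b) = \ln b$.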

¹Without loss of generality, the atom in (1) may use $\lambda = 1$.


With appropriate normalization, we restrict the energy of
$g(t; \xi, \lambda, \theta)$ to be unity for every $\theta$ in order to ensure energy
preservation when $\xi(b)$ is fixed [8, 13]. The iterative
procedure of the matching pursuit first projects the
analysis signal $x(t) = (R_0 x)(t)$ onto each element of the
dictionary, and selects² $g(t; \xi_0, \lambda, \theta_0)$ based on the condition³
$|\langle x, g(\xi_0, \lambda, \theta_0)\rangle| \ge |\langle x, g(\xi, \lambda, \theta)\rangle|$, $\forall \theta \in \Theta$ and
for all possible phase functions $\xi(b)$ of the elements used
to form the dictionary. This ensures that the element with
the highest energy will be chosen first. This results in the
signal decomposition

$$ x(t) = \beta_0\, g(t; \xi_0, \lambda, \theta_0) + (R_1 x)(t) \qquad (3) $$

with the expansion coefficient $\beta_0 = \langle x, g(\xi_0, \lambda, \theta_0)\rangle$. The
function $\xi_0(b)$ corresponds to the phase function of this first,
highest energy signal component of the analysis signal. For
example, if the first dictionary element chosen is a hyperbolic
FM chirp, then $\xi_0(b) = \ln b$.

The  second  major  difference  of  our  algorithm  from  the 
one  in  [8-10]  is  that  we  do  not  compute  the  WD  of  each  se¬ 
lected  element  to  form  the  modified  WD.  Instead,  we  adap¬ 
tively  use  the  information  that  the  first  selected  waveform 
has phase function $\xi_0(b)$ in order to compute its generalized
warped Wigner distribution (GWD) [5, 14]. The GWD is
a warped version of the WD, with the warping [5, 14, 15]
based on a monotonic and (possibly) non-linear parameter
function $\xi(b)$. In particular,

$$ \mathrm{GWD}_x(t, f; \xi) = \mathrm{WD}_y\!\left( t_r\, \xi(t/t_r),\; \frac{f}{t_r\, \mu(t)} \right) \qquad (4) $$

where $\mu(t) = \frac{d}{dt}\xi(t/t_r)$ and the warped signal is [5]

$$ y(t) = (\mathcal{W}_\xi x)(t) = \left| t_r\, \mu\!\left(t_r\, \xi^{-1}(t/t_r)\right) \right|^{-1/2} x\!\left(t_r\, \xi^{-1}(t/t_r)\right) . $$

Note that a specific GWD is obtained simply by fixing
its parameter function $\xi(b)$. By matching $\xi(b)$ in (4) to
be equal to the phase function $\xi_0(b)$ in (3) (i.e. if $\xi(b) = \xi_0(b)$),
our new adaptive representation for multiple structures
(ARMUS) QTFR, at this first iteration, is simply

$$ T_x^{0}(t, f) = |\beta_0|^2\, \mathrm{GWD}_{g(\xi_0, \theta_0)}(t, f; \xi_0) . $$

At the second iteration, the residual function $(R_1 x)(t)$ is
obtained by solving (3), and it is decomposed in a similar
manner to the signal $x(t)$. At the $(n+1)$th iteration, the
criterion

$$ |\langle R_n x, g(\xi_n, \theta_n)\rangle| \ge |\langle R_n x, g(\xi, \theta)\rangle|, \quad \forall \theta \in \Theta \qquad (5) $$

is used to decompose the $n$th residual function $(R_n x)(t)$ as
$(R_n x)(t) = \beta_n\, g(t; \xi_n, \theta_n) + (R_{n+1} x)(t)$ where

$$ \beta_n = \langle R_n x, g(\xi_n, \theta_n)\rangle \qquad (6) $$

is the expansion coefficient. The GWD of $(R_n x)(t)$ is also
obtained adaptively to match the TF structure of the $n$th
residual function by letting $\xi(b) = \xi_n(b)$ in (4).
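
A short worked step, not spelled out in the text but standard for matching pursuits with unit-energy atoms [8], may help explain why the squared coefficients can act as energy weights. Writing $g_n$ as shorthand (introduced here) for the atom selected at the $(n+1)$th iteration, the new residual is orthogonal to that atom, so the residual energy splits as follows:

```latex
\begin{align*}
\langle R_{n+1} x,\, g_n \rangle
  &= \langle R_n x,\, g_n \rangle - \beta_n \langle g_n,\, g_n \rangle
   = \beta_n - \beta_n = 0, \\
\| R_n x \|^2
  &= |\beta_n|^2 + \| R_{n+1} x \|^2, \\
\| x \|^2
  &= \sum_{n=0}^{N-1} |\beta_n|^2 + \| R_N x \|^2 .
\end{align*}
```

The last line follows by applying the second line for $n = 0, \ldots, N-1$, and it assumes only that every selected atom has unit energy.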

²Note that a subscript $n$ in the parameters $R_n$, $\xi_n(b)$, $\theta_n$,
$\tau_n$, and $c_n$, and a superscript $n$ in a QTFR $T^n(t, f)$ indicate the
algorithm parameters at the $(n+1)$th iteration.

³The inner product is defined as $\langle x, g\rangle = \int x(t)\, g^*(t)\, dt$.
After a total of $N$ iterations, the matching pursuit algorithm
results in the signal decomposition

$$ x(t) = \sum_{n=0}^{N-1} \beta_n\, g(t; \xi_n, \theta_n) + (R_N x)(t) . \qquad (7) $$

Note that since our dictionary is complete,⁴ any signal $x(t)$
can be represented as in (7) with $N = \infty$, in which case
$(R_N x)(t) = 0$ [8]. In actuality, when the signal components
match the TF structure of the dictionary elements,
the  algorithm  converges  quickly.  As  stopping  criteria,  we 
use  a  maximum  number  of  iterations,  and  an  acceptable 
small  residue  energy  compared  to  the  data  energy  [8]. 
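
The iteration in (5)-(7), together with the two stopping criteria just mentioned, can be sketched for a finite, pre-sampled dictionary as below. This is a generic matching pursuit under stated assumptions, not the authors' code: the dictionary is assumed to be a matrix whose rows are unit-energy atoms, and the names `matching_pursuit`, `max_iter`, and `rel_residual` are hypothetical.

```python
import numpy as np

def matching_pursuit(x, dictionary, max_iter=10, rel_residual=1e-3):
    """Greedy decomposition of x over unit-energy atoms (rows of `dictionary`).

    Returns a list of (atom_index, beta) pairs and the final residual.
    """
    residual = np.asarray(x, dtype=complex).copy()
    energy = np.vdot(residual, residual).real           # data energy
    picks = []
    for _ in range(max_iter):                            # stopping rule 1: iteration cap
        # <R_n x, g> = sum_k (R_n x)[k] * conj(g[k]) for every atom g
        products = dictionary.conj() @ residual
        best = int(np.argmax(np.abs(products)))          # selection criterion (5)
        beta = products[best]                            # expansion coefficient (6)
        residual = residual - beta * dictionary[best]    # next residual R_{n+1} x
        picks.append((best, beta))
        # stopping rule 2: residual energy small relative to the data energy
        if np.vdot(residual, residual).real <= rel_residual * energy:
            break
    return picks, residual
```

The exhaustive search over rows stands in for the search over $\theta$ and over the phase functions; a practical dictionary of the size reported later in the paper would need the speedups discussed in Section 3.3.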

The  resulting  ARMUS  of  the  signal  at  the  Nth  itera¬ 
tion  is  the  weighted  sum  of  the  appropriate  GWDs  of  the 
dictionary  elements  selected  at  each  iteration.  Specifically, 

$$ \mathrm{ARMUS}_x(t, f) = T_x^{N-1}(t, f) = \sum_{n=0}^{N-1} |\beta_n|^2\, \mathrm{GWD}_{g(\xi_n, \theta_n)}(t, f; \xi_n), \qquad (8) $$

with the weights $|\beta_n|^2$ defined in (6). Note that the same
GWD (with fixed parameter function $\xi_n(b)$) in (8) may be
used if multiple chirps have the same phase function but
different FM rates.

3.2.  Decomposition  and  QTFR  properties 

An  important  property  of  the  matching  pursuit  in  (7)  is  its 
covariance to certain signal changes. Consider the decomposed
signal $x(t) = \sum_{n=0}^{\infty} \beta_n\, g(t; \xi, \lambda, \theta_n)$ in (7) with $N = \infty$,
and with similar TF structure dictionary elements (i.e. let
$\xi_n(b) = \xi(b)$, $\forall n$). If the FM rate of a non-linear chirp $x(t)$
is shifted by a constant amount to form $y(t) = (\mathcal{G}_u x)(t) =
x(t)\, e^{j 2\pi u\, \xi(t/t_r)}$, then its matching pursuit is simply given
as $y(t) = \sum_{n=0}^{\infty} \beta_n\, g(t; \xi, \lambda, \tilde{\theta}_n)$. Note that the expansion
coefficients $\beta_n$ are not affected by this signal change. The
parameter vector changes to $\tilde{\theta}_n = [c_n + u, a_n, \tau_n]$, indicating
that the time shifts $\tau_n$ and the scale changes $a_n$ remain
the same, whereas the dictionary atoms undergo a constant
shift in their FM rate from $(\lambda + c_n)$ to $(\lambda + c_n + u)$. Note
that if $\xi(b)$ is a power or a logarithmic function, then we
can show that the corresponding matching pursuit is also
covariant to scale changes [13].

The  ARMUS  QTFR  in  (8)  also  satisfies  various  prop¬ 
erties  that  are  desirable  in  many  applications.  By  simply 
combining  the  GWDs  of  each  selected  dictionary  element, 
no  cross  terms  are  introduced  in  the  QTFR.  Also,  it  pre¬ 
serves  the  underlying  TF  structure  of  each  analysis  signal 
component,  and  it  provides  a  highly  localized  representa¬ 
tion  of  each  component  as  it  does  not  apply  any  smoothing. 
Specifically, the GWD with parameter $\xi(b)$ of a non-linear
FM chirp with phase function $\xi(b)$ results in the highly localized representation
$\mathrm{GWD}_{g(\xi, \lambda)}(t, f; \xi) = |\nu(t)|\, \delta(f - \lambda \nu(t))$ [5]. If a
particular  application  uses  signal  components  with  only  one 
type  of  TF  structure,  then  we  should  form  our  dictionary 
using  the  corresponding  non-linear  FM  chirp  with  matched 
IF.  In  such  cases,  the  ARMUS  satisfies  other  desirable  sig¬ 
nal  properties  such  as  the  preservation  of  signal  energy,  and 

⁴The proof can be found in [13].


changes  in  the  analysis  signal’s  FM  rate  [13].  If  the  dictio¬ 
nary  elements  are  either  hyperbolic  or  power  FM  chirps, 
then  the  QTFR  also  preserves  scale  changes.  In  [9],  it  was 
shown  that  if  some  cross  terms  are  allowed  in  a  version  of 
the  modified  WD,  then  additional  signal  properties,  such 
as  the  marginals,  can  be  satisfied.  This  depends  on  a  dis¬ 
tance  measure  criterion  that  controls  the  amount  of  cross 
terms  included  in  the  QTFR  formulation.  We  are  currently 
investigating  the  corresponding  distance  measure  for  each 
different dispersive QTFR function $\xi(b)$.

3.3.  Implementation  issues 

As  we  vary  many  parameters  in  our  algorithm  in  order  to 
select  the  appropriate  dictionary  elements  for  the  matching 
pursuit,  the  computation  is  intensive.  However,  if  we  pre- 
process  our  data,  we  can  form  a  dictionary  with  elements 
which  approximately  span  the  data  in  TF  structure.  Thus, 
the  algorithm  iterates  more  rapidly.  Additional  speedup  is 
possible  if  we  compute  the  matched  GWD  of  each  dictionary 
element  ahead  of  time.  Since  the  last  operation  on  the  basic 
atoms  in  (2)  is  time  shifting,  we  perform  the  inner  products 
in  the  matching  pursuit  criterion  in  (5)  as  a  cross-correlation 
instead  of  introducing  another  layer  of  dictionary  elements 
over  all  possible  time  shifts.  Thus,  the  inner  products  in 
(5)  are  computed  as  correlations  between  the  residual  func¬ 
tions  and  the  dictionary  elements  that  have  been  general¬ 
ized  frequency-shifted  over  all  FM  rates  c  and  scaled  over 
all  a.  This  increases  the  computational  speed  since  correla¬ 
tions  can  be  implemented  using  the  fast  Fourier  transform 
(FFT).  Also,  the  memory  consumption  by  the  dictionary 
is  significantly  reduced  since  additional  dictionary  elements 
are  not  needed  for  every  time  shift.  Moreover,  since  the 
dictionary  elements  do  not  change,  and  the  residual  data 
are  constant  during  a  given  matching  pursuit  iteration,  ad¬ 
ditional  speedup  could  be  achieved  by  pre-computing  and 
storing  the  FFTs  of  these  sequences. 
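
The correlation-based evaluation of the inner products over all time shifts can be sketched as follows. This is an illustrative sketch only: the function name `inner_products_all_shifts` is hypothetical, the atom is assumed to be zero outside its sampled support, and zero-padding is used so that the FFT-based (circular) correlation does not wrap around.

```python
import numpy as np

def inner_products_all_shifts(residual, atom):
    """Return c[m] = sum_k residual[k] * conj(atom[k - m]) for m = 0..len(residual)-1.

    One FFT-based cross-correlation replaces a separate dictionary
    element (and inner product) for every time shift m.
    """
    n = len(residual) + len(atom) - 1        # padded length, avoids wrap-around
    R = np.fft.fft(residual, n)
    G = np.fft.fft(atom, n)
    corr = np.fft.ifft(R * np.conj(G))       # correlation theorem
    return corr[:len(residual)]
```

For a fixed FM rate shift $c$ and scale $a$, the index of the largest magnitude in the returned array gives the best time shift, and the value at that index is the corresponding expansion coefficient. The transform of the residual could be computed once per iteration and reused across atoms, as the text suggests.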

If  the  signal  components  are  well-separated  in  time,  we 
can  use  the  algorithm  to  simply  find  the  time  support  and 
phase  function  of  each  selected  element,  and  then  use  the  in¬ 
formation  to  analyze  the  actual  data  (instead  of  the  selected 
waveforms)  with  its  matched  GWD.  This  will  greatly  reduce 
computation  as  only  a  few  GWDs  need  to  be  obtained.  If 
classification  is  needed  without  analysis,  the  algorithm  can 
provide  the  IF  of  each  signal  component  without  computing 
its  QTFR,  simply  by  extracting  that  information  from  the 
matched  dictionary  elements. 

3.4.  Simulation  examples 

Synthetic  data:  We  demonstrate  the  performance  of  our 
new  QTFR  by  first  analyzing  a  synthetic  signal  with  seven 
components:  two  windowed  hyperbolic  FM  chirps  and  five 
windowed  linear  FM  chirps  with  different  chirp  rates,  scal¬ 
ings,  and  time  shifts.  Their  “ideal”  TF  structure  obtained 
by  adding  the  IF  of  each  component  is  shown  in  Fig.  1(a). 
The  WD  in  Fig.  1(b)  suffers  from  cross  terms  and  makes 
it  difficult  to  identify  the  true  TF  structure  of  each  compo¬ 
nent.  On  the  other  hand,  the  spectrogram  [3]  in  Fig.  1(c) 
suffers  from  loss  of  resolution  due  to  smoothing  that  pro¬ 
hibits  signal  classification  and  the  identification  of  the  exact 
number of signal terms.
Figure 1: (a) A linear combination of the ideal IF of each component of a signal consisting of two windowed hyperbolic FM
chirps  and  five  windowed  linear  FM  chirps.  The  signal  is  analyzed  using  (b)  the  Wigner  distribution,  (c)  the  spectrogram, 
(d)  the  new  ARMUS  QTFR,  and  (e)  the  modified  WD  in  [8]. 


We  apply  our  method  by  decomposing  the  signal  using 
a  dictionary  of  linear  and  hyperbolic  FM  chirps  with  about 
89,000  combinations  of  FM  rate  changes,  scale  changes,  and 
time  shifts.  The  decomposition  approximates  the  data  very 
well  after  only  seven  iterations  (the  same  number  as  the  sig¬ 
nal  terms)  as  demonstrated  by  overlaying  the  signal  with 
its  expansion.  The  ARMUS  QTFR  in  Fig.  1(d)  provides 
a  highly  localized  representation  for  all  seven  components 
without  outer  cross  terms  or  loss  of  resolution.  This  is  be¬ 
cause  it  adaptively  computes  the  Altes  Q-distribution  [4] 
for  selected  elements  with  hyperbolic  TF  characteristics, 
and  the  WD  for  selected  elements  with  linear  TF  charac¬ 
teristics.  Note  that  the  mild  spreading  of  the  signal  com¬ 
ponents  is  due  to  the  fact  that  the  data  was  windowed  for 
processing.  For  further  comparison,  we  applied  the  match¬ 
ing  pursuit  from  [8]  with  Gaussian  dictionary  elements  to 
decompose  the  signal,  and  then  we  analyzed  it  using  the 
modified  WD  as  shown  in  Fig.  1(e).  Note  that  although 
the  QTFR  does  not  yield  any  cross  terms,  it  does  not  pro¬ 
vide  a  localized  representation  that  can  easily  identify  the 
TF  structure  of  the  components.  Also,  the  algorithm  does 
not  converge  with  as  many  as  fifty  iterations,  and  it  does 
not  provide  a  closed  form  estimate  of  the  IF  of  the  signal 
components  for  classification. 

Real data: We use our matching pursuit algorithm to obtain
a closed form estimate of the IF of real data for classification.
The analysis data consists of whistles⁵ from a
long-finned  pilot  whale.  In  Fig.  2(a),  the  spectrogram  of 
the  data  shows  three  whistles  with  dispersive  TF  charac¬ 
teristics  as  high  frequencies  are  time-delayed  by  a  shorter 
amount  than  low  ones.  Although  the  spectrogram  provides 
visual  information,  it  cannot  find  the  exact  IF  of  the  sig¬ 
nal  components.  Our  matching  pursuit  decomposition  of 
the  data  is  highly  localized  along  hyperbolic  TF  curves  as 
we  formed  our  algorithm  using  hyperbolic  FM  chirps.  This 
is  shown  by  plotting  the  sum  of  the  IFs  of  the  selected 
waveforms  in  Fig.  2(b).  Fig.  2(c)  shows  an  overlay  of  the 
plots  in  Figs.  2(a)  and  2(b)  for  a  fair  comparison.  Note 
that  based  on  the  spectrogram  analysis,  we  set  the  itera¬ 
tion  limit  to  three.  However,  the  algorithm  did  not  extract 
the  third  component  since  (i)  it  is  low  in  amplitude,  and  (ii) 
the  higher  frequency  component  is  not  exactly  hyperbolic, 
so  the  algorithm  keeps  trying  to  remove  that  component 
first.  On  the  other  hand,  the  matching  pursuit  provided  us 
with  a  closed  form  estimate  of  the  true  IF  of  the  two  louder 
whistles.  For  better  classification,  we  plan  to  increase  the 
number  of  iterations  as  well  as  include  in  our  dictionary 
both  hyperbolic  and  power  FM  chirps.  We  expect  to  obtain 
better  matched  results  since  the  IF  of  the  higher  frequency 
whistle  appears  to  be  a  power  function. 

⁵The data was obtained from the database of W. Watkins [16]
at the Woods Hole Oceanographic Institution.
Figure 2: Analysis of real data whistles from a long-finned pilot whale: (a) spectrogram, (b) sum of IFs of selected waveforms
from  a  matching  pursuit  with  a  hyperbolic  FM  chirp  dictionary  after  three  iterations,  (c)  an  overlay  of  the  first  two  plots. 


4.  CONCLUSION 

We have developed an adaptive QTFR based on the matching
pursuit algorithm in order to analyze time-varying signals
that have multiple components with different (possibly
non-linear) frequency modulation. We build our dictionary
based on pre-processing the analysis data to obtain
some general information on the TF structure of the signal
components. At each iteration, we adaptively pair
the selected dictionary element with the matched QTFR
that provides its most localized representation. The resulting
QTFR of the analysis signal is a linear superposition
of the individual QTFRs of the elements, weighted by
the magnitude squared of the expansion coefficients. We
have demonstrated with simulated examples that this new
QTFR handles well the difficult problem of analyzing signals
with multiple IF structures in TF signal processing
without introducing cross terms, altering the underlying
structure of each signal component, or suffering from a loss
of resolution due to smoothing.
ABSTRACT

This  paper  presents  two  novel  results  which  are  significant  for  the 
application  of  time-frequency  signal  analysis  techniques  to  real  life 
signals.  First,  we  introduce  a  measure  for  comparing  the  resolu¬ 
tion  performance  of  TFDs  in  separating  closely  spaced  compo¬ 
nents  in  the  time-frequency  domain.  The  measure  takes  into  ac¬ 
count  key  attributes  of  TFDs  such  as  main-lobes,  side-lobes,  and 
cross-terms.  The  introduction  of  this  measure  is  an  improvement 
of  current  techniques  which  rely  on  visual  inspection  of  plots. 

The  second  result  consists  in  proposing  a  methodology  for 
designing  high  resolution  quadratic  TFDs  for  the  time-frequency 
analysis  of  multicomponent  signals  when  components  are  close  to 
each  other.  A  recently  introduced  TFD,  the  B-distribution,  and  its 
modified  version  are  defined  using  this  methodology. 

Finally,  the  performance  comparison  of  quadratic  TFDs  us¬ 
ing  the  proposed  resolution  measure  shows  that  the  B-distribution 
outperforms  existing  quadratic  TFDs  in  resolving  closely  spaced 
components  in  the  time-frequency  domain. 

1.  INTRODUCTION 

This  paper  describes  what  we  believe  is  the  first  attempt  at  pro¬ 
viding  an  objective  quantitative  measure  criterion  for  comparing 
the  performance  of  quadratic  time-frequency  distributions  (TFDs), 
in  terms  of  resolution  (separation  of  closely  spaced  components), 
when  applied  to  the  analysis  of  multicomponent  signals. 

Let  us  consider  a  multicomponent  signal  given  by: 

$$ s(t) = s_1(t) + s_2(t) \qquad (1) $$

where $s_1(t)$ and $s_2(t)$ are two parallel linear frequency modulated
(LFM) signals of length $N = 128$ and sampling frequency
$f_s = 1$ Hz. The frequency of the first component $s_1(t)$ goes from
0.15 Hz to 0.25 Hz, while the frequency of the second component
$s_2(t)$ varies from 0.2 Hz to 0.3 Hz.
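
For readers who wish to reproduce a signal of this kind, the following sketch generates two parallel LFM components with the stated length, sampling frequency, and frequency ranges. The paper does not state the amplitudes, the initial phases, or whether the components are real or analytic; the unit-amplitude complex exponentials and the helper name `lfm` used here are assumptions.

```python
import numpy as np

N, fs = 128, 1.0                  # length and sampling frequency from the text
t = np.arange(N) / fs             # sample times in seconds

def lfm(f_start, f_end, t):
    """Unit-amplitude complex LFM whose IF sweeps linearly from f_start to f_end."""
    duration = t[-1] - t[0]
    rate = (f_end - f_start) / duration                  # chirp rate in Hz/s
    phase = 2 * np.pi * (f_start * t + 0.5 * rate * t**2)
    return np.exp(1j * phase)

s1 = lfm(0.15, 0.25, t)           # first component
s2 = lfm(0.20, 0.30, t)           # second component, 0.05 Hz above the first
s = s1 + s2                       # two-component test signal of Eq. (1)
```

Because both components share the same chirp rate, their instantaneous frequencies stay 0.05 Hz apart at every time instant, which is what makes the pair a demanding test of resolution.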

The  multicomponent  signal  s(t)  is  represented  in  the  time- 
frequency  domain  using  the  Wigner-Ville  distribution  (WVD),  the 
spectrogram, the Choi-Williams distribution (CWD) [1], the Born-Jordan
distribution [2], the Zhao-Atlas-Marks (ZAM) distribution [3],
and  the  recently  introduced  B-distribution  [4,  5]  (see  Figure  1). 

The  desire  to  objectively  compare  the  plots  in  Figure  1  mo¬ 
tivated  the  need  to  define  a  quantitative  performance  measure  for 
TFDs.  The  characteristics  of  TFDs  that  influence  their  resolution, 
such  as  energy  concentration,  mainlobes  separation,  sidelobes  and 
cross-terms  minimisation,  are  combined  to  define  a  quantitative 
measure  criterion. 


[Figure 1 panel titles recovered from the plot: (c) CWD ($\sigma = 2$), (d) Born-Jordan]


Figure 1: TFDs of two LFMs with frequencies $f_1$ = 0.15 to 0.25 Hz
and $f_2$ = 0.2 to 0.3 Hz. All plots use a rectangular window, apart
from the spectrogram, which uses the Hanning window.


This  paper  presents  a  comparison  of  the  resolution  performance 
of  the  above  mentioned  TFDs,  using  the  newly  proposed  measure 
criterion.  In  this  context,  we  show  that  the  B-distribution  out¬ 
performs  the  other  quadratic  TFDs  for  signals  with  components 
closely-spaced  in  the  time-frequency  plane. 
2. PERFORMANCE CRITERIA OF TIME-FREQUENCY
DISTRIBUTIONS 

2.1.  Monocomponent  Signal 

The  performance  of  a  TFD  in  the  case  of  monocomponent  FM  sig¬ 
nals  is  commonly  defined  in  terms  of  the  energy  concentration  the 
TFD  achieves  about  the  signal  instantaneous  frequency  (IF)  [6]. 
For the slice of TFD taken at the time instant $t_0$, illustrated in Figure 2,
we may express the performance measure as:


$$ p = \frac{|A_S|}{|A_M|}\, \frac{V}{f} \qquad (2) $$

where $A_M$ is the amplitude of the mainlobe of the TFD, $A_S$ is
the amplitude of the sidelobes, $V$ is the 1.5 dB bandwidth¹ of the
mainlobe and $f$ represents the IF of the signal, all taken at time
$t_0$. The rationale for introducing (2) is that one wants to minimise
sidelobe amplitude $A_S$ and mainlobe bandwidth $V$ relative to central
frequency $f$, but maximise mainlobe amplitude $A_M$.

¹We measure the bandwidth of the mainlobe of a component at the rms
value of the component normalised amplitude. See also footnote 5.


Figure 2: Slice of a TFD of a monocomponent signal taken at the
time instant $t = t_0$


2.2. Multicomponent Signal

The performance of time-frequency distributions of a multicomponent
FM signal can be quantitatively measured in terms of:

•  the energy concentration of the distribution about the respective
instantaneous frequency of each component, as expressed
by equation (2), and

•  the resolution as measured by the separation of the mainlobes
of the components in the time-frequency plane, and
the effect of cross-terms.

2.2.1. Energy Concentration

By extending the concept in Section 2.1, a TFD is said to have the
best energy concentration for a given multicomponent FM signal
if for each of the signal components:

•  its 1.5 dB mainlobe bandwidth relative to $f$ is the smallest
compared to that of other distributions, and if

•  it yields the smallest sidelobe magnitude to mainlobe magnitude
ratio compared to those of other distributions.

2.2.2. Resolution

The frequency resolution in a power spectral estimate of a signal
composed of two single tones, $f_1$ and $f_2$, is defined as the minimum
difference $f_2 - f_1$ for which the following inequality holds:

$$ f_1 + \frac{V_1}{2} < f_2 - \frac{V_2}{2} \qquad (3) $$

where $V_1$ and $V_2$ are the 1.5 dB mainlobe bandwidths of the first
and the second sinusoid, respectively, as illustrated in Figure 3.

For a time-frequency distribution $\rho_z(t, f)$ of a two-component signal,
the above definition of resolution would be valid for every
slice of cross-terms free TFDs, such as the spectrogram, taken at
time $t = t_0$. However, for TFDs with cross-terms, we need to account
for the effect of cross-terms on resolution, as illustrated by
Figure 4 and explained in the next section.


Figure 4: Slice of a TFD of a two-component signal taken at time
$t = t_0$
In Figure 4, $V_1(t_0)$, $f_1(t_0)$, $A_{S_1}(t_0)$ and $A_{M_1}(t_0)$ represent
respectively the 1.5 dB mainlobe bandwidth, the instantaneous frequency,
the sidelobe amplitude and the mainlobe amplitude of the
first component at time $t = t_0$. Similarly, $V_2(t_0)$, $f_2(t_0)$, $A_{S_2}(t_0)$
and $A_{M_2}(t_0)$ represent the 1.5 dB mainlobe bandwidth, the instantaneous
frequency, the sidelobe amplitude and the mainlobe
amplitude of the second component at the same time $t_0$. $A_X(t_0)$
defines the cross-terms amplitude.


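For $g(\nu, \tau) = 1$, we obtain the Wigner-Ville distribution (WVD)
of the signal [2, 9]:

$$ \mathrm{WVD}_z(t, f) = \int_{-\infty}^{\infty} z\!\left(t + \frac{\tau}{2}\right) z^*\!\left(t - \frac{\tau}{2}\right) e^{-j 2\pi f \tau}\, d\tau \qquad (7) $$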


2.2.3. Resolution Performance Measure of TFDs


Equation (3) and Figure 4 suggest that the resolution performance
of a time-frequency distribution of a two-component signal is given
by the minimum value of the difference $R = f_2 - f_1$ for which we
still have a positive separation $D$ between the components' mainlobes
about their respective IFs, $f_2$ and $f_1$. For TFDs, $D$ should
ideally be as close as possible to the true difference between the
actual frequencies. It is expressed as:

$$ D = \frac{\left(f_2 - \frac{V_2}{2}\right) - \left(f_1 + \frac{V_1}{2}\right)}{f_2 - f_1} = 1 - \frac{V_1 + V_2}{2R} \qquad (4) $$

The  resolution  also  depends  on  the  following  set  of  variables,  all 
of  which  should  be  as  small  as  possible: 

a) the 1.5 dB normalised mainlobe bandwidth of the signal
component $V_k / f_k$, $k = 1, 2$, which is already included in
$D$ (equation (4)),

b) the ratio of the sidelobe magnitude $|A_{S_k}|$ to the mainlobe
magnitude $|A_{M_k}|$, $k = 1, 2$, of the components, and

c) the ratio of the cross-term magnitude $|A_X|$ to the mainlobe
magnitude of the signal auto-terms $|A_{M_k}|$, $k = 1, 2$.

It  follows  that  the  best  TFD  for  multicomponent  signals  analysis 
is  the  one  that  minimises  the  positive  quantities  a),  b)  and  c),  and 
maximises2  the  separation  D,  concurrently. 

Hence,  an  indicator  P  of  the  resolution  performance  of  a  given 
TFD  can  be  defined  as  [7]: 


$$P = \frac{|A_S|\,|A_X|}{|A_M|^2\, D} > 0 \qquad (5)$$


where $A_M$, $A_S$ and $A_X$ are respectively the average amplitudes
of the mainlobes, sidelobes and cross-terms of any two consecutive
components of the multicomponent signal, with $D$ being their
relative separation.

If  P  <  0,  then  there  is  no  separation  of  the  components,  while 
if  P  >  0,  P  provides  a  measure  of  the  resolution  performance, 
which  takes  into  account  separation  D  and  the  effect  of  cross¬ 
terms  (best  performance  is  achieved  by  minimising  P). 
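As a rough numerical illustration of the two quantities above, the following Python sketch evaluates the separation D and the indicator P from measured slice parameters. It assumes that equation (4) reads D = 1 - (V_1 + V_2) / (2 (f_2 - f_1)) and that equation (5) reads P = |A_S| |A_X| / (|A_M|^2 D); the function names are illustrative only and do not come from the paper.

```python
def separation(f1, f2, v1, v2):
    """Relative separation D of two mainlobes (assumed form of equation (4))."""
    return 1.0 - (v1 + v2) / (2.0 * (f2 - f1))


def performance(a_m, a_s, a_x, d):
    """Resolution performance indicator P (assumed form of equation (5))."""
    return abs(a_s) * abs(a_x) / (abs(a_m) ** 2 * d)


# Born-Jordan and Wigner-Ville rows of Table 1 (A_M, A_S, A_X, D as printed).
p_bjd = performance(0.9320, 0.1227, 0.3798, 0.5164)
p_wvd = performance(0.9153, 0.4134, 1.0, 0.7558)
print(f"BJD: P = {p_bjd:.3f}, WVD: P = {p_wvd:.3f}")
```

With the Table 1 measurements for the Born-Jordan and Wigner-Ville distributions, this form of P agrees with the tabulated indicators to three significant figures.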


3.  TIME-FREQUENCY  SIGNAL  ANALYSIS  OF 
CLOSELY  SPACED  COMPONENTS  USING  THE 
B-DISTRIBUTION 


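A key to understanding time-frequency relationships is an
understanding of the ambiguity domain. The symmetrical ambiguity
function (AF) is defined as:

$$AF_z(\nu,\tau) = \int_{-\infty}^{\infty} z\left(t+\frac{\tau}{2}\right) z^*\left(t-\frac{\tau}{2}\right) e^{-j2\pi\nu t}\, dt \qquad (8)$$

From equations (7) and (8) we can see that the WVD and the AF
are related by a two-dimensional Fourier transform [2]:

$$WVD_z(t,f) \leftrightarrow AF_z(\nu,\tau)$$

$$WVD_z(t,f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} AF_z(\nu,\tau)\, e^{-j2\pi(f\tau - \nu t)}\, d\nu\, d\tau \qquad (9)$$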

It was shown that a signal mapped by the AF into the Doppler-lag
domain always traverses the origin of that plane, while the cross-terms,
having oscillating amplitude in the time-frequency domain,
are located away from the origin in the Doppler-lag plane, the distance
being directly proportional to the time and frequency distance
of the signal components [1].

This property of the AF has inspired researchers to look for
two-dimensional kernel filters $g(\nu, \tau)$ that enhance the generalised
ambiguity function, $g(\nu,\tau)AF_z(\nu,\tau)$, around its origin and suppress
it elsewhere.

Using equations (6) and (9), the following expression can also
be derived [2]:

$$\rho_z(t,f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(\nu,\tau)\, AF_z(\nu,\tau)\, e^{-j2\pi(f\tau - \nu t)}\, d\nu\, d\tau \qquad (10)$$

Thus, quadratic TFDs may be found by filtering the symmetrical
ambiguity function with $g(\nu,\tau)$ and then carrying out the
two-dimensional Fourier transform. For example, for the Wigner-Ville
distribution with the ambiguity domain kernel filter equal to unity,
no filtering is applied to the AF, resulting in the complete preservation
of the cross-terms. This in turn makes the interpretation
of the WVD of multicomponent signals highly difficult. The spectrogram,
on the other hand, leads to a quasi-total elimination of the
cross-terms to the detriment of resolution.
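The preservation of cross-terms by the WVD can be checked numerically. The sketch below is a minimal discrete-time illustration under assumptions not taken from the paper: two constant-frequency complex tones are used instead of LFM components, the signal length is 64 samples, and the helper name `wvd_slice` is hypothetical. It evaluates one time slice of the discrete Wigner-Ville distribution directly from the lag product z[n+m] z*[n-m].

```python
import cmath
import math


def wvd_slice(z, n):
    """Discrete Wigner-Ville slice at time index n.

    Bin k corresponds to normalised frequency k / (2 * len(z)),
    because the lag step of z[n+m] * conj(z[n-m]) is two samples.
    """
    N = len(z)
    mmax = min(n, N - 1 - n)  # largest lag that stays inside the signal
    out = []
    for k in range(N):
        acc = 0j
        for m in range(-mmax, mmax + 1):
            kern = z[n + m] * z[n - m].conjugate()
            acc += kern * cmath.exp(-2j * math.pi * k * m / N)
        out.append(acc.real)  # the lag product is Hermitian in m, so the sum is real
    return out


N = 64
f1, f2 = 0.0625, 0.1875  # assumed tone frequencies (cycles per sample)
z = [cmath.exp(2j * math.pi * f1 * n) + cmath.exp(2j * math.pi * f2 * n)
     for n in range(N)]

s = wvd_slice(z, N // 2)
auto_bin_1 = round(2 * f1 * N)    # bin of the first auto-term
auto_bin_2 = round(2 * f2 * N)    # bin of the second auto-term
cross_bin = round((f1 + f2) * N)  # cross-term sits midway between the auto-terms
print(s[auto_bin_1], s[cross_bin], s[auto_bin_2])
```

At this time instant the cross-term located midway between the two tones is roughly twice as large as either auto-term, which is the behaviour that makes the unfiltered WVD hard to interpret for multicomponent signals.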


3.1. Defining TFDs via Ambiguity Filtering

Different time-frequency distributions of the analytic signal $z(t)$,
associated with the real signal $s(t)$, can be obtained by selecting
different kernel functions $g(\nu, \tau)$ in the general expression of the
quadratic class³ [8]:

$$\rho_z(t,f) = \int\!\!\int\!\!\int e^{j2\pi\nu(t-u)}\, g(\nu,\tau)\, z\left(u+\frac{\tau}{2}\right) z^*\left(u-\frac{\tau}{2}\right) e^{-j2\pi f\tau}\, d\nu\, du\, d\tau \qquad (6)$$

² The maximum value is $D = 1$, which is obtained when $V_1 = V_2 = 0$.
³ All three integrals have limits from $-\infty$ to $+\infty$. Note: this formula
differs from Cohen's formula by a minus sign in the first exponential.


3.2. New Constraints for TFD Design


It  was  reported  in  [2]  that  for  a  time-frequency  analysis,  a  TFD  is 
expected  to  be  real,  to  satisfy  the  marginals  and  to  have  the  instan¬ 
taneous  frequency  as  its  first  moment  with  respect  to  frequency. 
These  strict  constraints  on  the  kernel  design  in  the  ambiguity  do¬ 
main  [9]  led  to  the  terminology  of  Cohen’s  class. 

However, it is known that the spectrogram does not exhibit
cross-terms, and does not satisfy the marginals. Yet the spectrogram
is a very popular tool in practical applications, suggesting
that the time and the frequency marginal constraints may not be
strictly needed in practice. What may be more important is
to improve the energy concentration about the IF for monocomponent
signals and improve the resolution for multicomponent signals.

Following this logic, we may therefore conclude that, to be a
suitable tool for a practical time-frequency analysis, a TFD should
verify the following minimum set of properties:⁴

1. Be real,

2. Preserve the total energy of the signal:

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \rho_z(t,f)\, dt\, df$$

3. Preserve the regional (component) energy: energy in the
region $R$ of the time-frequency plane bounded in time by
$[t_1, t_2]$ and frequency $[f_1, f_2]$ should be:

$$E_R = \int_{f_1}^{f_2}\int_{t_1}^{t_2} \rho_z(t,f)\, dt\, df$$

4. Reduce the cross-terms, while preserving resolution by minimising
measure $P$ (defined by equation (5)),

5. Reveal the IF law of a monocomponent signal by its peak.

To satisfy these constraints, Barkat and Boashash [4, 5] recently
proposed a kernel for a quadratic TFD, known as the B-distribution,
defined by:

$$G(t,\tau) = \left(\frac{|\tau|}{\cosh^2(t)}\right)^{\beta} \qquad (11)$$

The kernel filter $g(\nu, \tau)$ of the B-distribution (BD) was chosen in
the ambiguity domain to be a two-dimensional function centred
around the origin with sharp cut-off edges. In this way, the kernel
retains as much of the auto-terms as possible while
filtering out as much of the cross-terms as possible. The amounts of auto-terms and
cross-terms kept and filtered out are functions of the volume underneath
the 2-D function $g(\nu, \tau)$. This volume can be changed by
varying a single parameter $\beta$ ($0 < \beta < 1$), which is application
dependent.

In addition, a modification to the BD kernel (the Modified
B-distribution) by Hussain and Boashash allows an efficient
estimation of the IF laws of a multicomponent signal.

The kernel of the Modified B-distribution is defined as [10]:

$$G(t,\tau) = \frac{\Gamma(2\alpha)}{2^{2\alpha-1}\,\Gamma^2(\alpha)}\,\frac{1}{\cosh^{2\alpha}(t)}$$

where $\Gamma(\cdot)$ is the gamma function and $\alpha$ is a real positive number
less than 1.
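The gamma-function factor in the Modified B-distribution kernel acts as a normalisation: it is the reciprocal of the integral of $1/\cosh^{2\alpha}(t)$ over all $t$, so the kernel integrates to one along the time axis. The following sketch checks this numerically, assuming the kernel has the form $\Gamma(2\alpha) / (2^{2\alpha-1}\Gamma^2(\alpha)) \cdot \cosh^{-2\alpha}(t)$. The function names, the integration limits and the test value $\alpha = 0.5$ are choices made here for illustration (small values such as $\alpha = 0.01$ decay too slowly for a short integration window).

```python
import math


def mbd_kernel(t, alpha):
    """Modified B-distribution kernel value at time t (it does not depend on lag)."""
    norm = math.gamma(2 * alpha) / (2 ** (2 * alpha - 1) * math.gamma(alpha) ** 2)
    return norm / math.cosh(t) ** (2 * alpha)


def integrate_kernel(alpha, t_max=40.0, step=0.01):
    """Trapezoidal estimate of the kernel integral over [-t_max, t_max]."""
    n = int(round(2 * t_max / step))
    total = 0.0
    for i in range(n + 1):
        t = -t_max + i * step
        weight = 0.5 if i in (0, n) else 1.0  # half weight at the two end points
        total += weight * mbd_kernel(t, alpha)
    return total * step


area = integrate_kernel(0.5)
print(f"integral of the kernel over t for alpha = 0.5: {area:.6f}")
```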

4.  PERFORMANCE  MEASURE  AND  COMPARISON  OF 
TIME-FREQUENCY  DISTRIBUTIONS 

In this section, we use the newly defined measure criterion to compare
the performance of the WVD, the spectrogram, the Choi-Williams
distribution, the Born-Jordan distribution, the Zhao-Atlas-Marks
distribution, the B-distribution and the Modified B- (MB)
distribution of the two-LFM-component signal defined in Section 1.
For each time-frequency distribution we take a slice at the middle
of the time interval and measure the parameters $A_M$, $A_S$, $A_X$
and $V$. These parameters are then used to calculate the frequency
separation of the components $D$, defined by equation (4), and the
performance indicator $P$, defined by equation (5).

The distributions and their respective measurement parameters
are recorded in Table 1, while the slices of the TFDs at the
middle of the time interval are displayed in Figure 5.

⁴ Note that the selection of a complete set of properties would be
application dependent.


Figure 5: Slices taken at the middle of the time interval of TFDs
of two closely-spaced LFMs with frequency $f_1 = 0.15 - 0.25$ Hz
and $f_2 = 0.2 - 0.3$ Hz: (a) BD (solid) and spectrogram (dashed);
(b) BD (solid) and WVD (dashed); (c) BD (solid) and CWD, $\sigma = 2$
(dashed); (d) BD (solid) and BJD (dashed); (e) BD (solid) and ZAMD,
$a = 2$ (dashed); (f) BD (solid) and RD (dashed). BD=B-distribution,
WVD=Wigner-Ville distribution, CWD=Choi-Williams distribution,
BJD=Born-Jordan distribution, ZAMD=Zhao-Atlas-Marks distribution, and
RD=Rihaczek distribution.


The TFD which gives the smallest positive $P$ is the TFD with
the best performance when used to analyse multicomponent signals.
In our case, the B-distribution ($\beta = 0.01$) yields the smallest
value for $P$ ($P = 1.04 \times 10^{-2}$) and hence is regarded as best.
Similar results were obtained with other types of signals.

TFD                                           A_M          A_S     A_X          V            D       P
B-Distribution (BD), β = 0.01                 0.9890       0.0796  (illegible)  0.0197       0.6337  1.04 × 10^-2
Modified B-Distribution (MBD), α = 0.01       0.9885       0.0947  (missing)    0.0199       0.6298  1.32 × 10^-2
Born-Jordan Distribution (BJD)                0.9320       0.1227  0.3798       0.0236       0.5164  1.04 × 10^-1
Choi-Williams Distribution (CWD), σ = 2       (illegible)  0.0211  0.4415       0.0258       0.4766  2.25 × 10^-2
Spectrogram (Hanning window)                  0.9119       0.0493  0.5527       (illegible)  0.3557  9.21 × 10^-2
Rihaczek Distribution (RD)                    0.9823       0.2945  0.3446       0.0289       0.4630  2.71 × 10^-1
Wigner-Ville Distribution (WVD)               0.9153       0.4134  1            0.0140       0.7558  6.53 × 10^-1
Zhao-Atlas-Marks Distribution (ZAMD), a = 2   (illegible)  0.4822  0.4796       0.0238       0.4331  6.08 × 10^-1

Table 1: Measurement parameters and the performance indicator P of TFDs (slices taken at the middle of the signal time interval) of two
closely-spaced LFMs with frequency $f_1 = 0.15 - 0.25$ Hz and $f_2 = 0.2 - 0.3$ Hz


4.1. Optimisation of the B-Distribution Parameter β Using the
Performance Measure P

The performance measure $P$ can be used to optimise the value of
the smoothing parameters of a given TFD. One approach would
be to take consecutive slices of the TFD, find measure $P$ for each
of the slices, and average all such obtained measures for a given
value of the TFD parameter to obtain the average performance
measure $P_{av}$. Repeating this procedure over a range of values
of the smoothing parameter, it is possible, by identifying the one
which results in the smallest $P_{av}$, to obtain the optimal value of the
smoothing parameter of the TFD considered.
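The search just described can be sketched in a few lines of Python. This is only an outline under stated assumptions: `measure` stands for a user-supplied routine (hypothetical here) that computes the TFD for a given parameter value and returns P for one time slice, and the toy measure used in the example is invented purely to exercise the loop.

```python
def average_measure(measure, param, slices):
    """Average the per-slice performance measure P for one parameter value."""
    values = [measure(param, n) for n in slices]
    return sum(values) / len(values)


def optimise_parameter(measure, params, slices):
    """Return the parameter value with the smallest average measure, and that average."""
    best = min(params, key=lambda p: average_measure(measure, p, slices))
    return best, average_measure(measure, best, slices)


def toy_measure(beta, n):
    """Invented stand-in for P(beta, slice n); a real one would analyse a TFD slice."""
    return (beta - 0.3) ** 2 + 0.01 + 0.001 * n


# Slices 16..112 of a 128-sample signal, skipping the first and last eighth.
slices = range(16, 113)
candidates = [i / 10 for i in range(1, 11)]
best_beta, best_pav = optimise_parameter(toy_measure, candidates, slices)
print(best_beta, best_pav)
```

A finer parameter grid simply means a longer `candidates` list; the cost grows linearly with the number of candidates times the number of slices.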

For example, using the measure $P$, we can optimise the parameter
$\beta$ of the B-distribution for the signal in Section 1.
Simulations have shown that $\beta = 0.01$ gives visually most appealing
results for various multicomponent signals [4]. However, this
value can be refined by applying the above described optimisation
procedure.

By calculating $P_{av}$ for $\beta \in [0, 1]$ with the increment of $10^{-5}$
and for the distribution slices 16:112 (note that the signal length is
$N = 128$)⁵ we find the optimal value of the smoothing parameter
of the B-distribution to be $\beta_{opt} = 9.9 \times 10^{-4}$ ($P_{av} = 9.1 \times 10^{-3}$).
Indeed, a reduction in $P_{av}$ value of approximately $2 \times 10^{-3}$ is
achieved if the smoothing parameter of the B-distribution is optimised,
when compared to $P_{av} = 1.1 \times 10^{-2}$ of the B-distribution
with $\beta = 0.01$.

5.  CONCLUSION 

This  paper  has  presented  two  key  results  which  we  believe  to  be 
fundamental  to  a  better  understanding  and  use  of  time-frequency 
signal  analysis  tools. 

The  first  key  result  is  a  definition  of  an  objective  criterion 
to  compare  the  resolution  performance  of  time-frequency  distri¬ 
butions  for  multicomponent  signals  analysis  using  a  quantitative 
measure  of  goodness  for  TFDs.  This  result  fills  an  obvious  need 
in  that  until  now  the  comparison  of  the  resolution  performance  of 
TFDs  was  primarily  based  on  a  visual  impression  of  the  plots  of 
TFDs. 

The  second  key  result  is  an  improvement  in  the  design  of  tools 
for  high  resolution  time-frequency  analysis  of  multicomponent  sig¬ 
nals.  By  removing  limitations  in  the  way  desirable  properties  of 

5We  avoid  calculations  of  the  measure  P  for  the  first  and  the  last  eighth 
of  the  TFD  slices  (i.e.  the  beginning  and  the  end  of  the  TFD  in  time) 
since  it  is  known  [2]  that  in  these  regions  of  the  time-frequency  plane  the 
components  resolution  is  always  significantly  degraded. 


quadratic  TFDs  were  previously  chosen,  a  new  set  of  design  crite¬ 
ria  has  been  defined.  It  was  found  that  such  defined  B-distribution 
outperforms  other  existing  distributions  in  terms  of  time-frequency 
resolution,  as  well  as  cross-terms  suppression,  when  used  to  repre¬ 
sent  signals  with  closely-spaced  components  in  the  time-frequency 
domain. 

The combination of these two results is an important breakthrough
for the field of time-frequency signal analysis. It opens the
way for further research in developing high resolution DSP tools
for non-stationary (time-varying) signals by removing unnecessary
limitations, and providing a measure of quality of TFDs.

ABSTRACT

A  new  method  is  presented  to  study  systems  gov¬ 
erned  by  ordinary  linear  differential  equations  and  par¬ 
tial  differential  equations  whose  solutions  are  waves. 
We  show  that  one  can  obtain  a  differential  equation 
for  the  Wigner  distribution  of  the  solution  of  a  dynam¬ 
ical  equation  of  evolution.  As  an  example  we  derive  in 
a  new  way  the  equation  governing  the  Wigner  distrib¬ 
ution  for  the  Schrodinger  equation.  We  also  consider 
differential  equations  where  the  forcing  terms  are  ran¬ 
dom  processes. 

1. INTRODUCTION

Suppose a dynamical variable, $x(t)$, is governed by a
differential equation, for example by a linear differential
equation with constant coefficients,

$$a_n\frac{d^n x}{dt^n} + a_{n-1}\frac{d^{n-1}x}{dt^{n-1}} + \cdots + a_1\frac{dx}{dt} + a_0 x = f(t) \qquad (1)$$

where $f(t)$ is the driving force. Suppose further we
want to study the time-frequency properties of the solution
by using a bilinear distribution such as the Wigner
distribution. The direct way would be to solve for $x(t)$
and then calculate the Wigner distribution of $x(t)$. Our
aim is to obtain the differential equation for the Wigner
distribution of the solution and hence bypass the necessity
for solving Eq. (1). That is, if the Wigner distribution
(WD) is defined by

$$W(t,\omega) = \frac{1}{2\pi}\int x^*\left(t-\tfrac{1}{2}\tau\right) x\left(t+\tfrac{1}{2}\tau\right) e^{-j\tau\omega}\, d\tau \qquad (2)$$

we want to obtain an equation of motion for $W(t,\omega)$
directly.

Similarly, suppose we have a wave equation governed
by a partial differential equation. We will show
that one can obtain an equation for the Wigner distribution
of the solution. Of course, in Wigner's original
paper that is what he did for the Schrödinger equation.
But we will show how it can be done for any
wave equation.

Also, we will show how the methods can be applied
to systems with random input.

Notation. We define the following Hermitian operators
in the space of two dimensional functions of time
and frequency,

$$\mathcal{A} = \frac{1}{2j}\frac{\partial}{\partial t} - \omega \; ; \qquad \mathcal{B} = \frac{1}{2j}\frac{\partial}{\partial t} + \omega \qquad (3)$$

$$\mathcal{E} = \frac{1}{2j}\frac{\partial}{\partial \omega} + t \; ; \qquad \mathcal{F} = -\frac{1}{2j}\frac{\partial}{\partial \omega} + t \qquad (4)$$

It will also be convenient to define the non Hermitian
operators,

$$A = j\mathcal{A} = \frac{1}{2}\frac{\partial}{\partial t} - j\omega \; ; \qquad B = j\mathcal{B} = \frac{1}{2}\frac{\partial}{\partial t} + j\omega \qquad (5)$$

Differentiation of functions with respect to time will
be indicated by

$$\dot{x}(t) = \frac{d}{dt}x(t) \; ; \qquad x^{(n)}(t) = \frac{d^n}{dt^n}x(t) \qquad (6)$$


2.  RELATIONS  BETWEEN  THE  WD  OF  A 
SIGNAL  AND  THE  WD  OF  A  MODIFIED 
SIGNAL 

We define the cross Wigner distribution of two time
functions, $x(t)$ and $y(t)$, by

$$W_{x,y}(t,\omega) = \frac{1}{2\pi}\int x^*\left(t-\tfrac{1}{2}\tau\right) y\left(t+\tfrac{1}{2}\tau\right) e^{-j\tau\omega}\, d\tau \qquad (7)$$

Now suppose we know $W_{x,y}(t,\omega)$; how can one obtain
(for example) $W_{\dot{x},y}(t,\omega)$ and other such quantities?
These types of relations are important for reasons that
will become apparent in section 3. We have previously
obtained these relations [1]; we just list them here,
and in the appendix we give the derivations. In particular,


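$$W_{\dot{x},y} = A\,W_{x,y} = \left(\frac{1}{2}\frac{\partial}{\partial t} - j\omega\right) W_{x,y} \qquad (8)$$

$$W_{x,\dot{y}} = B\,W_{x,y} = \left(\frac{1}{2}\frac{\partial}{\partial t} + j\omega\right) W_{x,y} \qquad (9)$$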

Another important relation is the following. Suppose
we have the cross Wigner distribution of $x$ and
$y$ and wish to obtain the cross Wigner distribution of
$f(t)x(t)$ and $y(t)$, where $f(t)$ is an arbitrary function.
That is, we want to express $W_{fx,y}(t,\omega)$ in terms of
$W_{x,y}(t,\omega)$. This is possible, and the result is

$$W_{fx,y}(t,\omega) = f^*(\mathcal{E})\,W_{x,y} \qquad (10)$$

$$W_{x,fy}(t,\omega) = f(\mathcal{F})\,W_{x,y} \qquad (11)$$

3. LINEAR DIFFERENTIAL EQUATIONS

In our previous paper we considered the case of an ordinary
differential equation with constant coefficients.
Here we extend the method to the case with time dependent
coefficients,

$$a_n(t)\frac{d^n x}{dt^n} + \cdots + a_1(t)\frac{dx}{dt} + a_0(t)\,x = f(t) \qquad (12)$$

or using an operator notation

$$\left[\sum_k a_k(t) D^k\right] x(t) = f(t) \qquad (13)$$

As is standard we define the differential operator by
$D = \frac{d}{dt}$. We now derive the associated equations for $W_{x,x}(t,\omega)$
and $W_{x,f}(t,\omega)$. We start by evaluating the cross Wigner
of equation (13) with respect to $f(t)$

$$W_{[\sum_k a_k D^k]x,\,f} = W_{f,f} \qquad (14)$$

We have

$$\sum_k W_{[a_k D^k]x,\,f} = W_{f,f} \qquad (15)$$

and applying property (10)

$$\sum_k a_k^*(\mathcal{E})\, W_{D^k x,\,f} = W_{f,f} \qquad (16)$$

and using Eq. (8) we have

$$\left[\sum_k a_k^*(\mathcal{E})\, A^k\right] W_{x,f} = W_{f,f} \qquad (17)$$

This is the dynamical equation for $W_{x,f}$. We now evaluate
the cross Wigner of $x(t)$ with respect to equation
(13), and with similar considerations we obtain

$$\left[\sum_k a_k(\mathcal{F})\, B^k\right] W_{x,x} = W_{x,f} \qquad (18)$$

To this equation we apply the differential operator that
acts on $W_{x,f}$ in Eq. (17)

$$\left[\sum_k a_k^*(\mathcal{E})\, A^k\right]\left[\sum_l a_l(\mathcal{F})\, B^l\right] W_{x,x} = \left[\sum_k a_k^*(\mathcal{E})\, A^k\right] W_{x,f} \qquad (19)$$

and we recognize the right hand side to be $W_{f,f}$. Hence, from
equation (17), we have

$$\left[\sum_k a_k^*(\mathcal{E})\, A^k\right]\left[\sum_l a_l(\mathcal{F})\, B^l\right] W_{x,x} = W_{f,f} \qquad (20)$$

This is the equation of motion for the Wigner of $x(t)$.

4. POLYNOMIAL COEFFICIENTS

If one considers the ordinary differential equation

$$p_n(t)\frac{d^n x}{dt^n} + \cdots + p_1(t)\frac{dx}{dt} + p_0(t)\,x = f(t) \qquad (21)$$

where $p_0(t), \ldots, p_n(t)$ are polynomials in the $t$ variable,
by using the following relations

$$W_{tx,x} = \mathcal{E}\,W_{x,x} \; ; \qquad W_{x,tx} = \mathcal{F}\,W_{x,x} \qquad (22)$$

and the usual Eqs. (8) and (9), one can readily get
the equation for the Wigner distribution for this case.
Specializing the general result (20), that is considering
$a_i(t) = p_i(t)$, for $i = 0, \ldots, n$, we have that

$$\left[\sum_k p_k^*(\mathcal{E})\, A^k\right]\left[\sum_l p_l(\mathcal{F})\, B^l\right] W_{x,x} = W_{f,f} \qquad (23)$$

5. CONSTANT COEFFICIENTS

We have previously considered the case of a linear differential
equation with constant coefficients [1] and hence
we just summarize the results here. We write the equation
of motion for $x(t)$ as

$$\left[a_n D^n + a_{n-1}D^{n-1} + \cdots + a_1 D + a_0\right] x(t) = f(t) \qquad (24)$$

and further write it using the standard polynomial notation

$$P_n(D)\,x(t) = f(t) \qquad (25)$$

where

$$P_n(D) = a_n D^n + a_{n-1}D^{n-1} + \cdots + a_1 D + a_0 \qquad (26)$$

We have shown that the governing equation for the
Wigner distribution is given by

$$P_n^*(A)\,P_n(B)\,W_{x,x} = W_{f,f} \qquad (27)$$

and we have also shown that solving this equation directly
has significant advantages [2].
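As a small worked illustration of the constant-coefficient result, consider the first-order equation $\dot{x} + a\,x = f$ with a real constant $a$ (an example chosen here, not taken from the paper). The sketch below assumes the operator definitions $A = \frac{1}{2}\partial_t - j\omega$ and $B = \frac{1}{2}\partial_t + j\omega$, and uses the fact that multiplication by $\omega$ commutes with differentiation in $t$.

```latex
\begin{align*}
P_1(D) &= D + a \\
P_1^*(A)\,P_1(B)\,W_{x,x}
  &= \left(\tfrac{1}{2}\partial_t + a - j\omega\right)
     \left(\tfrac{1}{2}\partial_t + a + j\omega\right) W_{x,x} \\
  &= \left[\left(\tfrac{1}{2}\partial_t + a\right)^2 + \omega^2\right] W_{x,x} \\
  &= \tfrac{1}{4}\,\frac{\partial^2 W_{x,x}}{\partial t^2}
     + a\,\frac{\partial W_{x,x}}{\partial t}
     + \left(a^2 + \omega^2\right) W_{x,x}
   = W_{f,f}
\end{align*}
```

In this case the frequency $\omega$ enters only as a parameter, so the equation for the Wigner distribution is an ordinary differential equation in $t$ for each fixed $\omega$.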
6. WAVE EQUATIONS

The same approach that we adopted in section 3 for
ordinary differential equations can be generalized to
linear partial differential equations, and in particular
to wave equations. Wigner in his original paper obtained
the equation of evolution for the Wigner distribution
for the Schrödinger equation. It is our aim
to develop the Wigner distribution approach for classical
wave equations, such as electromagnetic or acoustic
wave equations. Here we will discuss some issues in regard
to this approach; a fuller development will be
given in a future publication. Suppose we have a partial
differential equation for $u(x, t)$, for example a wave equation.
As Wigner did for the Schrödinger wave function,
we can define the Wigner distribution of position, momentum,
and time by

$$W_{\psi,\psi}(x,p,t) = \frac{1}{2\pi}\int \psi^*\left(x-\tfrac{1}{2}\tau_x, t\right)\psi\left(x+\tfrac{1}{2}\tau_x, t\right) e^{-j\tau_x p}\, d\tau_x \qquad (28)$$

This is the approach that Wigner took in his original
paper. In this approach time $t$ plays a passive role.
However, one can also define a joint distribution of the
four variables¹

$$K_{\psi,\psi}(x,p,t,\omega) = \frac{1}{(2\pi)^2}\int\!\!\int \psi^*\left(t-\tfrac{1}{2}\tau, x-\tfrac{1}{2}\tau_x\right)\psi\left(t+\tfrac{1}{2}\tau, x+\tfrac{1}{2}\tau_x\right) e^{-j\tau\omega - j\tau_x p}\, d\tau\, d\tau_x \qquad (29)$$

It follows that

$$\int K_{\psi,\psi}(x,p,t,\omega)\, d\omega = W_{\psi,\psi}(x,p,t) \qquad (30)$$

Now, a fundamental issue is whether one can obtain
equations of evolution for $W$ and $K$. We believe
that one may always obtain an equation for $K$ but not
always for $W$. In another paper we will discuss this
issue in detail, but here we illustrate our method by examining
two wave equations, the Schrödinger equation
and the classical wave equation. We emphasize that
our aim in devising these methods is to study classical
equations of motion by way of the Wigner distribution.

6.1.  Operator  Relations 

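It is possible to prove the following results

$$W_{\frac{\partial\psi}{\partial x},\psi} = A_x W_{\psi,\psi} \; ; \qquad W_{\psi,\frac{\partial\psi}{\partial x}} = B_x W_{\psi,\psi} \qquad (31)$$

where

$$A_x = \frac{1}{2}\frac{\partial}{\partial x} - jp \; ; \qquad B_x = \frac{1}{2}\frac{\partial}{\partial x} + jp \qquad (32)$$

and also equations

$$W_{f\psi,\psi} = f^*(\mathcal{E}_x)\,W_{\psi,\psi} \; ; \qquad W_{\psi,g\psi} = g(\mathcal{F}_x)\,W_{\psi,\psi} \qquad (33)$$

where

$$\mathcal{E}_x = \frac{1}{2j}\frac{\partial}{\partial p} + x \; ; \qquad \mathcal{F}_x = -\frac{1}{2j}\frac{\partial}{\partial p} + x \qquad (34)$$

and where $f = f(x)$, $g = g(x)$ are two arbitrary functions.

¹ We use $K_{\psi,\psi}(x,p,t,\omega)$ for notational clarity, that is to contrast
with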

6.2.  The  Schrodinger  Equation 

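The Schrödinger equation is a wave equation for which
an equation of motion for both $W$ and $K$ can be obtained.
We first obtain the equation for $W$. The Schrödinger
equation is²

$$j\frac{\partial\psi(x,t)}{\partial t} = -\frac{1}{2m}\frac{\partial^2\psi(x,t)}{\partial x^2} + V(x)\,\psi(x,t) \qquad (35)$$

To obtain the equation for the Wigner distribution
$W_{\psi,\psi}$, we evaluate the cross Wigner of Eq. (35) with
respect to $\psi(x,t)$

$$W_{j\frac{\partial\psi}{\partial t},\,\psi} = W_{-\frac{1}{2m}\frac{\partial^2\psi}{\partial x^2},\,\psi} + W_{V\psi,\psi} \qquad (36)$$

Extracting the coefficients we have

$$-j\,W_{\frac{\partial\psi}{\partial t},\,\psi} = -\frac{1}{2m}\,W_{\frac{\partial^2\psi}{\partial x^2},\,\psi} + W_{V\psi,\psi} \qquad (37)$$

Applying properties (31) and (33) we obtain

$$-j\,W_{\frac{\partial\psi}{\partial t},\,\psi} = -\frac{1}{2m}A_x^2\,W_{\psi,\psi} + V^*(\mathcal{E}_x)\,W_{\psi,\psi} \qquad (38)$$

Now we evaluate the cross Wigner of $\psi(x,t)$ with respect
to Eq. (35)

$$W_{\psi,\,j\frac{\partial\psi}{\partial t}} = W_{\psi,\,-\frac{1}{2m}\frac{\partial^2\psi}{\partial x^2} + V\psi} \qquad (39)$$

and with similar operations we get

$$j\,W_{\psi,\,\frac{\partial\psi}{\partial t}} = -\frac{1}{2m}B_x^2\,W_{\psi,\psi} + V(\mathcal{F}_x)\,W_{\psi,\psi} \qquad (40)$$

We then subtract Eq. (40) from Eq. (38), getting

$$-j\left[W_{\frac{\partial\psi}{\partial t},\,\psi} + W_{\psi,\,\frac{\partial\psi}{\partial t}}\right] = -\frac{1}{2m}\left[A_x^2 - B_x^2\right] W_{\psi,\psi} + \left[V^*(\mathcal{E}_x) - V(\mathcal{F}_x)\right] W_{\psi,\psi} \qquad (41)$$

² We take $\hbar = 1$.

Using the fact that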

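$$\frac{\partial W_{\psi,\psi}}{\partial t} = W_{\frac{\partial\psi}{\partial t},\,\psi} + W_{\psi,\,\frac{\partial\psi}{\partial t}} \qquad (42)$$

$$A_x^2 - B_x^2 = -2jp\,\frac{\partial}{\partial x} \qquad (43)$$

we obtain

$$-j\frac{\partial W_{\psi,\psi}}{\partial t} = j\frac{p}{m}\frac{\partial W_{\psi,\psi}}{\partial x} + \left[V^*(\mathcal{E}_x) - V(\mathcal{F}_x)\right] W_{\psi,\psi} \qquad (44)$$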

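One can show that

$$V^*(\mathcal{E}_x)\,W_{\psi,\psi}(x,p,t) = \frac{1}{2\pi}\int\!\!\int V^*(x+u/2)\, e^{-j(p'-p)u}\, W_{\psi,\psi}(x,p')\, du\, dp' \qquad (45)$$

$$V(\mathcal{F}_x)\,W_{\psi,\psi}(x,p,t) = \frac{1}{2\pi}\int\!\!\int V(x+u/2)\, e^{j(p'-p)u}\, W_{\psi,\psi}(x,p')\, du\, dp' \qquad (46)$$

Now consider real potentials; then

$$\left[V(\mathcal{E}_x) - V(\mathcal{F}_x)\right] W_{\psi,\psi} = -\frac{j}{\pi}\int\!\!\int V(x+u/2)\,\sin[(p'-p)u]\, W_{\psi,\psi}(x,p')\, du\, dp' \qquad (47)$$

and hence we have that

$$\frac{\partial W_{\psi,\psi}}{\partial t} + \frac{p}{m}\frac{\partial W_{\psi,\psi}}{\partial x} = \frac{1}{\pi}\int\!\!\int V(x+u/2)\,\sin[(p'-p)u]\, W_{\psi,\psi}(x,p')\, du\, dp' \qquad (48)$$

which is the well known result for the Wigner distribution
and was derived by Wigner using different methods
[6].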

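We now derive an equation for $K$. First we point
out that Eqs. (10) and (11) still hold, and that the
following relationships can be easily proved with the
same technique as in the ordinary differential equation case

$$K_{\frac{\partial\psi}{\partial x},\psi} = A_x K_{\psi,\psi} \; ; \qquad K_{\frac{\partial\psi}{\partial t},\psi} = A_t K_{\psi,\psi} \qquad (49)$$

$$K_{\psi,\frac{\partial\psi}{\partial x}} = B_x K_{\psi,\psi} \; ; \qquad K_{\psi,\frac{\partial\psi}{\partial t}} = B_t K_{\psi,\psi} \qquad (50)$$

where

$$A_x = \frac{1}{2}\frac{\partial}{\partial x} - jp \; ; \qquad A_t = \frac{1}{2}\frac{\partial}{\partial t} - j\omega \qquad (51)$$

$$B_x = \frac{1}{2}\frac{\partial}{\partial x} + jp \; ; \qquad B_t = \frac{1}{2}\frac{\partial}{\partial t} + j\omega \qquad (52)$$

and $K_{\psi,\psi} = K_{\psi,\psi}(x,p,t,\omega)$ is the four dimensional
Wigner distribution. Using the same approach as in section
6.2, we evaluate the cross Wigner of the two sides
of Schrödinger's equation with respect to $\psi$, and applying
the new operator relations, we have

$$-jA_t K_{\psi,\psi} = -\frac{1}{2m}A_x^2 K_{\psi,\psi} + V^*(\mathcal{E}_x)\,K_{\psi,\psi} \qquad (53)$$

Now we proceed the other way, taking the cross Wigner of
$\psi$ with respect to Schrödinger's equation, obtaining

$$jB_t K_{\psi,\psi} = -\frac{1}{2m}B_x^2 K_{\psi,\psi} + V(\mathcal{F}_x)\,K_{\psi,\psi} \qquad (54)$$

We subtract Eq. (54) from Eq. (53), and we have

$$-j\left[A_t + B_t\right] K_{\psi,\psi} = -\frac{1}{2m}\left[A_x^2 - B_x^2\right] K_{\psi,\psi} + \left[V^*(\mathcal{E}_x) - V(\mathcal{F}_x)\right] K_{\psi,\psi} \qquad (55)$$

Using the fact that $A_t + B_t = \frac{\partial}{\partial t}$ and $A_x^2 - B_x^2 = -2jp\frac{\partial}{\partial x}$
we obtain

$$-j\frac{\partial K_{\psi,\psi}}{\partial t} = j\frac{p}{m}\frac{\partial K_{\psi,\psi}}{\partial x} + \left[V^*(\mathcal{E}_x) - V(\mathcal{F}_x)\right] K_{\psi,\psi} \qquad (56)$$

We emphasize that while this equation looks the same
as Eq. (44), it is not, because $K$ is a four dimensional
density. One can obtain Eq. (44) by integrating out $\omega$.
This is due to the fact that all the operators that act
on $K$ do not involve $\omega$, and hence one can obtain the
equation of motion for $W$, Eq. (44), from the equation
of motion of $K$.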

7.  CLASSICAL  WAVE  EQUATION 

We  want  to  apply  the  same  method  developed  for  the 
case  of  ordinary  differential  equations  to  the  classic 
wave  equation 


1  d 2 

^2  Qj2  uixit)  = 


One  can  prove  with  the  same  considerations  of  section 
3  that  the  equation  for  the  four  dimensional  Wigner 
distribution  is 

Ax  ~  ~2 At  Bx  —  ~2-7?t  KiP'i/,(x,p,t,uj)  — 
c  c 

w)  (58) 

However, we believe that it is impossible to obtain
an equation for the three dimensional Wigner distribution
in this case. One can convince oneself of this by
attempting to do so directly by the same methods that
have been applied to the Schrödinger equation. Alternatively,
one may attempt to obtain it by integrating out $\omega$
from Eq. (58). But it is not possible to integrate out
$\omega$ to obtain an equation for $W(x,p,t)$. This is due to
the fact that the operators $A_t$ and $B_t$ contain $\omega$.
8. RANDOM DRIVING FORCE AND
RANDOM COEFFICIENTS

There are two important cases that arise in many areas
of physics, chemistry and engineering. The first
case is where only the driving force is stochastic. That
is the most important case. The other case is where
the coefficients are random. In this paper, for
lack of space, we only consider the first case with deterministic
coefficients, that is Eq. (12) with a random
driving force. To illustrate, we consider here the case
of a harmonic oscillator with a Gaussian process $N(t)$
as input

$$\ddot{x}(t) + 2\mu\,\dot{x}(t) + \omega_0^2\,x(t) = N(t) \qquad (59)$$

The governing equation for the Wigner distribution of
$x(t)$ is, from (20),

$$\left[A^2 + 2\mu A + \omega_0^2\right]\left[B^2 + 2\mu B + \omega_0^2\right] W_{x,x} = W_{N,N} \qquad (60)$$

Here obviously both $x(t)$ and $W_{x,x}(t,\omega)$ are stochastic
processes. The important consequence of having a stochastic
equation for the Wigner distribution is that one can
get deterministic equations for the mean values, and
for any moment in general. For example, considering
that a linear (differential) operator acts on $W_{x,x}$ in Eq.
(60), one can evaluate the ensemble average of the two
sides of the same equation

$$\left[A^2 + 2\mu A + \omega_0^2\right]\left[B^2 + 2\mu B + \omega_0^2\right] E\left[W_{x,x}\right] = E\left[W_{N,N}\right] \qquad (61)$$

More interesting is the case of a harmonic oscillator
with time-varying coefficients. For example, if we
consider the underdamped case with $\mu \ll \omega_0$, then it
is known that the harmonic oscillator behaves like a
bandpass filter with central frequency

$$\omega_c = \sqrt{\omega_0^2 - \mu^2} \approx \omega_0 \qquad (62)$$
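To give a feel for how close the central frequency is to $\omega_0$ when the damping is light, the following sketch evaluates $\omega_c = \sqrt{\omega_0^2 - \mu^2}$ for a few values of $\mu$. The function name and the numerical values are illustrative assumptions, not taken from the paper.

```python
import math


def centre_frequency(omega0, mu):
    """Central frequency of the underdamped oscillator, sqrt(omega0^2 - mu^2)."""
    if mu >= omega0:
        raise ValueError("the underdamped case requires mu < omega0")
    return math.sqrt(omega0 ** 2 - mu ** 2)


omega0 = 1.0
for mu in (0.3, 0.1, 0.01):
    wc = centre_frequency(omega0, mu)
    rel_err = (omega0 - wc) / omega0  # relative error of the approximation wc ~ omega0
    print(f"mu = {mu}: omega_c = {wc:.6f}, relative error = {rel_err:.2e}")
```

The relative error of the approximation shrinks roughly as $\mu^2 / (2\omega_0^2)$, so it is already small for moderately light damping.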

By letting $\omega_0 = \omega_0(t)$ we can hence build a bandpass
filter with time-varying central frequency. If again we
set a Gaussian process as input we have

$$\ddot{x}(t) + 2\mu\,\dot{x}(t) + \omega_0^2(t)\,x(t) = N(t) \qquad (63)$$

This equation is of interest in many areas, since many
systems show a bandpass behavior which is actually
time or space varying under closer analysis. The differential
equation for the Wigner distribution is, in this
case,

$$\left[A^2 + 2\mu A + \omega_0^2(\mathcal{E})\right]\left[B^2 + 2\mu B + \omega_0^2(\mathcal{F})\right] W_{x,x} = W_{N,N} \qquad (64)$$

The interesting thing is that we have a "stationary"
process $N(t)$ as input that is processed by a time-varying
system that generates a "nonstationary" process
$x(t)$ as output. Now, an important aspect of such a
process is the random instantaneous frequencies in $x(t)$,
which are due to the time-varying filtering. It is hence
important to derive the differential equation (64) for
the Wigner distribution $W_{x,x}(t,\omega)$ of $x(t)$, because from it we may
be able to derive equations for moments of $W_{x,x}(t,\omega)$
which are physically significant. For example, taking
the ensemble average of both sides of Eq. (64), we
have

$$\left[A^2 + 2\mu A + \omega_0^2(\mathcal{E})\right]\left[B^2 + 2\mu B + \omega_0^2(\mathcal{F})\right] E\left[W_{x,x}\right] = E\left[W_{N,N}\right] \qquad (65)$$

This  approach  may  be  fruitful  for  studying  how  the 
moments  of  the  input  process  evolve  in  time. 

9.  CONCLUSION 

We  have  derived  a  method  to  study  systems  governed 
by  linear  ordinary  and  partial  differential  equations. 
The  method  allows  us  to  write  an  associate  equation 
for  the  Wigner  distribution  of  the  solution.  In  this  pa¬ 
per  we  have  shown  how  to  transform  ordinary  equations 
with  time-varying  coefficients.  We  have  also  given  ex¬ 
amples  on  how  the  method  can  be  used  to  study  wave 
equations  and  systems  with  random  inputs. 
ABSTRACT

In this paper we propose two methods to detect buried
underground objects. Both methods are based on time-frequency
analysis. The first approach uses the instantaneous
frequency of the return Ground Penetrating
Radar (GPR) signals and the second approach uses a
time-frequency distribution, such as the Wigner-Ville
or the spectrogram, of the return signals. Real data
were used in the examples to validate the proposed algorithms.


1. INTRODUCTION

Landmines are causing enormous humanitarian and economic
problems in many countries around the world.
Experts estimate that up to 110 million landmines
are still to be cleared and that more than 500 civilians
are killed or maimed every week by landmines. Most of
the victims are innocent children [1].

Today, the most widespread technique for landmine
detection is metal detection [3]. However, this technique
becomes almost useless when there is a large amount of
metal debris in the field to be cleared. In this case,
manual probing is required and the demining process
becomes very laborious and slow. In addition, modern
war technologies are producing mines that contain no
metal or only a very small amount of it.

To avoid the above-mentioned problems, Ground Penetrating
Radar (GPR) has been applied. The use of
GPR systems stems from their ability to detect buried
objects based on a change in the dielectric permittivity
of the ground rather than the metal content of the
target [5].

Extensive data analysis shows that the GPR return
signal is non-stationary. Thus, by using time-frequency
techniques in the analysis of the return signal we may
be able to detect a target more accurately. The signal
feature selected for detection is the instantaneous power
of the return signal and/or its energy evaluated using
a time-frequency distribution, namely, the Wigner-Ville
distribution (WVD).

2. PRELIMINARIES

The WVD is defined as

$$W(t,f) = \int_{-\infty}^{\infty} z\left(t + \frac{\tau}{2}\right) z^{*}\left(t - \frac{\tau}{2}\right) e^{-j 2\pi f \tau}\, d\tau \tag{1}$$



where z(t) is the analytic signal associated with the real
signal under consideration, s(t) [4].

We can show that integrating the WVD over all frequencies
yields the instantaneous signal power, $|z(t)|^2$, while
integrating it over time yields the energy spectrum, $|Z(f)|^2$.
We can also obtain the total signal energy $E_z$ by integrating the
WVD over time and frequency as follows

$$E_z = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} W(t,f)\, df\, dt \tag{2}$$



An  important  concept  closely  related  to  the  time- 
frequency  analysis  is  the  instantaneous  frequency  (IF) 
defined  as 


$$f_i(t) = \frac{1}{2\pi}\frac{d\phi(t)}{dt} \tag{3}$$

where $\phi(t)$ is the phase of the signal under consideration.

The above definitions and results will play a major
role in the detection procedure outlined below.

3. GPR RETURN SIGNALS

In this section we give a brief description of the experiment
and outline the need for further processing of
the GPR data.

The experiment was conducted at the Defence Science
& Technology Organisation (DSTO), Australia, using
a GPR equipped with a bistatic bow-tie antenna.
The antenna is moved in one direction over a distance
d, above the ground surface, at a constant velocity and
constant height. At regular time instants, the GPR system
radiates short-duration pulses of electromagnetic
energy into the ground and collects backscattered signals.
A target is declared present if the GPR detects a
local change (or discontinuity) in the soil dielectric.

The collection of the regularly spaced return signals,
referred to as GPR traces, over the distance d is called the
radargram. In Figure 1, we display a typical radargram
obtained from an experiment discussed in detail in the
next sections.

In the experiment we buried three different targets;
however, the radargram reveals the potential existence
of two targets only and misses the third target.


Figure 1: Radargram of three landmines buried in a dry
sandpit (file 80611041). Horizontal axis: distance [m].

The time-domain plots of two different GPR traces,
one taken at a position where a target exists, the other
taken at a position where there is no target, are much
alike and do not give a specific answer about the presence
of a target (see Figure 2).

In  order  to  improve  the  detection  performance  other 
techniques  have  been  suggested  [3,  6].  Here,  we  propose 
two  different  techniques:  an  IF  based  detection  and  an 
energy  based  detection. 

4.  THE  IF  BASED  TECHNIQUE 

The ability of the time-frequency distribution to display
the signal's spectral components makes it a very
powerful tool in the localisation and estimation of the
instantaneous frequency of the signal.

Figure 2: Two GPR return signals: one with target, the
other without target.

For the estimation of the IF of the GPR signals we
use the peak of the WVD. The change in the IF from
one trace to another cannot be seen directly from the
time-frequency representations of the traces, nor from
the plots of the IF estimates (not shown here). However,
by defining a measure criterion as follows

$$\int_{-\infty}^{\infty} \left[f_i^{k}(t) - f_i^{back}(t)\right]^2 dt, \qquad k = 1, \ldots, N_s, \tag{4}$$

we are able to detect the presence of the target. In
Equation (4), $f_i^{k}(t)$ refers to the IF of the kth GPR trace
under analysis, $f_i^{back}(t)$ refers to the IF of a background-only
trace (no target present) and $N_s$ is the total number of
GPR traces in the radargram.

We  should  note  that  prior  to  estimating  the  IF,  we 
subtract  the  mean  of  the  signal  and  form  its  analytic 
version  in  order  to  avoid  aliasing  in  the  time-frequency 
distribution  [2].  Thus,  the  IF  based  detection  algorithm 
can  be  stated  as  given  by  Table  1  below. 

As an example, consider the detection of two surrogate
anti-personnel landmines, referred to as ST-AP(2)
and ST-AP(3), modeled after the PMN and PMN2 respectively
and buried in a dry sandpit. These two targets
have no metal in their casings and have dimensions
of 11.8 cm × 5.0 cm (∅ × h) and 11.5 cm × 5.3 cm respectively.
The first target is located at 71.9 cm from the origin
while the second target is located at 156.1 cm from
the origin. The radargram of this experiment is shown
in Figure 3. Signals taken from the radargram at two
different positions (one where a target is known to be
present, the other where there is no target) are displayed
in Figure 4. It is clear from the two plots that neither
the radargram nor the time-domain traces can discriminate
between the target and the (target-free) background. By
using the proposed IF based algorithm, we obtain the
results displayed in Figure 5. We can clearly observe
the presence of the targets at their respective positions.

For every GPR trace of the radargram

    $s_k(t)$,  $1 \le k \le N_s$,

1. Remove the mean of the signal

    $x_k(t) = s_k(t) - \mathrm{mean}(s_k(t))$

2. Compute the analytic version of $x_k(t)$

    $z_k(t) = x_k(t) + j\,\mathcal{H}[x_k(t)]$

   where $\mathcal{H}[\cdot]$ is the Hilbert transform operation.

3. Compute the time-frequency distribution $W_z(t,f)$
   of the signal $z_k(t)$. The IF estimates are found as the
   maximum of $W_z(t,f)$ for each time instant t, i.e.,

    $f_i(t) = \arg\max_{f \in I} W_z(t,f)$

   where $I = \{f : 0 \le |f| \le 1/(2T)\}$ and T is the
   sampling period of the signal.

4. Compute the measure criterion as given by (4).

Table 1: IF based detection algorithm
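
The four steps of Table 1 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names (`analytic_signal`, `wvd`, `if_estimate`, `if_measure`), the particular discretisation of the WVD, and the replacement of the integral in (4) by a sum over samples are all assumptions made here.

```python
import numpy as np

def analytic_signal(x):
    """Return x + j*Hilbert[x], built by zeroing the negative FFT frequencies."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def wvd(z):
    """Discrete WVD (assumed form): row = time sample,
    column m = frequency m / (2 * n * T)."""
    n = z.size
    w = np.zeros((n, n))
    for t in range(n):
        lag_max = min(t, n - 1 - t)              # stay inside the record
        lags = np.arange(-lag_max, lag_max + 1)
        kernel = np.zeros(n, dtype=complex)
        kernel[lags % n] = z[t + lags] * np.conj(z[t - lags])
        w[t] = np.fft.fft(kernel).real           # kernel is conjugate-symmetric
    return w

def if_estimate(trace, T=1.0):
    """Steps 1-3 of Table 1: IF as the peak of the WVD at each time sample."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()                             # step 1: remove the mean
    z = analytic_signal(x)                       # step 2: analytic version
    w = wvd(z)                                   # step 3: time-frequency distribution
    return np.argmax(w, axis=1) / (2.0 * z.size * T)

def if_measure(trace, background, T=1.0):
    """Step 4: discrete version of the measure criterion (4)."""
    diff = if_estimate(trace, T) - if_estimate(background, T)
    return float(np.sum(diff ** 2) * T)
```

Evaluating `if_measure` for every trace of a radargram against one background trace gives a curve over position that can be inspected for peaks. Near the ends of the record only a few lags are available, so the IF estimates there are unreliable in this simple form.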


Figure  3:  Radargram  of  two  surrogate  anti-personnel 
landmines. 


Note that the phases of the GPR signals computed
at all positions d can also be used to locate the target
over the analysed distance. However, for some situations
the algorithm does not detect all the targets
present, as is the case for the above example. Figure 6
illustrates the point.

Figure  4:  Two  GPR  return  signals:  one  with  target,  the 
other  without  target. 


Figure  5:  IF  based  algorithm  result  for  the  detection  of 
two  surrogate  anti-personnel  landmines. 


Figure  6:  Phase  based  algorithm  result  for  the  detection 
of  two  surrogate  anti-personnel  landmines. 


5.  ENERGY  BASED  DETECTION 

In this section, we use another time-frequency approach
to detect the presence of a buried target in the soil. The
method is based on the discriminator D defined as

$$D = \iint W_{(x-back)}(t,f)\, dt\, df \tag{5}$$

where $W_z(t,f)$ represents the WVD of a given signal
z. In the above equation, $W_{(x-back)}(t,f)$ represents the
WVD of the difference of two signals: x, the actual
return signal under analysis, and back, a reference background
signal (no target). Here again, the aim is to
decide whether the actual signal x represents a GPR
trace where a target exists or a GPR trace where there
is no target. We should note that other time-frequency
distributions, such as the spectrogram, can also be used
in place of the WVD.
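
Because the WVD integrates over time and frequency to the signal energy (Equation (2)), the discriminator D of Equation (5) can be evaluated without forming the full distribution. The sketch below relies on that identity; the function names, the sampling period `T`, and the use of the analytic version of the difference signal are assumptions for illustration rather than details given in the paper.

```python
import numpy as np

def analytic_signal(x):
    """Return x + j*Hilbert[x], built by zeroing the negative FFT frequencies."""
    x = np.asarray(x, dtype=float)
    n = x.size
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * gain)

def discriminator(trace, background, T=1.0):
    """Energy of the analytic difference signal, i.e. the double
    integral of its WVD according to Equation (2)."""
    diff = np.asarray(trace, dtype=float) - np.asarray(background, dtype=float)
    z = analytic_signal(diff)
    return float(np.sum(np.abs(z) ** 2) * T)
```

A trace identical to the background gives D = 0, and D grows with the energy of whatever the target adds to the return.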

The algorithm has been applied to a large number
of radargrams with different types of targets and different
types of soil. The results show that the method
can effectively reveal the presence of a buried target.
As an illustration, let us consider the detection of three
surrogate landmines buried in a sandpit at depth 0.5 cm
and at distances 0.565 m, 1.537 m and 2.413 m respectively.
Two of these targets are solid stainless steel
cylinders with dimensions of 10 cm × 5 cm (∅ × h) and
5 cm × 5 cm, whereas the third target is a PVC cylinder
with dimensions 10 cm × 5 cm. The radargram of
this experiment is displayed in Figure 1 and the time-domain
representations of two traces (with and without target)
are displayed in Figure 2. As stated earlier, it is very
difficult to detect all the targets from these two plots.
However, when we apply the algorithm based on the
discriminator D we obtain the results displayed in Figure 7.


Figure  7:  Discriminant  based  algorithm  result  for  the 
detection  of  three  buried  targets  in  sandpit. 

Note  that  the  energy  return  of  the  bigger  stainless 
steel  cylinder  is  much  higher  than  the  energy  return  of 
the  smaller  steel  cylinder  or  the  PVC  cylinder. 

Before concluding, we should emphasise that the two
techniques proposed here are able to detect buried
objects in the ground including unwanted targets, such
as scrap metal. In order to detect only wanted targets,
a thresholding and a classification procedure has
to follow the detection procedure. The classification is
beyond the scope of the present paper and will be addressed
elsewhere.

6.  CONCLUSION 

In this paper, we proposed two different methods to
detect buried objects underground. Both methods are
based on time-frequency techniques. The first method
uses the instantaneous frequency of the return GPR signal,
whereas the second approach uses a discriminant
measure evaluated using the time-frequency distribution
of the signal. Examples, using real data, show that
both algorithms are very powerful in detecting buried
landmines.

Abstract - The classification of objects or quantities in
all fields of science depends on the quality of the
features used for classifying them. This includes, for
example, classification of phenomena described by
nonstationary processes such as electrocardiograms,
seismic geophysics signals, submarine transient
acoustic signals, and speech signals for recognition.
This paper presents a new matrix decomposition that
is used to obtain a set of principal features from time-frequency
representations for classifying
nonstationary time series processes. This new matrix
decomposition is based on a transformation of the
orthonormal basis from singular value decomposition
(SVD). The new basis set yields extrema of the first
moment for each vector in the new basis set. These
basis sets for time and frequency can then be used to
construct features relating to the location and spread
of each energy density highlight in the time-frequency
plane. This new matrix decomposition is
presented in this paper along with a simple example
to illustrate its application.

I.  Introduction 

The development of modern techniques to process
nonstationary signals has been, and continues to be, the
focus of much research where a goal is the description
of the distribution of signal energy as a joint function
of time and frequency. To this end, some notable
contributions have been made by Wigner [1-2], Cohen
[3-5], Choi and Williams [6], Zhao, Atlas and Marks
[7], Loughlin, Pitton and Atlas [8], Groutage [9],
and Loughlin [10], to name but a few. It is the
representation of nonstationary processes by time-frequency
distributions that makes possible the means
for classifying such processes. Certainly, from a
classification standpoint, a desirable time-frequency
representation is one that has the correct description of
energy density. For if this is true, it is unique.
Unfortunately, this is not always the case. In fact, it
most likely is never the case. For example, the Wigner
distribution always satisfies the frequency and time
marginals, but is not always manifestly positive and
may contain cross products that do not relate to the
physical quantities. On the other hand, the spectrogram
never satisfies the time and frequency marginals, but is
always positive. The spectrogram is a window-based
time-frequency representation and is therefore not
unique. For a time-frequency distribution to be
interpreted as a joint time-frequency energy density, it
must satisfy the two fundamental properties of
nonnegativity and the correct time and frequency
marginals:


$$Q(t,f) \ge 0 \tag{1}$$

$$\int_{-\infty}^{\infty} Q(t,f)\, dt = |S(f)|^2 \tag{2a}$$

$$\int_{-\infty}^{\infty} Q(t,f)\, df = |s(t)|^2 \tag{2b}$$

where $S(f) = \int_{-\infty}^{\infty} s(t)\, e^{-j 2\pi f t}\, dt$ is the Fourier
transform of the signal. The joint time-frequency
energy  density  is  a  physical  quantity  that  can  be 
used  to  describe  the  behavior  of  the  process.  It 
specifies  where,  jointly  in  time  and  frequency,  the 
energy  is  concentrated.  The  time  marginal  by  itself 
describes  the  instantaneous  energy  and  specifies 
where  in  time  the  energy  is  located.  The  frequency 
marginal  in  contrast  specifies  where  in  frequency 
the  energy  is  concentrated.  However,  the  joint 
time-frequency  description  of  energy  concentration 
provides  a  means  for  describing  the  main  attributes 
of  the  nonstationary  process  with  a  relatively  few 
descriptors.  It  is  the  essence  of  these  descriptors 
that  can  be  used  for  classifying  the  underlying 
process. 

Marinovic and Eichmann [11], [12] looked at a
feature extraction technique based on the singular
value decomposition (SVD) of the Wigner
distribution. Their technique used only the singular
values to determine the features. In contrast,
Groutage and Bennink [13] looked at a new method
that uses not only the singular values, but also the
singular vectors. The reason is that the singular
values are pure numbers and do not contain
significant information about the underlying
process, whereas the singular vectors contain the
bulk of the information. Since the SVD singular
vectors are orthonormal, the vectors whose
elements are composed of the squared elements of
the SVD vectors are discrete density functions.
Moments generated from these density functions
are the principal features of the non-stationary time
series process. When the energy density is not
uniformly concentrated at various locations in the
time-frequency plane, this technique works
relatively well. However, when the energy is
uniformly concentrated at more than one location,
the technique breaks down. It was for this reason that a
new matrix decomposition was investigated.

II.  Principal  Features  From  Singular 
Value  Decomposition  (SVD) 

A matrix A can be decomposed into a sum over a set of
basis matrices $A_i$, each multiplied by a weight $\sigma_i$:

$$A = \sum_{i=1}^{R} \sigma_i A_i \tag{3}$$

Although this can be accomplished in a variety of ways,
one convenient approach is given by the singular value
decomposition (Lawson [14]). The SVD yields as the
weights a set of positive real numbers, the singular
values, such that $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_R > 0$, and
associated singular vectors $u_i$ and $v_i$ such that

$$A_i = u_i v_i^{H} \tag{4}$$

where R is the rank of A and H is the Hermitian
transpose. All of the information contained in A is
certainly also contained in the complete set of basis
matrices $A_i$ and weights $\sigma_i$. While the complete form
of (3) does not provide a reduction to a small set of
descriptors, the hope is that the decomposition method
leads to an easier way to extract the important
information. This is certainly the case if the weights
alone can be used as the derived features.

If the basis matrices $A_i$ were a fixed set, then the index
on the singular values $\sigma_i$ would associate directly with
time-frequency content, since the time-frequency
distribution would be known for each $A_i$. However, the
basis set is determined as an integral part of the SVD.
Thus, in order to assign time-frequency content to each
$\sigma_i$ it is necessary to extract this information from the
$A_i$, which together with the singular values will provide
the desired features. A natural first step is to use the
joint time-frequency moments in (3) to characterize the
$A_i$. Under the assumption that the SVD process has
separated the energy highlights, only a few such
moments should be required. An added benefit of the $A_i$
from (3) is that the time and frequency aspects are
independent, so only the temporal and spectral
moments need to be considered, as opposed to joint
moments.
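
As a rough illustration of the SVD-based features described above, the following NumPy sketch computes the singular values together with the first moment (location) and the spread of the squared singular vectors, which act as discrete densities. The function name `svd_features`, the choice of 1-based indices for the moments, and the convention that rows are time and columns are frequency are assumptions made for this example.

```python
import numpy as np

def svd_features(tfr, rank):
    """Singular values plus location and spread of the squared singular vectors.

    tfr  : 2-D array, rows assumed to be time and columns frequency.
    rank : number of principal terms of Eq. (3) to keep.
    """
    u, s, vh = np.linalg.svd(tfr, full_matrices=False)
    u, s, v = u[:, :rank], s[:rank], vh[:rank].conj().T
    rows = np.arange(1, tfr.shape[0] + 1)
    cols = np.arange(1, tfr.shape[1] + 1)
    pu = np.abs(u) ** 2          # each column sums to 1 (unit-norm vectors)
    pv = np.abs(v) ** 2
    time_mean = rows @ pu        # first moments of the time densities
    freq_mean = cols @ pv        # first moments of the frequency densities
    time_spread = np.sqrt((rows ** 2) @ pu - time_mean ** 2)
    freq_spread = np.sqrt((cols ** 2) @ pv - freq_mean ** 2)
    return s, time_mean, freq_mean, time_spread, freq_spread
```

For a matrix with two well-separated highlights of different strength, each singular triplet picks out one highlight and the moments report where it sits. When the highlights have equal strength the singular vectors may mix them, which is the situation addressed by the transformed decomposition of the next section.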

III.  Principal  Features  From  Transformed  Singular 
Value  Decomposition  (TSVD) 

When the above assumption is not met, i.e., when the SVD
process cannot separate the energy highlights, the
resulting features will associate with linear
combinations of a number of the singular vectors in
time and frequency. These resulting features would
not, for the most part, be useful for classification
purposes. However, by studying a few simple
examples that pointed out the dilemma of the
energy density highlights being associated with linear
combinations of respective singular time or
frequency vectors, the idea was posed to construct a
new basis set. Basically, the idea is to rotate the
original SVD basis vectors to a new orientation in
the span of their vector space so as to minimize
the number of vectors required in linear
combinations that associate with energy density
highlights. Mathematically, the problem is to find
an orthonormal transformation, C, such that

$$y_i = \sum_{j} c_{ij}\, u_j \tag{5}$$

where the $y_i$ are the rotated basis set of vectors that
associate with the $u_i$. Also, the requirement is
imposed that the resulting $y_i$ are orthonormal, i.e.,
the inner product is such that $\langle y_i, y_j \rangle = \delta_{ij}$. This
implies that

$$C C^{T} = C^{T} C = I \tag{6}$$

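In equation (5) both the $y_i$ vectors and the $c_{ij}$
coefficients are unknowns. When the A matrix of
equation (3) is (M×N), the SVD yields an (M×M)
U matrix whose columns are the vectors $u_i$, an
(N×N) V matrix whose columns are the vectors $v_i$,
and an (M×N) S matrix whose diagonal elements
are the singular values. The solution to the problem
posed by equation (5) is to find the $c_{ij}$ coefficients
in some optimum fashion. This was accomplished
as follows: first, the means of the $y_i$ vectors are
formulated in terms of the $c_{ij}$ coefficients. The
means for the $y_i$ are

$$\langle m \rangle_i = \frac{\sum_{r=1}^{M} r\, y_i^2(r)}{\sum_{r=1}^{M} y_i^2(r)} = \sum_{r=1}^{M} r\, y_i^2(r)$$

$$\langle m \rangle_i = \sum_{r=1}^{M} r \sum_{s=1}^{R} c_{is}\, u_s(r) \sum_{t=1}^{R} c_{it}\, u_t(r)$$

$$\langle m \rangle_i = c_i^{T}\, \mathbf{M}\, c_i \tag{7}$$

Here the denominator equals one because the $y_i$ have unit norm,
$c_i$ is the vector of coefficients $c_{i1}, \ldots, c_{iR}$, and $\mathbf{M}$ is the
(R×R) matrix with elements $M_{st} = \sum_{r=1}^{M} r\, u_s(r)\, u_t(r)$.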

It is fortuitous, as it turns out, that the means of the
transformed vectors are of a quadratic form. This
provides a unique, optimum solution for the $c_{ij}$
coefficients. The extrema for the means of the $y_i$
vectors are achieved when the $c_i$ are the eigenvectors of
the $\mathbf{M}$ matrix of equation (7). In similar fashion, an
orthonormal transformation matrix D can be found for
the $v_i$ vectors such that

$$x_i = \sum_{j} d_{ij}\, v_j \tag{8}$$

$$D D^{T} = D^{T} D = I$$

and

$$\langle n \rangle_i = d_i^{T}\, \mathbf{N}\, d_i \tag{9}$$

where the $d_i$ are the eigenvectors of the $\mathbf{N}$ matrix.

When TSVD is applied only to the principal elements
of the SVD of a matrix, the resulting series pertains to
the prominent amplitude distribution of the original
matrix.
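
The rotation described in this section can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the names `rotate_basis` and `tsvd` are hypothetical, the input matrix is assumed real (for example a spectrogram), the index r runs from 1, and the matrix of Eq. (7) is taken to have elements $M_{st} = \sum_r r\, u_s(r)\, u_t(r)$.

```python
import numpy as np

def rotate_basis(vectors):
    """Rotate orthonormal columns so that their first moments are extremal.

    vectors : (L, R) real array whose columns are R principal singular vectors.
    Returns (rotated, means, c) with rotated[:, i] = sum_j c[i, j] * vectors[:, j].
    """
    index = np.arange(1, vectors.shape[0] + 1)
    # Quadratic-form matrix of Eq. (7): M[s, t] = sum_r r * u_s(r) * u_t(r)
    m = vectors.T @ (index[:, None] * vectors)
    means, eigvecs = np.linalg.eigh(m)   # m is symmetric; eigenvalues ascending
    c = eigvecs.T                        # row i of c holds the coefficients c_ij
    rotated = vectors @ c.T
    return rotated, means, c

def tsvd(a, rank):
    """Rotated time (y) and frequency (x) basis sets from the principal SVD vectors."""
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    y, time_means, _ = rotate_basis(u[:, :rank])
    x, freq_means, _ = rotate_basis(vh[:rank].T)
    return y, x, time_means, freq_means
```

Each eigenvalue returned by `np.linalg.eigh` is the mean of the corresponding rotated vector, so the features relating to location come out of the eigendecomposition directly. In this sketch the rotated vectors are simply ordered by increasing mean.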

The following summarizes the basis decomposition
methods, SVD and the new TSVD method:

Matrix Decomposition by SVD Method:
    $A = U S V^{T}$

Matrix Decomposition by TSVD Method:
    $U = X C^{T}$
    $V = Y D^{T}$
Therefore,
    $A = (X C^{T})\, S\, (Y D^{T})^{T}$
    $A = X\, (C^{T} S D)\, Y^{T}$

IV. Example and Discussion

This example presents a simple illustration of the new
TSVD method for decomposing a matrix and extracting
the principal features. The signal of interest is a series of
five equal-amplitude sine bursts, each at a separate
frequency, namely 20, 15, 10, 5, and 25 normalized
frequency units. Figure 1 presents the time series for this
signal. Figure 2 is the spectrogram for the signal. Notice
the distinct, equal-amplitude energy density highlights at
the appropriate time-frequency locations in the time-frequency
plane. An SVD was performed on the
spectrogram matrix. Figures 3 and 4 present the first five
(principal) $u_i$ and $v_i$ vectors respectively. The $u_i$ associate
with time and the $v_i$ associate with frequency. Note that no
one $u_i$ vector can be attributed to a particular sine burst.
Likewise, no one $v_i$ vector can be attributed to a given
frequency for a particular sine burst. Figures 5 and 6 are the
rotated basis vectors $y_i$ and $x_i$ that associate with time and
frequency respectively. It is easy to see from Figures 5 and
6 that the new basis vector set, the $y_i$ and $x_i$ vectors,
derived by forming linear combinations of the SVD basis
vectors $u_i$ and $v_i$, associate directly with the respective
time and frequency locations of the energy density for
the sine bursts depicted in Figure 2.

Since the coefficients for finding these rotated basis
sets are derived from the directions of the extrema of
the respective means of the $y_i$ and $x_i$ vectors, they are
optimum in that sense. Furthermore, the salient
optimum principal (principal from SVD and optimum
from the new method) features can be directly obtained from
the $y_i$ and $x_i$ vectors, viz. the location in time and
frequency and the spread in time and frequency for each
energy density that associates with a particular sine burst.


Figure 2 - Spectrogram

Figure 3 - Singular Time Vectors from SVD

Figure 4 - Singular Frequency Vectors from SVD (horizontal axis: frequency)

Figure 5 - Rotated Time Vector Basis Set via
New Matrix Decomposition

Figure 6 - Rotated Frequency Vector Basis Set
via New Matrix Decomposition (horizontal axis: frequency)

ABSTRACT

We  present  an  approach  to  designing  discrete  time- 
frequency  distributions  that  are  extremely  localized 
in  the  time-frequency  plane.  These  distributions, 
which  satisfy  the  marginals,  are  constructed 
recursively  by  transferring  energy  among  the 
points  in  the  time-frequency  distribution  (TFD)  in 
a  direction  which  decreases  the  entropy  of  the 
TFD.  This  transfer  is  such  that  the  resulting  TFD 
continues  to  satisfy  the  marginal  constraints. 

1.  INTRODUCTION 

Entropy in general, and maximum entropy in
particular, have long been used in constructing power
spectral densities [1]. A popular approach in the
spectral  analysis  of  stationary  signals  is  to  find  the 
spectral  density  that  maximizes  an  entropy-based 
criterion  while  satisfying  autocorrelation  matching 
constraints  [1].  The  result  is  the  spectrum  of  an 
autoregressive  filter  whose  coefficients  are 
obtained  by  solving  a  linear  set  of  equations.  (It  is 
important  to  note  that  the  result  depends  greatly 
on  the  definition  of  entropy  [2].)  Proponents  of 
maximum  entropy  have  long  argued  that  the 
resulting  spectrum  is  “obtained  while  making  the 
fewest  assumptions  about  the  signal”  and  “the 
flattest  spectrum  which  satisfies  the  constraints” 
[3].  Maximum  entropy  also  became  synonymous 
with  high  resolution  spectral  estimation  due  to  the 
nature  of  its  results  in  the  stationary  case:  peaky 
spectra  based  on  all-pole  filters.  However,  it  is 
often  difficult  to  reconcile  the  notion  of  “flattest 
possible  spectrum”  with  that  of  a  “high  resolution 
spectrum”  [3].  In  time-frequency  (TF)  analysis, 
maximum  entropy  has  been  used  to  generate  time- 
frequency  distributions  (TFD)  which  satisfy 
marginal  constraints  [5].  It  has  also  been  used  as  a 
criterion  for  deblurring  an  initial  TFD  and  for 
estimating  the  evolutionary  spectrum  [4]. 


In  time-frequency  analysis,  a  goal  is  often  to 
produce  distributions  that  localize  the  energy 
density  in  the  TF  plane.  Such  goals  seem  in  direct 
contradiction  with  the  notion  of  “flattest  possible 
distribution”  often  associated  with  maximum 
entropy.  In  fact,  this  goal  is  more  in  alignment 
with  the  concept  of  a  minimum  entropy 
distribution.  In  this  paper,  we  present  a  method  to 
construct  minimum  entropy  discrete  TFDs  that 
satisfy  marginal  constraints.  We  demonstrate  that 
this  method  is  guaranteed  to  converge  to  a  local 
minimum  in  a  finite  number  of  steps.  The 
resulting  TFDs  are  highly  localized  as  measured  by 
the  decrease  in  entropy  and  the  increase  in  the 
number  of  zero  values  in  the  TF  plane. 

2.  APPROACH 

The proposed algorithm starts with a TFD, $P(n,\omega)$,
that satisfies the marginals. For example, for a
unit energy signal x(n), one can use the
correlationless distribution

$$P(n,\omega) = |x(n)|^2\, |X(\omega)|^2. \tag{1}$$

In  fact,  this  distribution  is  the  one  which 
maximizes  the  entropy  while  satisfying  the 
marginals.  While  it  is  a  valid  TFD,  it  produces 
little  information  beyond  the  marginals.  The 
proposed  algorithm  can  start  with  any  positive 
distribution  that  satisfies  the  marginals  (e.g.,  see 
[4,5]). 
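
As a small illustration of this starting point, the sketch below builds the correlationless distribution of Eq. (1) on a discrete grid and checks nothing beyond what the text states, namely that its row and column sums are the time and frequency marginals. The function names and the use of a unitary DFT for $X(\omega)$ are assumptions made for the example.

```python
import numpy as np

def correlationless_tfd(x):
    """Outer product of the time and frequency marginals of a unit-energy signal."""
    x = np.asarray(x, dtype=complex)
    x = x / np.linalg.norm(x)                     # enforce unit energy
    spectrum = np.fft.fft(x) / np.sqrt(x.size)    # unitary DFT keeps unit energy
    time_marginal = np.abs(x) ** 2
    freq_marginal = np.abs(spectrum) ** 2
    tfd = np.outer(time_marginal, freq_marginal)  # P(n, w) = |x(n)|^2 |X(w)|^2
    return tfd, time_marginal, freq_marginal

def entropy(p):
    """Entropy -sum p ln p, with the convention 0 ln 0 = 0."""
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))
```

Summing the returned array over frequency recovers `time_marginal`, and summing over time recovers `freq_marginal`. Because the distribution is a product, its entropy is the sum of the entropies of the two marginals.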

2.1  Basic  principle 

The basic principle behind the proposed method is
to modify the initial TFD in a direction that
decreases the entropy while not disturbing the time
and frequency marginals. For this purpose, we
select four points which form the corners of a
rectangular grid in the T-F plane. Note that these
points do not have to be adjacent; they merely
have to form the corners of a rectangle. An
example is shown in Figure 1, where the four points
are labeled $p_{1,1}, p_{1,2}, p_{2,1}, p_{2,2}$.

Figure 1. An example of a rectangular grid in the
T-F plane

In the figure, the points $p_{1,1}, p_{1,2}$ occur at time t1
while the points $p_{2,1}, p_{2,2}$ occur at time t2 (t1 and
t2 need not be consecutive in value). Also, $p_{1,1}, p_{2,1}$
occur at frequency w1 while $p_{1,2}, p_{2,2}$ occur at
frequency w2 (w1 and w2 need not be consecutive
either).

The proposed method is to adjust the TFD by
subtracting a value $\Delta$ from $p_{1,1}, p_{2,2}$ and adding the
same value to $p_{1,2}, p_{2,1}$, thereby creating the
following new set of values:
$(p_{1,1} - \Delta), (p_{1,2} + \Delta), (p_{2,1} + \Delta), (p_{2,2} - \Delta)$. It is
important to note that along t1, for example, $\Delta$ is
subtracted from $p_{1,1}$ and added to $p_{1,2}$.
Consequently, the sum of all the values along t1
(the time marginal at t1) does not change.
Therefore, if the initial distribution satisfies the
marginal at t1, it will continue to do so. It is easy
to show that the same is true for t2, as well as for
w1 and w2. Since no other values in the TFD are
disturbed, the modified TFD will have the same
marginals as the initial one.

The remaining obstacle is to choose the optimal
value of $\Delta$ which will minimize the entropy of the
resulting TFD. Since we limit ourselves to
nonnegative TFDs, the value of $\Delta$ is limited from
above by the minimum of $p_{1,1}, p_{2,2}$ and from below
by the negative of the minimum of $p_{1,2}, p_{2,1}$. This
guarantees that the new values
$(p_{1,1} - \Delta), (p_{1,2} + \Delta), (p_{2,1} + \Delta), (p_{2,2} - \Delta)$ are
nonnegative. Below we find the optimal value of
$\Delta$ over this range.

2.2 Optimal Choice for $\Delta$


Let the entropy of a TFD $P(n,\omega)$ be given by

$$E = -\sum_{n}\sum_{\omega} P(n,\omega)\, \ln P(n,\omega)$$

We can then define the change in entropy which
occurs due to the modification of the four TFD
points described above as

$$\mathrm{Loss} = E_{before} - E_{after} \tag{2}$$



We would then choose $\Delta$ to maximize the entropy loss. The Loss
can be written in terms of only the four points which are modified,
since the contribution of the other points to the entropy is the same
before and after the modification. In other words, (2) can be
rewritten as

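$\mathrm{Loss} = -p_{1,1}\ln p_{1,1} - p_{1,2}\ln p_{1,2} - p_{2,1}\ln p_{2,1} - p_{2,2}\ln p_{2,2}$
$\quad + (p_{1,1} - \Delta)\ln(p_{1,1} - \Delta) + (p_{1,2} + \Delta)\ln(p_{1,2} + \Delta)$
$\quad + (p_{2,1} + \Delta)\ln(p_{2,1} + \Delta) + (p_{2,2} - \Delta)\ln(p_{2,2} - \Delta)$    (3)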

All that remains is to maximize the above equation with respect to
$\Delta$. (Note that the first four terms do not depend on $\Delta$.
Consequently, they can be ignored during the maximization.) Taking
the derivative of the Loss with respect to $\Delta$ and setting it to
zero yields:

$0 = -\ln(p_{1,1} - \Delta) + \ln(p_{1,2} + \Delta) + \ln(p_{2,1} + \Delta) - \ln(p_{2,2} - \Delta)$    (4)


The above equation is easily solved for $\Delta$ to yield

$\Delta = \dfrac{p_{1,1}\,p_{2,2} - p_{1,2}\,p_{2,1}}{p_{1,1} + p_{1,2} + p_{2,1} + p_{2,2}}$    (5)


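The second derivative of the loss function with respect to $\Delta$
is given by

$\dfrac{\partial^2 \mathrm{Loss}}{\partial \Delta^2} = \dfrac{1}{p_{1,1} - \Delta} + \dfrac{1}{p_{1,2} + \Delta} + \dfrac{1}{p_{2,1} + \Delta} + \dfrac{1}{p_{2,2} - \Delta}$    (6)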




This quantity is strictly positive over the range of $\Delta$ for
which the optimization is performed. Unfortunately, this implies that
the critical value of $\Delta$ in (5) minimizes the Loss in (3)
instead of maximizing it. It also means that the entropy loss
function in (3) is convex in $\Delta$ and that, to maximize it, one
has to choose one of the extreme values of $\Delta$:
$\min\{p_{1,1}, p_{2,2}\}$ or $-\min\{p_{1,2}, p_{2,1}\}$. The choice
is made by evaluating the loss function in (3) for the two possible
values of $\Delta$ and choosing the one which causes the larger loss.
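The endpoint rule can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the convention $0 \ln 0 = 0$ is assumed for cells that become zero.

```python
import math

def xlogx(p):
    """p * ln(p), with the assumed convention 0 * ln(0) = 0."""
    return 0.0 if p <= 0.0 else p * math.log(p)

def loss(p11, p12, p21, p22, delta):
    """Entropy loss (3): entropy before minus entropy after the shift."""
    before = -(xlogx(p11) + xlogx(p12) + xlogx(p21) + xlogx(p22))
    after = -(xlogx(p11 - delta) + xlogx(p12 + delta)
              + xlogx(p21 + delta) + xlogx(p22 - delta))
    return before - after

def critical_delta(p11, p12, p21, p22):
    """Stationary point (5); it minimizes the loss, so it is not used."""
    return (p11 * p22 - p12 * p21) / (p11 + p12 + p21 + p22)

def best_delta(p11, p12, p21, p22):
    """Pick the admissible endpoint that gives the larger loss."""
    upper = min(p11, p22)      # largest delta keeping all values >= 0
    lower = -min(p12, p21)     # smallest delta keeping all values >= 0
    return max((upper, lower), key=lambda d: loss(p11, p12, p21, p22, d))

# Example rectangle (arbitrary values chosen for illustration).
p = (0.5, 0.1, 0.1, 0.3)
d_best = best_delta(*p)
d_crit = critical_delta(*p)
```

For this rectangle the loss is negative at the stationary point and positive at the chosen endpoint, which is the convexity argument in concrete form.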

It  is  interesting  to  note  that  after  the  modification, 
one  of  the  four  points  will  become  identically  zero. 
Consequently,  the  approach  reduces  the  number  of 
nonzero  values  in  the  time-frequency  plane,  i.e.,  it 
concentrates  the  energy  distribution  into  fewer 
time-frequency  points,  thereby  achieving  higher 
localization. 

It is also interesting to note that, if our goal were to maximize the
entropy of the resulting TFD, we could then use the $\Delta$ in (5)
to update the values in the TFD. Moreover, if we start with a maximum
entropy TFD (using (1)), it is easy to show that the $\Delta$ in (5)
is identically zero. This is intuitively appealing since we cannot
expect to increase the entropy of a TFD that is already maximum
entropy.
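The claim that the stationary step vanishes for a maximum-entropy TFD can be verified in one line, assuming (as is standard for maximum entropy under marginal constraints) that the TFD in (1) is the product of its marginals. The symbols $P_1$ and $P_2$ below are introduced here only for illustration.

```latex
% Assumption: P(n,\omega) = P_1(n)\,P_2(\omega), so that
% p_{i,j} = P_1(t_i)\,P_2(\omega_j) for the four rectangle corners.
\begin{aligned}
p_{1,1}\,p_{2,2} - p_{1,2}\,p_{2,1}
  &= P_1(t_1)P_2(\omega_1)\,P_1(t_2)P_2(\omega_2)
   - P_1(t_1)P_2(\omega_2)\,P_1(t_2)P_2(\omega_1) \\
  &= 0 ,
\end{aligned}
\qquad\text{hence}\qquad
\Delta = \frac{p_{1,1}\,p_{2,2} - p_{1,2}\,p_{2,1}}
              {p_{1,1} + p_{1,2} + p_{2,1} + p_{2,2}} = 0 .
```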

3.  ALGORITHM 

In  this  section  we  describe  the  algorithm  that 
implements  the  minimum  entropy  approach 
described  above  and  comment  on  its  performance 
and  limitations. 

3.1  Algorithm  Steps 

1. Choose a pair of points $p_{1,1}, p_{2,2}$ in the T-F plane. This
will define a rectangle as long as the two points do not occur at the
same time or at the same frequency. Consequently, the choice of
$p_{1,1}, p_{2,2}$ defines the set $p_{1,1}, p_{1,2}, p_{2,1}, p_{2,2}$.

2. Evaluate the loss function in (3) for the two possible values of
$\Delta$: $\min\{p_{1,1}, p_{2,2}\}$ or $-\min\{p_{1,2}, p_{2,1}\}$.
Choose the value that results in a larger loss.

3. Update the set $p_{1,1}, p_{1,2}, p_{2,1}, p_{2,2}$ using the
$\Delta$ from step 2 to create the set of points
$(p_{1,1} - \Delta), (p_{1,2} + \Delta), (p_{2,1} + \Delta), (p_{2,2} - \Delta)$.


4.  If  there  is  no  remaining  pair  of  points  that  will 
reduce  the  entropy,  stop.  Otherwise  return  to  step 
1. 
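The four steps above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the raster-order sweep over rectangles, the function names, the stopping tolerance `tol`, and the sweep limit `max_sweeps` are all assumptions made for the example.

```python
import itertools
import numpy as np

def entropy(P):
    """Entropy -sum P ln P over the nonzero cells of P."""
    nz = P[P > 0]
    return float(-(nz * np.log(nz)).sum())

def _cells_entropy(values):
    """Entropy contribution of a few cells, with 0 ln 0 taken as 0."""
    v = np.asarray(values, dtype=float)
    v = v[v > 0]
    return float(-(v * np.log(v)).sum())

def minimize_entropy(P, max_sweeps=50, tol=1e-12):
    """Greedy sweeps over all rectangles (rows = time, cols = frequency)."""
    P = np.array(P, dtype=float)
    n_t, n_f = P.shape
    for _ in range(max_sweeps):
        improved = False
        # Step 1: every pair of distinct times and distinct frequencies.
        for n1, n2 in itertools.combinations(range(n_t), 2):
            for k1, k2 in itertools.combinations(range(n_f), 2):
                p11, p12 = P[n1, k1], P[n1, k2]
                p21, p22 = P[n2, k1], P[n2, k2]
                before = _cells_entropy([p11, p12, p21, p22])
                # Step 2: compare the two extreme values of delta.
                best_loss, best_d = tol, 0.0
                for d in (min(p11, p22), -min(p12, p21)):
                    after = _cells_entropy(
                        [p11 - d, p12 + d, p21 + d, p22 - d])
                    if before - after > best_loss:
                        best_loss, best_d = before - after, d
                # Step 3: apply the update if it lowers the entropy.
                if best_d != 0.0:
                    P[n1, k1] -= best_d
                    P[n2, k2] -= best_d
                    P[n1, k2] += best_d
                    P[n2, k1] += best_d
                    improved = True
        # Step 4: stop when a full sweep finds no useful rectangle.
        if not improved:
            break
    return P

# Usage: start from a product-of-marginals TFD (arbitrary marginals).
m_time = np.array([0.1, 0.2, 0.3, 0.4])
m_freq = np.array([0.25, 0.25, 0.5])
P_final = minimize_entropy(np.outer(m_time, m_freq))
```

Because the result depends on the order in which rectangles are visited, a different sweep order would generally give a different final TFD with the same marginals.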

3.2  Comments 

The algorithm above is computationally simple and requires very few
operations at each step. Unfortunately, the number of steps, i.e.,
the number of possible pairs of points in the T-F plane, is very
large (on the order of $N^4$, assuming N points in time and N points
in frequency). This computational cost makes the algorithm very
cumbersome. On the other hand, the algorithm is easily parallelized.

An important factor that affects the resulting TFD is the order in
which the pairs of points are chosen in step 1 of the algorithm. Each
ordering produces a different result. Consequently, care should be
taken in choosing the sequence of pairs. One logical choice is the
pair which achieves the greatest reduction in entropy. However, this
choice involves a search through all the possible choices, which
increases the computational burden of the algorithm.

4.  RESULTS 

To  test  the  algorithm,  we  use  a  chirp  of  length 
N=32  given  by  the  following  equation 

x(n)=celm'lam 

where c is a normalization constant that makes the signal
unit-energy. Figure 2 shows the real part of the signal. Figure 3
shows the initial TFD (calculated using (1)), which satisfies the
marginals and has maximum entropy. Figure 4 shows the minimum entropy
TFD calculated using the proposed algorithm. The final TFD has an
entropy of 3.8, compared to the initial TFD's entropy of 6.4.
Moreover, the final TFD concentrates the energy in 209 nonzero
time-frequency points, compared to 1024 for the initial TFD. As
expected, the final TFD also satisfies the marginals.

It  is  important  to  remember  that  the  algorithm 
converges  to  a  local  minimum  of  the  entropy 
depending  on  the  sequence  of  points  chosen  in  step 
1  above.  No  effort  was  made  in  this  example  to 
optimize  the  sequence. 

While the appearance of the final TFD may not be pleasing to some
readers, it is nonetheless a valid TFD which satisfies the marginals
and is highly localized in time-frequency. Also, the final appearance
depends greatly on the initial TFD. If, for example, we start with a
TFD concentrated along the instantaneous frequency (IF), the final
result will be a more localized TFD also concentrated along the IF.
One may also consider additional constraints to impose on the
resulting TFD on top of the marginal constraints. However, these
additional constraints may require the modification of the algorithm
presented in Section 3.


5.  CONCLUSIONS 

We  presented  an  algorithm  to  calculate  a 
minimum-entropy  TFD  that  satisfies  the 
marginals.  The  resulting  TFDs  are  highly  localized 
in  the  TF  plane. 


Figure 2  Real part of test signal

Figure 3  Initial maximum-entropy TFD

Figure 4  Final minimum-entropy TFD


ABSTRACT

Is there a limit to the maximum resolution one can achieve when
representing a signal's energy in the time-frequency plane? Some
authors maintain that such a limit exists, and that ignoring it is
the cause of the known difficulties with some joint time-frequency
distributions; others maintain that there is no such limit.

In this article, we analyze the merits and demerits of the several
existing approaches, and suggest further arguments one might wish to
consider. This will take us to the conclusion that, both from a
tool-specific and from a general information-theoretic point of view,
there is, indeed, a lower limit on the achievable resolution, even
though the expression for that limit cannot be given by the
traditional Heisenberg-Gabor relations.

1.  INTRODUCTION 

The implications of the so-called "Principle of Uncertainty" in the
signal processing field were first recognized by Gabor [1], who
introduced a time-frequency version of Heisenberg's inequality:

$\sigma_t \, \sigma_f \geq \dfrac{1}{4\pi}$    (1)

where $\sigma_t$ and $\sigma_f$ are the time and frequency standard
deviations, respectively. It is often assumed that this relation
implies the existence of a maximum possible resolution within the
time-frequency plane. It does not, as will be seen, and as has often
been pointed out [2]. But is there such a limit, even if (1) fails to
address it? Answering this question is the goal of this article. We
will approach the objective in steps. We will start by looking at the
existing uncertainty relations and other traditional approaches.
After having established their shortcomings, we will propose a
different and more general approach, one which will lead us to the
desired answer.

2.  THE  MATHEMATICAL  RELATION 

This work was supported by the Ministério da Defesa Nacional and
Fundação das Universidades Portuguesas, subprogram "Os Oceanos e as
suas Margens".

As a mathematical relation, (1) is a very simple statement concerning
the standard deviations of a function and its Fourier Transform. As a
matter of fact, it is not the best one, since stronger statements can
be made [2]:


(jtaf-  hsj\+Cov2'  (2) 

where Cov is the covariance of the signal, defined in [2]. Since
$\mathrm{Cov}^2$ is necessarily a non-negative quantity, (2) is
stronger than (1).

Both relations (1) and (2) must, however, be interpreted carefully,
since their misinterpretation and misuse are responsible for many
common misconceptions.

• Firstly, we note that they fail to express what one usually feels
to be the uncertainty principle. Because they use standard deviations
as measures of spread, they do not prohibit the existence of
functions arbitrarily narrow in both time and frequency domains. In
fact, it is always possible to devise a function arbitrarily narrow
(in the sense that its energy can be made arbitrarily concentrated)
and whose standard deviation is greater than any given value [3].

• Another inadequacy of the use of standard deviations (or any other
measure of global width) appears when the signal s(t) (or its
spectrum S(f)) is not unimodal [3]. If, for example, we think of a
finite segment of a two-tone signal, we will easily conclude that the
standard deviation of S(f) is, for any reasonably long observation
time, almost independent of the length of the observation period.
Instead, it depends strongly on the individual frequencies of the two
tones. Once again, (1) and (2) fail to express what we intuitively
feel the uncertainty principle to be in this particular case: an
inverse relation between the duration of the observed segment and the
(local) width of the main lobes of its power spectrum.

• As a last comment, we note that these relations fail to express
reciprocity between both domains (an increase in $\sigma_t$ does not
necessarily imply a corresponding decrease in $\sigma_f$), a fact
which has often been pointed out as one of the limitations of their
quantum mechanical counterparts (e.g. [4]).

Attempts have been made to obtain alternative measures of width in
time and frequency that might better express the concept of
narrowness. One of the most successful approaches was that of
Slepian, Landau and Pollak who, in a series of papers (e.g. [5]),
showed that it is not possible to design a function with arbitrary
simultaneous energy concentration in arbitrarily small regions of
time and frequency. The particular limits on energy concentration
depend on the desired time-bandwidth product [6], but the important
point is that they exist. Let us just note that this broadness
measure also fails to express what one feels to be the uncertainty
principle in the case of non-unimodal signals (or signals with
non-unimodal spectra).

We will use a different definition of broadness, based on the Hessian
of the power spectrum. Define the broadness ($B_P$) of a power
spectrum $P(f) = |S(f)|^2$ as:

$B_P = 1 \Big/ \sqrt{\max_f\left(-\dfrac{d^2}{df^2}P(f)\right)}$    (3)

When applied to the frequency domain, this type of measure relates
directly to the idea of frequency resolution. As can be expected and
is easily shown, it also obeys an "uncertainty relation":

$B_P \cdot \sigma_R \geq \dfrac{1}{2\pi}$,    (4)

where $\sigma_R^2 = \int \tau^2\,|R(\tau)|\,d\tau$ and
$R(\tau) = \int s(t)\,s^*(t-\tau)\,dt$. It is, however, a relation
between the broadness of the fine structure in one domain and a
global width (standard deviation) in the other. It thus conveys the
general feeling of what the "uncertainty principle" is better than
the relations between the global widths of both domains, such as (1).
In particular, it can be used with non-unimodal spectra such as that
of the two-tone signal previously considered. There are other
interesting characteristics in (3). One of them is that the fine
structure of the spectrum is seen to be related to the overall width
of the autocorrelation function, and not to the time duration of the
signal, a point which we will soon explore. Another interesting fact
is that it possesses a counting property (similar to the ones
investigated in [7] and [8] for the Rényi entropy), a useful
attribute that will not be explored here. Equivalent relations can,
of course, be obtained for the complementary domain (fine structure
in time vs. overall spectral width). We may now ask whether there is
any uncertainty relation between the fine structure in one domain and
the fine structure in the other. There is not, as can easily be
shown.
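One way to see where (4) comes from is sketched below. It assumes the readings $B_P = 1/\sqrt{\max_f(-P''(f))}$ for (3) and $\sigma_R^2 = \int \tau^2 |R(\tau)|\,d\tau$, and uses the fact that the energy spectrum $P(f) = |S(f)|^2$ is the Fourier transform of $R(\tau)$; the argument is an illustration, not necessarily the authors' own derivation.

```latex
% Assumption: P(f) = \int R(\tau)\, e^{-j 2\pi f \tau}\, d\tau .
\begin{aligned}
-\frac{d^2}{df^2}P(f)
  &= \int (2\pi\tau)^2\, R(\tau)\, e^{-j 2\pi f \tau}\, d\tau
   \;\le\; 4\pi^2 \int \tau^2\, |R(\tau)|\, d\tau
   \;=\; 4\pi^2 \sigma_R^2 , \\[4pt]
B_P
  &= \frac{1}{\sqrt{\max_f\!\left(-\,d^2P/df^2\right)}}
   \;\ge\; \frac{1}{2\pi\,\sigma_R}
  \quad\Longrightarrow\quad
  B_P \cdot \sigma_R \;\ge\; \frac{1}{2\pi} .
\end{aligned}
```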

Other definitions of broadness can be devised. An interesting one
(which we will not pursue here) is:


Br 


8= 

-W 


(f-aor(f-VP(f  +  1$)df) 


(5) 


since, with this definition, $B_r \cdot \sigma_r = 1/2\pi$. This
measure of broadness thus creates reciprocity between the widths in
the dual domains, something which (1) or (2) cannot achieve, as
already mentioned.


3.  THE PHYSICAL PRINCIPLE

An attempt to grasp the physical principle behind these different
mathematical manifestations was made by Robertson [9] with a simple
but apparently far-reaching statement: there will be uncertainty
between two variables (observables) whenever their operators do not
commute. Mathematically, if A and B are the operators associated with
the variables, Robertson's inequality states that:

$\sigma_A^2 \, \sigma_B^2 \geq \dfrac{1}{4}\left|\langle AB - BA \rangle\right|^2$

where $\langle\cdot\rangle$ stands for average. This apparent
generalization of (1) (its role as a generalization has been severely
criticized; see, for example, [10]) does not, however, shed much
light on the underlying physical principle, since it relies too
heavily on the mathematical formalism of operators, a technique of
which Nature knows nothing. Uncertainty relations have been obtained
in physical areas where the mathematical formalism of operators
cannot even be applied (e.g. [11]). One must think of the uncertainty
principle as a consequence of the eigenfunctions of the chosen
operators (and, thus, of the definition of the "pure" representatives
of the physical quantities). In the time-frequency case, there will
be uncertainty if the eigenfunctions corresponding to the concept of
"frequency" do not have time localization properties. The existence
of uncertainty between the dual domains is, thus, not exclusive to
the Fourier Transform. As an example of that, let us choose constant
amplitude chirps as the "pure" representatives of the frequency
concept (assuming that there is any sense whatsoever in doing this).
Formally, this would correspond to defining "frequency" as the
eigenvalues of the operator $T$, where:


the eigenfunctions being $e^{-j(\alpha_0 t^2 + 2\pi f t)}$. The
"frequency" eigenfunctions have, thus, no time localization
capabilities. Is there an associated uncertainty principle between
time and "frequency"? Yes. As is easily shown, the mathematical
relation is again (1). Instead of linear chirps, we could do the same
for quadratic chirps, or constant amplitude functions of any
polynomial phase law (and, thus, extendable to more general signals
via the Weierstrass theorem), and the conclusions will be the same.
This means, in particular, that uncertainty relations will exist
between time and "frequency" for the fractional Fourier Transform
with any angle of rotation in the time-frequency plane.

Avoiding uncertainty relations implies an acceptable redefinition of
the concepts involved. If, for instance, one is willing to redefine
frequency, accepting the concept of local frequency as a suitable
physical quantity, and thus using time localized waves as
representatives of the "pure" concept, then the uncertainty relations
are easily made to collapse. Let us use, as an example, cisoids with
a Gaussian envelope:

$\varphi(t,f) = e^{-t^2/(4\sigma_l^2)}\, e^{j 2\pi f t}$.

These "pure" functions are the eigenfunctions of the non-Hermitian
(and non-linear) operator


whose (continuous) eigenvalues are the values of $f$, thus carrying
with them the concept of local frequency. Let us
now consider a signal whose (local) frequency spectrum is

$S(f) = e^{-f^2/(4\sigma_f^2)}$.

That is:

$s(t) = \int_{-\infty}^{\infty} \varphi(t,f)\, S(f)\, df$

Denoting the variance of s(t) by $\sigma_t^2$, we have that [12]:

$\sigma_t^2 = \dfrac{1}{\dfrac{1}{\sigma_l^2} + 16\pi^2\sigma_f^2}$.

For any given $\sigma_f$, we can now arbitrarily diminish the time
width by simply decreasing $\sigma_l$. The uncertainty relations
have, in fact, collapsed. In the limit of no time localization
capabilities on the frequency concept ($\sigma_l \to \infty$), these
relations reduce back to (1), as they had to.

4.  THE  TIME-FREQUENCY  PLANE 

Inequalities like (1) and (2) impose restrictions on the simultaneous
behavior of a time function and its Fourier transform. But imposing
conditions on the time and frequency marginals does not really tell
us much about what is or is not achievable within the time-frequency
plane [13]. As an example, consider the time-frequency representation
$\rho(t,f) = \delta(f - kt)$. Obtaining such a time-frequency
representation implies having infinite local time-frequency
resolution. And yet, for any given $\sigma_t$, one can force the
frequency marginal to be arbitrarily wide by simply increasing the
chirp rate. Hence, (1) does not, in fact, constitute a limit to the
achievable time-frequency resolution within the time-frequency plane.
This is also true for all uncertainty relations discussed so far.

To determine the limits of joint time-frequency descriptions, we need
to abandon the marginals, with their global time and Fourier
descriptions, and move into the plane. Some attempts have been made,
mainly using the Wigner-Ville Distribution (or, more generally, the
Cohen class of distributions) as a joint energy description of the
signal. A very good summary of these can be found in [6]. However,
careful reasoning (the analysis of each one of these proposals cannot
be done here, for space reasons) will show that none of them properly
addresses the issue of time-frequency concentration. A more promising
approach is the extension of the Slepian-Pollak-Landau energy
concentration measure to ellipsoidal regions with axes parallel to
the time and frequency axes [6]. Unfortunately, the results are
specific to the bilinear class and, furthermore, this particular
shape of the concentration region fails to answer the question in
cases with spectral dynamics.

Other approaches have been made, considering conditional (and, thus,
local) moments (e.g. [2], [14]). They concluded that (1) does not
constrain the product of the local moments. Not being limited by (1)
is, however, different from not being limited at all. That approach
is, thus, inconclusive.


5.  SPECTRAL COMPLEXITY

To properly address the issue of joint time-frequency resolution, let
us consider it from the more general point of view of information
gathering. Let us first define the concept of spectral complexity of
a stochastic signal ($C_P$) as being the amount of information one
must collect to properly estimate its power spectrum (for the moment,
let us assume that the signal possesses no spectral dynamics). Define


where $E\{P(f)\}$ is the power spectrum of the signal. Note that this
measure of complexity is very similar to the definition of Fisher
information, and bears a very close link with the notion of
"narrowness" of the power spectrum.

To avoid the hassle of having to invoke or discard ergodicity, which
would only obscure the main issue, let us assume that we have a
second degree "ensemble view". That is: by observing the signal at
time $t_1$, we immediately apprehend the value of $E\{s(t_1)\}$; by
also observing the signal at time $t_2$, we apprehend not only the
value of $E\{s(t_2)\}$, but also $R(t_1,t_2) = E\{s(t_1)s^*(t_2)\}$.
Since we are, under this assumption, directly observing expected
values (and not mere realizations of the process), we may, in what
follows, safely ignore the practical aspects of real estimators, such
as bias and variance.

• Stationary signals. From this idealized point of view, let us now
consider the estimation of the power spectrum of a stationary signal.
To perform the estimate, one must extract information out of the
signal. But how much observation time do we need? How much
information must we collect? At first, increased observation time
will provide better estimates, in a process converging to the true
power spectrum. But, after convergence (assuming it ever happens),
will further increases in observation time provide more information
about the power spectrum? Clearly not. Once this convergence process
is completed, no further observation time is needed; no further
information is required. We will have reached the signal's complexity
$C_P$. If the observation time is less than the time to reach
complexity ($t_R$), we will be missing part of the information needed
for a correct estimate; if the observation time available is greater
than $t_R$, the last part of the signal will be informationless. But
how do we determine $t_R$? The amount of information needed to
estimate the power spectrum is clearly the same amount of information
needed to estimate its inverse Fourier Transform, the autocorrelation
function. Hence, we only need to observe the signal for the amount of
time needed to determine all (relevant) lags of its autocorrelation
function. The time to reach complexity is thus the time support of
the autocorrelation function. This is a very gratifying conclusion,
since, from (3) and (4), the spectral complexity (6) and the time
support of the autocorrelation function $R(\tau)$ are, in fact,
directly related to each other through yet another "uncertainty
relation":


Denoting by $D(t)$ the density of information contained in the
signal, and by $I_P(t)$ the amount of collected information, the
collection procedure can be summarized as follows:

$I_P(t) = \int_{t_0}^{t} D(\tau)\,d\tau \;\leq\; \int_{t_0}^{t_0 + t_R} D(\tau)\,d\tau = C_P$.    (8)

From (6), the spectral complexity of a sinusoid is infinite. In fact,
the power spectrum of a pure sinusoid will always become narrower
with increasing observation time, without ever stabilizing.
Complexity will never be reached for finite observation times.
Consistently, the autocorrelation function is known to have infinite
time support. Even though an impulsive spectrum may seem simple, we
must note that the perfect localization needed for a proper estimate
does require collecting an infinite amount of information. This is
thus the high end of spectral complexity, where all observation time
becomes useful and brings additional information.

At the low end, we have white noise. An instantaneous ensemble
observation fully characterizes its autocorrelation function and,
hence, its very low complexity (zero, in fact) power spectrum.
Further observation of the noise will not contribute any new
information concerning its power spectrum.

In the general case, signals with narrowband components (and, thus,
of high spectral complexity) require a long collection time $t_R$.
Signals without narrowband components (hence, of low spectral
complexity) have short collection times. In any case, observing the
ensemble for periods longer than $t_R$ is not useful, since no
additional information about the power spectrum will be obtained.

• Non-stationary signals. Assume that we want to estimate the power
spectrum of a non-stationary signal at time $t_1$. This spectrum will
have a given amount of spectral complexity ($C_P$), and to properly
estimate it, we need to collect this very same amount of information
about the spectrum (or the autocorrelation function) at time $t_1$.
But to represent time $t_1$, all we have is $s(t_1)$ itself, and no
finite amount of spectral information can be extracted from an
instantaneous value of the signal. Information collected at times
other than $t_1$ will be useful if and only if it is correlated with
the spectral information at time $t_1$. In the previous case of
stationary signals, the spectral information at any time was totally
correlated with the spectral information at any other time. In the
non-stationary case, however, we must weight the collected
information with the non-unitary correlation factor (we will call it
the utility factor, $u(t)$) that determines how useful the collected
information is for estimates at time $t_1$. We now have to
distinguish between useful past and future, and non-useful past and
future. The collection procedure (8) becomes, using superscripts to
denote the particular time for which the spectrum estimate is
intended:


#(*) 

£(*) 


=  f 

Jtn 


D(t)  u(t  —  tl)d.T, 


<  cp 


D(e)u(t-  h)dc 


(9) 


This utility factor is thus just formalizing the fact that observing
a non-stationary signal now does not tell us much concerning the
spectrum of the signal a fortnight ago. The exception lies, of
course, in the case of signals with deterministic and known frequency
dynamics, since in these cases the information collected at any time
can always be made useful by taking the dynamics into proper account.
Knowledge of the frequency dynamics thus makes the utility factor
constant and unitary, bringing the case of non-stationary signals to
the very same situation one encounters with stationary signals.

As an example, consider the estimation of the power spectrum of a
constant amplitude linearly chirping cisoid with uniformly
distributed random phase:

$s(t) = e^{j(\alpha t^2 + \theta)}$.

Its autocorrelation function is easily seen to be:

$R(t, t - \tau) = R(t, \tau) = e^{-j(\alpha\tau^2 - 2\alpha t \tau)}$.

To determine the utility factor, we can now determine how correlated
the autocorrelation functions are at different times,
$R_R(t_2, t_1)$, $t_2 > t_1$. Due to the infinite energy of these
autocorrelation functions, in the computation of their correlation
factor we will limit the integration region to an arbitrarily large
region centered at the zeroth lag. That is:

$u(t_2 - t_1) = R_R(t_2, t_1, l) = \dfrac{1}{2l}\int_{-l}^{l} R(t_2,\tau)\,R^*(t_1,\tau)\,d\tau$.

This means that, in our case,

$u(t_2 - t_1) = \dfrac{\sin\left(2\alpha l\,(t_2 - t_1)\right)}{2\alpha l\,(t_2 - t_1)}$    (10)

The inclusion of $u(t)$ in (9) (in this case, a sinc function) will
limit the amount of collectable information relative to time $t_1$
and, thus, will upper bound the achievable spectral complexity and,
hence, the achievable frequency resolution.
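The closed form (10) can be cross-checked numerically. The sketch below assumes the reading $R(t,\tau) = e^{-j(\alpha\tau^2 - 2\alpha t\tau)}$ for the chirp autocorrelation and a $1/(2l)$ normalization of the correlation integral; the function names and parameter values are illustrative only.

```python
import numpy as np

def utility_closed_form(dt, alpha, l):
    """Equation (10): sin(2*alpha*l*dt) / (2*alpha*l*dt)."""
    # np.sinc(x) is sin(pi*x)/(pi*x), hence the division by pi.
    return np.sinc(2.0 * alpha * l * dt / np.pi)

def utility_numeric(t1, t2, alpha, l, n=20001):
    """(1/2l) * integral over [-l, l] of R(t2,tau) * conj(R(t1,tau))."""
    tau = np.linspace(-l, l, n)
    R1 = np.exp(-1j * (alpha * tau**2 - 2.0 * alpha * t1 * tau))
    R2 = np.exp(-1j * (alpha * tau**2 - 2.0 * alpha * t2 * tau))
    integrand = R2 * np.conj(R1)
    # Composite trapezoidal rule written out explicitly.
    h = tau[1] - tau[0]
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / (2.0 * l)

# Illustrative parameters (not taken from the paper).
alpha, l, t1, t2 = 3.0, 2.0, 0.5, 0.9
u_num = utility_numeric(t1, t2, alpha, l)
u_ref = utility_closed_form(t2 - t1, alpha, l)
first_zero = np.pi / (2.0 * alpha * l)
```

The quadratic phase terms cancel in the product, leaving a pure complex exponential in the lag variable, which is why the result is a sinc whose first zero sits at a time offset of $\pi/(2\alpha l)$.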

This is thus the answer we have been trying to obtain. There are
limits to the achievable frequency resolution within the
time-frequency plane, due to the fact that the period of time during
which one can collect information concerning the spectrum at a given
time diminishes as the spectral dynamics increase. For increasing
dynamics ($\alpha$, in our example) the useful neighborhood (and,
hence, the amount of useful information) will continuously decrease,
and so will the achievable spectral complexity. This means, in
particular, that the faster a chirp moves, the broader it becomes in
the t-f plane. This predicted broadening of the power spectrum with
the increase of the chirping rate is, in fact, observed in many
bilinear time-frequency distributions (e.g. Rihaczek, Margenau-Hill,
Page, etc.). To illustrate it, we computed the Margenau-Hill
distribution of a cubic chirp. The results can be seen in Figure 1.

Let us now try to determine, in the case of our chirp, what is the
best observation time. From (10), we see that the best strategy is to
limit the observation time to the main lobe of the sinc function.
That is, observe the signal between $t_1 - \tau$ and $t_1 + \tau$,
where $\tau = \pi/(2\alpha l)$. But this implies that $\tau$ is the
maximum lag of the observed autocorrelation function. That is, in
this best case, $l = \tau$. From this we conclude that the best
observation time for a linear chirp is


$f_i(t)$ being the chirp instantaneous frequency (in this particular
case, we may safely identify the concept of instantaneous frequency
with the derivative of the phase function).
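The algebra behind this last step can be sketched as follows, assuming the chirp $s(t) = e^{j(\alpha t^2 + \theta)}$ and taking the instantaneous frequency as the phase derivative divided by $2\pi$. The sketch gives the half-width $\tau$ of the observation interval; the normalization used in the original expression may differ (for instance, the full interval $2\tau$).

```latex
% Assumptions: \tau = \pi/(2\alpha l) with l = \tau, and
% f_i(t) = \frac{1}{2\pi}\frac{d}{dt}\left(\alpha t^2\right) = \frac{\alpha t}{\pi}.
\begin{aligned}
\tau = \frac{\pi}{2\alpha\tau}
  \;&\Longrightarrow\;
  \tau^2 = \frac{\pi}{2\alpha}
  \;\Longrightarrow\;
  \tau = \sqrt{\frac{\pi}{2\alpha}} , \\[4pt]
f_i'(t) = \frac{\alpha}{\pi}
  \;&\Longrightarrow\;
  \tau = \frac{1}{\sqrt{2\, f_i'(t)}} .
\end{aligned}
```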




Figure  1:  Cubic  Chirp.  Margenau-Hill  distribution. 


With hindsight, it is now interesting to observe that:
• (11) was already known to be the optimum observation time for
short-time Fourier analysis of a chirp [15];
• (11) is the effective time support of the optimum data independent
smoothing window to use with the Wigner-Ville distribution [16];
• it is also basically the same quantity defined by Rihaczek as the
signal's "relaxation time" [17].
These separate results can now easily be understood as particular
manifestations of (11).

A  last  comment  must  be  made,  concerning  the  use  of 
models.  Assuming  a  model  for  the  frequency  dynamics, 
such  as  the  linear  model  implicit  in  the  Wigner-Ville  Distri¬ 
bution  (or  higher  order  models  in  the  Polynomial  Wigner- 
Ville  Distribution),  is  an  attempt  to  increase  the  size  of 
what we called the "useful neighborhood", by projecting
all collected spectral information to the time of interest.
a) If, by inspiration or mere chance, the assumed model is, in
fact,  the  correct  one,  we  will  overcome  the  limits  imposed  by 
the  nonstationarity,  and  fall  within  the  traditional,  station¬ 
ary  uncertainty  relations,  as  previously  discussed;  b)  If,  on 
the  other  hand,  the  model  is  incorrect,  we  must  be  prepared 
to  pay  for  the  wrong  assumption.  We  will  have  apparently 
improved  our  frequency  resolution,  but  we  must  pay  for  it 
with bias and artifacts. Another, more subtle, type of assumption
is made whenever we arbitrarily decide that the
"true" distribution is the one maximizing some chosen criterion.
It may or may not be a sensible, supported assumption.
It  is  an  assumption,  nonetheless.  It  may  buy  us  a  better 
frequency  resolution,  if  verified  by  the  signal  under  analysis. 
In  all  other  cases,  one  will  pay  for  the  apparently  increased 
time-frequency  resolution  in  bias  and  artifacts.  This  is,  be¬ 
sides,  exactly  the  type  of  trade-off  one  finds  in  all  types  of 
parametric  spectrum  estimation. 

6.  CONCLUSION 

In  this  article,  we  addressed  the  issue  of  determining  if  there 
are  lower  bounds  to  the  achievable  time-frequency  resolu¬ 
tion  within  the  Time-Frequency  plane.  After  the  analy¬ 
sis  of  existing  approaches,  we  proposed  an  alternative  one, 
based  on  the  informational  aspects  of  the  estimation,  in 
an  attempt  to  achieve  results  independent  of  the  specific 


tool  used  to  estimate  the  joint  power  spectrum.  We  con¬ 
cluded  that  there  are,  indeed,  limits  to  the  achievable  time- 
frequency  resolution.  These  limits  are  a  direct  consequence 
of  the  spectral  dynamics  of  the  signal.  Increasing  spectral 
dynamics  imply  decreasing  time-frequency  resolution  capa¬ 
bilities.  In  the  process  of  obtaining  these  limits,  we  also 
determined  the  optimum  observation  time  (which  also  de¬ 
pends  on  the  spectral  dynamics  of  the  signal),  concluding 
that its optimality is tool independent. This conclusion allows
a unified view of previously reported particular cases
where this observation time was determined to be optimum.
ABSTRACT

A  new  method  for  frequency  tracking  is  presented.  It  combines 
high  resolution  time-frequency  analysis  with  the  discriminative 
power  of  a  multiple  signal  classification  (MUSIC,  see  [5])  tech¬ 
nique.  The  proposed  time-varying  frequency  estimator  employs  a 
signal  adaptive  time-frequency  distribution  (TFD)  with  suppressed 
interference  terms.  The  adaptive  TFD  is  able  to  resolve  compo¬ 
nents  that  are  closely  spaced  in  time  and  frequency.  The  achieved 
resolution  is  superior  to  the  one  achieved  by  sliding  window  tech¬ 
niques  which  are  hampered  in  joint  time-frequency  resolution  by 
the  uncertainty  principle.  Simulations  show  that  the  presented  ap¬ 
proach  operates  reliably  in  low  signal-to-noise  ratio  environments. 
The proposed method improves and generalizes the method proposed
in [1].

1.  INTRODUCTION 

The frequency tracking problem we want to address in this paper is
the estimation of p deterministic frequency contours
$f_i(t) = \frac{1}{2\pi}\,\frac{d\phi_i(t)}{dt}$ from a given noisy observation x(t):

$$x(t) = z(t) + \sum_{i=1}^{p} x_i(t) \quad \text{with} \quad x_i(t) = X_i\, e^{j[\phi_i(t) + \Theta_i]} \qquad (1)$$

The random process z(t) is assumed to be zero mean complex
white Gaussian noise with variance $\sigma_z^2 = E\{z(t)\,z^*(t)\}$. In this
section we will also assume that the phase of each component
is an independent uniformly distributed random variable with $\Theta_i \in [0, 2\pi]$
for $i = 1 \ldots p$. We assume the number of components p to
be known.

Optimal estimators¹ for $f_i(t)$ are mathematically cumbersome
and thus not practical (see [4]). Instead we will employ a suboptimal
(but practical) approach based on an eigenspace analysis of
the local autocorrelation function $R_{xx}(t,\tau)$:

$$R_{xx}(t,\tau) = E\{x^*(t - \tfrac{\tau}{2})\, x(t + \tfrac{\tau}{2})\} \qquad (2)$$

$$\phantom{R_{xx}(t,\tau)} = \sigma_z^2\,\delta(\tau) + \sum_{i=1}^{p} X_i^2\, e^{j(\phi_i(t+\tau/2) - \phi_i(t-\tau/2))} \qquad (3)$$

The general connection between the signal space formed by the
$x_i(t)$ and the eigenspace of $R_{xx}(t,\tau)$ is very difficult to obtain.
It will be shown in section 3, however, that there exists a very
simple connection if we restrict ourselves to chirp components of
the form:

$$\phi_i(t) = \tfrac{1}{2}\alpha_i t^2 + \omega_i t \qquad (4)$$

1  Like  a  maximum  likelihood  estimator  (MLE)  for  example. 


Using these chirp components we can simplify the resulting local
autocorrelation function $R_{xx}(t,\tau)$:

$$R_{xx}(t,\tau) = \sigma_z^2\,\delta(\tau) + \sum_{i=1}^{p} X_i^2\, e^{j[\alpha_i t + \omega_i]\tau} \qquad (5)$$

We will describe in section 3 how to obtain estimates of the desired
frequency contours $f_i(t)$ based on an estimate of $R_{xx}(t,\tau)$. The
next section addresses the problem of finding a proper estimate for
$R_{xx}(t,\tau)$.
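The step from (3) to (5) can be checked numerically. The sketch below is only an illustration: the parameter values are hypothetical, and the function names `chirp` and `local_autocorr_product` are not from the paper. It evaluates the noise-free product $x_i^*(t - \tau/2)\,x_i(t + \tau/2)$ for a single chirp of the form (4) and compares it with the signal term $X_i^2 e^{j[\alpha_i t + \omega_i]\tau}$ of (5).

```python
import numpy as np

def chirp(t, amp, alpha, omega, phase):
    """Linear chirp X * exp(j*(0.5*alpha*t^2 + omega*t + phase)), cf. (1) and (4)."""
    return amp * np.exp(1j * (0.5 * alpha * t**2 + omega * t + phase))

def local_autocorr_product(t, tau, amp, alpha, omega, phase):
    """Noise-free product x^*(t - tau/2) * x(t + tau/2) for one chirp component."""
    left = chirp(t - tau / 2, amp, alpha, omega, phase)
    right = chirp(t + tau / 2, amp, alpha, omega, phase)
    return np.conj(left) * right

# Hypothetical parameter values, chosen only for illustration.
amp, alpha, omega, phase = 1.5, 0.02, 0.3, 1.1
t = np.linspace(0.0, 50.0, 11)
tau = 4.0

lhs = local_autocorr_product(t, tau, amp, alpha, omega, phase)
rhs = amp**2 * np.exp(1j * (alpha * t + omega) * tau)  # signal term of (5)
max_err = np.max(np.abs(lhs - rhs))
```

The random phase cancels in the product, which is why no expectation is needed for a single component; the cross-products between different components are what the expectation in (2) removes.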

2.  ADAPTIVE  AUTOCORRELATION  ESTIMATION 

Finding a good estimator for $R_{xx}(t,\tau)$ is not a trivial task.
Unfortunately, the obvious choice $\hat{R}_{xx}(t,\tau) = x^*(t - \tfrac{\tau}{2})\,x(t + \tfrac{\tau}{2})$
has two major drawbacks: 1) the estimate suffers from a very large
variance and 2) the resulting autocorrelation at any time t is generally
not positive semidefinite² in $\tau$. Both drawbacks can be addressed
by proper smoothing. In this paper we consider a form of
smoothing that takes the special structure of our signal x(t) into
account.

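We gain considerable insight by considering the cross-ambiguity
function³ $A_{x_i x_k}(\theta,\tau)$ of two signal components $x_i(t)$ and $x_k(t)$:

$$A_{x_i x_k}(\theta,\tau) = \int x_i^*(t - \tfrac{\tau}{2})\, x_k(t + \tfrac{\tau}{2})\, e^{j\theta t}\, dt \qquad (6)$$

$$\text{with} \quad x_i(t) = X_i\, e^{j[\frac{1}{2}\alpha_i t^2 + \omega_i t + \Theta_i]} \qquad (7)$$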

Throughout the remainder of this section we will consider a specific
realization of the process x(t). This implies that each $\Theta_i$ becomes
a fixed number and z(t) becomes a fixed signal that resulted
from that particular realization. The overall ambiguity function
$A_{xx}(\theta,\tau)$ of this particular x(t) can be written as⁴:

$$A_{xx} = A_{zz} + \sum_i A_{z x_i} + \sum_i A_{x_i z} + \sum_i A_{x_i x_i} + \sum_i \sum_{k \neq i} A_{x_i x_k} \qquad (8)$$

The terms $A_{zz}$, $A_{z x_i}$, and $A_{x_i z}$ establish the undesired noise terms.
Note that we can assume with probability one that the magnitude
of each of these three terms is bounded. It is also possible to show
that the cross-terms $A_{x_i x_k}$ for $k \neq i$ are bounded for all $(\theta,\tau)$ if
$\alpha_i \neq \alpha_k$. A special case arises, however, if $\alpha_k = \alpha_i$. Then the
resulting term becomes:

$$|A_{x_i x_k}(\theta,\tau)| = \text{constant} \cdot \delta(\theta + \alpha_i \tau + \omega_k - \omega_i) \qquad (9)$$

² This is obvious from the fact that the Wigner distribution, which is the
Fourier transform of $R_{xx}(t,\tau)$ in $\tau$, generally exhibits negative values.
³ Integration is always assumed to be over $[-\infty, +\infty]$.

⁴ The dependency on $(\theta,\tau)$ is omitted. Summations are over $1 \ldots p$.
which establishes an impulse ridge that does not pass through the
origin $(\theta,\tau) = (0,0)$ of the ambiguity domain. In fact, the only
terms that produce an impulse ridge through the origin of the ambiguity
domain are the auto-components:

$$|A_{x_i x_i}(\theta,\tau)| = \text{constant} \cdot \delta(\theta + \alpha_i \tau) \qquad (10)$$

We can exploit this fact in order to construct an estimator for the
location of the auto-components in the ambiguity domain. Consider
the following radial integral:

$$Q(\xi) = \int |A_{xx}(r\cos\xi,\, r\sin\xi)|\, e^{-r^2/\nu}\, dr \qquad (11)$$

with $\nu$ being an arbitrary finite number. If $A_{xx}(\theta,\tau)$ did not
include the terms $A_{x_i x_i}(\theta,\tau)$, then $Q(\xi)$ would always attain a
finite value, since all terms in (8) (except the auto-components
$A_{x_i x_i}(\theta,\tau)$) are bounded or have impulse ridges away from the
origin. The key is that $Q(\xi)$ becomes singular if and only if we are
integrating in the direction of an impulse ridge that resulted from
an auto-term. In other words, we have $Q(\xi_i) \to \infty$ if and only if
$\alpha_i = -1/\tan\xi_i$.
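The relation between a chirp rate and the direction of its auto-term ridge can be illustrated with a few lines of code. This is a minimal sketch under the conventions of (10) and (11), with the ridge $\theta + \alpha_i\tau = 0$ traversed as $(\theta,\tau) = (r\cos\xi, r\sin\xi)$; the function names and the chirp-rate values are hypothetical.

```python
import numpy as np

def ridge_angle(alpha):
    """Angle xi in (0, pi) of the auto-term ridge theta + alpha*tau = 0."""
    return np.arctan2(1.0, -alpha)

def chirp_rate(xi):
    """Inverse map alpha = -1/tan(xi), as stated after (11)."""
    return -1.0 / np.tan(xi)

alphas = np.array([-2.0, -0.5, 0.5, 2.0])  # hypothetical chirp rates
xis = ridge_angle(alphas)

# A point at radius r along direction xi, in (theta, tau) coordinates.
r = 3.0
theta, tau = r * np.cos(xis), r * np.sin(xis)
ridge_residual = theta + alphas * tau      # close to zero: the point lies on the ridge
recovered = chirp_rate(xis)
```

A stationary component ($\alpha_i = 0$) maps to $\xi_i = \pi/2$, i.e. a ridge along the $\tau$ axis.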

Since  we  will  not  be  able  to  deal  with  infinite  data  sets  in  prac¬ 
tical  implementations  of  this  algorithm  we  will  observe  significant 
spikes  in  Q(£)  instead  of  singularities.  An  example  can  be  seen  in 
figure  3. 

After getting the q spike locations $\xi_i$ we can construct an adaptive
kernel $\phi_a(\theta,\tau)$ that supports chirp auto-terms and suppresses
all other terms. Note that q denotes the number of different chirp
rates in the signal x(t) and not the number of components p.


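In discrete time we obtain the following set of equations in
analogy to the equations presented in section 1:

$$x[n] = z[n] + \sum_{i=1}^{p} x_i[n] \qquad (15)$$

$$\text{with} \quad x_i[n] = X_i\, e^{j[\frac{1}{2}\alpha_i n^2 + \omega_i n + \Theta_i]} \qquad (16)$$

Again, z[n] is zero mean complex white Gaussian noise with variance
$\sigma_z^2$, and each $\Theta_i$ is an independent random variable uniformly
distributed over $[0, 2\pi]$ for $i = 1 \ldots p$.

$$R_{xx}[n,k] = E\{x^*[n - \tfrac{k}{2}]\, x[n + \tfrac{k}{2}]\} \qquad (17)$$

$$\phantom{R_{xx}[n,k]} = \sigma_z^2\,\delta[k] + \sum_{i=1}^{p} X_i^2\, e^{j[\alpha_i n + \omega_i]k}$$

We can arrange $2M + 1$ values of $R_{xx}[n,k]$ into a $(M+1) \times (M+1)$ matrix:

$$\mathbf{R}_{xx}[n] = \begin{bmatrix} R_{xx}[n,0] & R_{xx}[n,-1] & \cdots & R_{xx}[n,-M] \\ R_{xx}[n,1] & R_{xx}[n,0] & \cdots & \vdots \\ \vdots & & \ddots & \vdots \\ R_{xx}[n,M] & \cdots & \cdots & R_{xx}[n,0] \end{bmatrix}$$

A vector of signal components can be defined by

$$\mathbf{x}_i[n] = [\,1 \;\; e^{j[\alpha_i n + \omega_i]} \;\cdots\; e^{j[\alpha_i n + \omega_i]M}\,]^T \qquad (18)$$

from which it is readily verified that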


$\phi_a(\theta,\tau) = e^{-[\,\ldots\,]}$ with

$$d_i(\theta,\tau)^2 = \theta^2 + \tau^2 - (\theta\cos\xi_i + \tau\sin\xi_i)^2$$

An example for the kernel function that follows from the $Q(\xi)$
depicted in figure 3 can be seen in figure 4. It is worth mentioning
that for $q = 2$, $\xi_1 = 0$ and $\xi_2 = \pi/2$ the function $\phi_a(\theta,\tau)$
becomes the exponential kernel introduced by Choi and Williams
in [2].

We can now construct an adaptive chirp time-frequency distribution
via:

$$C_{xx}(t,\omega) = \frac{1}{4\pi^2} \iint A_{xx}(\theta,\tau)\,\phi_a(\theta,\tau)\, e^{-j\theta t - j\tau\omega}\, d\tau\, d\theta \qquad (13)$$

The desired positive semidefinite estimate for the local autocorrelation
function $R_{xx}(t,\tau)$ follows by projecting $C_{xx}(t,\omega)$ onto the
set of non-negative functions in $(t,\omega)$ from:


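$$\mathbf{R}_{xx}[n] = \sum_{i=1}^{p} X_i^2\, \mathbf{x}_i[n]\, \mathbf{x}_i^H[n] + \sigma_z^2\, \mathbf{I} \qquad (19)$$

with $\mathbf{I}$ denoting the identity matrix. An eigenvector and eigenvalue
decomposition with $\mathbf{R}_{xx}[n]\,\mathbf{v}_i[n] = \lambda_i[n]\,\mathbf{v}_i[n]$ yields
$M + 1$ eigenvectors $\mathbf{v}_i[n]$ and $M + 1$ non-negative real eigenvalues
$\lambda_1[n] \ge \lambda_2[n] \ge \ldots \ge \lambda_{M+1}[n]$. It is readily verified
that the eigenvectors $\mathbf{v}_i[n]$ for $i = 1 \ldots p$ span the same subspace
as the signal component vectors $\mathbf{x}_i[n]$ for $i = 1 \ldots p$. The remaining
eigenvectors $\mathbf{v}_i[n]$ for $i = p + 1 \ldots M + 1$ span a space
that is orthogonal to the signal component space (see [4]). As a
consequence we can construct a time-varying MUSIC estimator
$P_{xx}(n,\omega)$ via:

$$P_{xx}(n,\omega) = \frac{1}{\sum_{i=p+1}^{M+1} |\mathbf{e}^H(\omega)\,\mathbf{v}_i[n]|^2} \qquad (20)$$

$$\text{with} \quad \mathbf{e}(\omega) = [\,1 \;\; e^{j\omega} \;\cdots\; e^{jM\omega}\,]^T \qquad (21)$$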


$$\hat{R}_{xx}(t,\tau) = \frac{1}{2} \int [\,C_{xx}(t,\omega) + |C_{xx}(t,\omega)|\,]\, e^{j\tau\omega}\, d\omega \qquad (14)$$

3.  CHIRP  MUSIC 

We can now use $\hat{R}_{xx}(t,\tau)$ instead of the local autocorrelation
given by equation (5) to estimate the frequency contours $f_i(t)$.
It  is  beneficial  for  the  presentation  of  the  material  in  this  section 
to  switch  from  a  continuous  time  description  of  the  procedure  to 
a  discrete  time  description.  A  proper  discretization  of  the  proce¬ 
dures  presented  in  the  previous  section  is  possible,  but  has  to  be 
omitted  due  to  space  limitations. 


It can be shown that the locations $(n, 2\pi f_i[n])$ of the peaks in
$P_{xx}(n,\omega)$ are precisely the desired values of the frequency contour,
$2\pi f_i[n] = \alpha_i n + \omega_i$. In a practical application the true autocorrelation
matrix $\mathbf{R}_{xx}[n]$ is replaced with an estimate obtained from
a proper discretization of $\hat{R}_{xx}(t,\tau)$ from equation (14). The next
section provides an example.
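The estimator of equations (18) to (21) can be sketched in a few lines. This is an illustration under a simplifying assumption: the autocorrelation matrix is built directly from the model (19) with known, hypothetical chirp parameters, rather than from a discretized estimate (14). The helper names (`steering`, `model_autocorr_matrix`, `music_spectrum`, `top_peaks`) are not from the paper.

```python
import numpy as np

def steering(omega, M):
    """Vector [1, e^{j*omega}, ..., e^{j*M*omega}], cf. (18) and (21)."""
    return np.exp(1j * omega * np.arange(M + 1))

def model_autocorr_matrix(n, amps, alphas, omegas, noise_var, M):
    """Autocorrelation matrix of (19), built from assumed-known chirp parameters."""
    R = noise_var * np.eye(M + 1, dtype=complex)
    for X, a, w in zip(amps, alphas, omegas):
        x = steering(a * n + w, M)
        R += X**2 * np.outer(x, x.conj())
    return R

def music_spectrum(R, p, omega_grid):
    """Pseudospectrum (20): inverse energy of e(omega) in the noise subspace."""
    M = R.shape[0] - 1
    _, eigvecs = np.linalg.eigh(R)            # eigenvalues in ascending order
    noise = eigvecs[:, : M + 1 - p]           # eigenvectors of the smallest eigenvalues
    E = np.exp(1j * np.outer(omega_grid, np.arange(M + 1)))  # rows are e(omega)^T
    proj = E.conj() @ noise                   # entries e^H(omega) v_i
    return 1.0 / np.sum(np.abs(proj) ** 2, axis=1)

def top_peaks(P, omega_grid, p):
    """Grid locations of the p largest interior local maxima of P."""
    is_peak = (P[1:-1] > P[:-2]) & (P[1:-1] >= P[2:])
    idx = np.where(is_peak)[0] + 1
    best = idx[np.argsort(P[idx])[-p:]]
    return np.sort(omega_grid[best])

# Hypothetical two-chirp scenario evaluated at time index n.
n, M, p = 40, 8, 2
amps, alphas, omegas = [1.0, 0.8], [0.01, -0.005], [0.5, 1.5]
R = model_autocorr_matrix(n, amps, alphas, omegas, noise_var=0.5, M=M)
grid = np.linspace(0.0, np.pi, 2001)
P = music_spectrum(R, p, grid)
estimates = top_peaks(P, grid, p)
true_freqs = np.sort([a * n + w for a, w in zip(alphas, omegas)])
```

Repeating the evaluation for every n traces out the contours $\alpha_i n + \omega_i$; with an estimated matrix the peaks are finite and may be slightly displaced.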

4.  EXAMPLE 

The example that is considered in this section is a discrete signal
which consists of four chirps buried in complex white
Gaussian noise. The first chirp starts at normalized frequency
$f_{s1} = -0.04$ and chirps up to $f_{e1} = 0.41$. The second chirp
is closely spaced to the first one, from $f_{s2} = -0.06$ to $f_{e2} = 0.39$.
The third chirp starts at $f_{s3} = 0.1$ and ends at $f_{e3} = -0.45$.
Lastly, the fourth component is a stationary complex exponential
with frequency $f_4 = -0.265$. Note that the signal has four components
but only three different chirp rates. The signal-to-noise
ratio for the presented case is SNR = 3 dB. The signal is 256
samples long.
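A signal of this kind can be generated as follows. This is a sketch under stated assumptions: all components have unit amplitude and zero initial phase, and the SNR is taken as the ratio of one component's power to the noise variance; the text does not specify any of these. The names `linear_chirp`, `example_signal`, and `SWEEPS` are illustrative.

```python
import numpy as np

# (start, end) normalized frequencies of the four components listed above.
SWEEPS = [(-0.04, 0.41), (-0.06, 0.39), (0.1, -0.45), (-0.265, -0.265)]

def linear_chirp(n, f_start, f_end):
    """Unit-amplitude complex chirp sweeping f_start -> f_end over the samples n."""
    rate = (f_end - f_start) / (len(n) - 1)   # frequency change per sample
    phase = 2 * np.pi * (f_start * n + 0.5 * rate * n**2)
    return np.exp(1j * phase)

def example_signal(N=256, snr_db=3.0, seed=0):
    """Sum of the four components plus complex white Gaussian noise."""
    rng = np.random.default_rng(seed)
    n = np.arange(N)
    clean = sum(linear_chirp(n, fs, fe) for fs, fe in SWEEPS)
    # Assumption: SNR = (power of one unit-amplitude component) / (noise variance).
    noise_var = 10 ** (-snr_db / 10)
    noise = np.sqrt(noise_var / 2) * (
        rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return n, clean, clean + noise

n, clean, x = example_signal()

# Instantaneous frequency of the first component, recovered from its phase.
first = linear_chirp(n, *SWEEPS[0])
inst_freq = np.diff(np.unwrap(np.angle(first))) / (2 * np.pi)
distinct_rates = {round(fe - fs, 9) for fs, fe in SWEEPS}
```

The first two components share the same sweep of 0.45, which is why only three distinct chirp rates occur.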


[Figure 1: "Spectrogram of Signal x[n]"; horizontal axis: time sample [n]]

Figure 1. A spectrogram of the example signal x[n]. The
employed window is a Hanning window with length 31. The
window length was chosen such that one obtains an optimal
tradeoff between the time and the frequency resolution.


[Figure 2: "Wigner Distribution of x[n]"; horizontal axis: time sample [n]]

Figure 2. A Wigner distribution of the example signal x[n].
The four chirp components are clearly resolved. However,
the representation is strongly distorted by cross-terms.

Figure  1  shows  a  spectrogram  of  the  example  signal.  Two  of 
the  underlying  four  chirp  components  of  the  signal  are  reasonably 
well  represented.  The  remaining  two  chirps  however  cannot  be 
resolved  since  they  are  lying  too  closely  spaced  in  time  and  fre¬ 
quency.  This  inherent  limitation  of  the  spectrogram  carries  over 
to  any  other  estimation  method  that  is  based  on  a  sliding  window 
technique. 

Figure  2  shows  a  Wigner  distribution  of  the  example  signal. 
Even  though  the  four  chirps  are  clearly  resolved  it  is  still  very  diffi¬ 
cult  to  use  the  Wigner  distribution  for  frequency  tracking  purposes. 
Large  cross-term  peaks  obscure  the  true  location  of  the  auto-terms. 


[Figure 3: "Radial Integral $Q(\xi)$"]

Figure 3. The radial integral $Q(\xi)$ that results from the example
signal x[n]. The three spikes that correspond to the
three different chirp rates in x[n] are clearly visible.


[Figure 4: "Adaptive Chirp Kernel $\phi_a(\theta,\tau)$"]

Figure 4. The center segment of the adaptive kernel $\phi_a(\theta,\tau)$
that was obtained from the example signal x[n]. The three
intersecting ridges are a consequence of the three different
chirp rates in x[n].


[Figure 5: horizontal axis: time sample [n], 0 to 250]

Figure 5. The adaptive time-varying MUSIC estimate
$P_{xx}(n,\omega)$ from signal x[n]. All four components are well
resolved. The few visible misclassifications are due to the
high noise content in the signal.
Figure 3 shows the radial integral $Q(\xi)$ from $A_{xx}(\theta,\tau)$ for the
example signal x[n]. The three significant spikes in $Q(\xi)$ correspond
to the three different chirp rates in the signal. Figure 4 displays
the center segment of the adaptive kernel function $\phi_a(\theta,\tau)$
that followed from the given $Q(\xi)$. In figure 5 we can see the resulting
chirp MUSIC estimate $P_{xx}(n,\omega)$. All four chirps are well
resolved despite the low signal-to-noise ratio in the given case.


[Figure 6: "Frequency Tracking Results"; horizontal axis: time sample [n]]

Figure 6. A simulation result for the general frequency tracking
procedure described by equations (22) and (23). The underlying
signal y[n] consisted of three sinusoidal frequency
modulations buried in complex white Gaussian noise. The
signal length was 1280 samples. The sliding window h[n]
was a Hanning window with length 129. The signal-to-noise
ratio was 3 dB. $P_{yy}(m,\omega)$ was evaluated every 10 samples
in time.

[Figure 7: "Operation Characteristic"]

Figure 7. Error probability versus signal-to-noise ratio for
the proposed algorithm. An error is defined as an event in
which the estimated instantaneous frequency $f_i[n]$ for component
number i at time n is not equal to (or within the tolerance
range of the spectral sampling of) the true underlying
instantaneous frequency for that particular component
at that particular time.


5.  GENERAL  FREQUENCY  TRACKING 

In  the  previous  sections  we  have  considered  signals  that  were  com¬ 
posed  of  linear  chirp  components  only.  It  is  possible  to  extend  the 
proposed  method  to  signals  y[n]  with  arbitrary  frequency  contours 
if  the  individual  components  of  the  signal  can  be  approximated 
locally  with  chirps.  This  is  generally  true  for  signals  that  have 
frequency  contours  with  a  small  curvature.  We  can  use  a  symmet¬ 
ric  window  h[n]  =  h[-n]  with  finite  support  (h[n]  =  0  for  all 
|n|  >  N  for  some  N )  to  isolate  the  signal  part  that  we  want  to 
approximate  locally.  We  can  track  arbitrary  frequency  contours 
by  using  a  sliding  window  technique  according  to  the  following 
equations: 

$$x_m[n] = y[n - m] \cdot h[n] \qquad (22)$$

The resulting estimate $P_{yy}(m,\omega)$ is obtained from:

$$P_{yy}(m,\omega) = P_{x_m x_m}(0,\omega) \qquad (23)$$

A  simulation  example  for  the  proposed  method  is  given  in  figure  6. 

Figure  7  displays  the  error  probability  of  the  presented  esti¬ 
mation  algorithm.  The  graph  resulted  from  a  large  number  of  sim¬ 
ulations  run  with  different  frequency  profiles  and  different  noise 
levels.  It  is  clearly  visible  that  the  method  still  performs  well  for 
low  signal-to-noise  ratios. 

6.  CONCLUSIONS 

The  introduced  new  class  of  adaptive  high  spectral  resolution  time- 
frequency  distribution  kernels  improves  and  generalizes  the  dis¬ 
tributions  proposed  in  [1].  The  signal  dependent  kernel  is  con¬ 
structed  with  respect  to  optimal  performance  for  signals  that  are 
composed of linear chirps in complex white Gaussian noise. Additionally,
the presented time-varying multiple signal classification
(MUSIC) method is used to obtain a separation between the signal
subspace and the noise subspace. The provided frequency tracking
simulations  show  that  we  obtain  excellent  estimation  results  even 
for  closely  spaced  and  rapidly  changing  transients.  The  proposed 
method  delivers  good  high  resolution  estimates  even  if  the  under¬ 
lying  signal  is  composed  of  non-linear  frequency  contours  with 
a  low  curvature.  Furthermore,  it  is  shown  by  simulations  that  the 
proposed  method  operates  well  in  low  signal-to-noise  ratio  cases. 
ABSTRACT


2.  MODEL  DESCRIPTION 


Blind  identification  of  FIR  multiuser  channels  using  higher- 
order  statistics  (HOS)  is  investigated  in  this  work.  Higher 
order  statistical  information  is  exploited  to  identify  systems 
with  equal  number  of  users  and  outputs.  A  sufficiency  con¬ 
dition  is  presented  and  is  less  stringent  than  many  known 
conditions.  Furthermore,  a  new  blind  identification  method 
is  developed  which  extends  the  second  order  subspace  ap¬ 
proach  to  HOS  with  closed-form  solutions.  This  algorithm 
is  shown  to  be  capable  of  identifying  a  wider  class  of  chan¬ 
nels  and  robust  to  some  channel  order  over-estimation. 


1.  INTRODUCTION 

Multiple-Input  Multiple-Output  (MIMO)  models  arise  in 
a  wide  range  of  applications  such  as  the  CDMA  multiple 
access  communications  systems.  There  have  been  a  number 
of  research  works  dedicated  to  the  blind  identification  of 
FIR-MIMO  systems  using  second-order  statistics  (SOS)  [1]- 
[3].  It  should  be  noted  that  SOS  methods  are  restricted 
to  systems  satisfying  rather  stringent  conditions  requiring 
more  available  outputs  than  inputs.  In  fact,  even  some 
HOS  methods[4]  require  such  conditions.  Nevertheless,  a 
notable  advantage  of  HOS  methods  is  that  their  proper  use 
permits  identifiability  of  a  wider  class  of  channels  that  may 
have  equal  number  of  outputs  and  users.  Our  study  here 
focuses  on  this  particular  type  of  systems.  Related  works 
include  [5],  where  blind  identification  and  source  separation 
conditions  for  MIMO  system  driven  by  colored  inputs  are 
explored.  In  [6],  higher-order  cumulant  matching  is  used 
via  non-linear  optimization. 

In  this  paper,  we  generalize  our  previous  work  [7]  and 
propose  a  cumulant  matrix  subspace  algorithm  for  multi¬ 
user  systems  with  equal  number  of  output  and  input  signals. 
Unknown  channel  information  is  extracted  from  nullspace 
decomposition  of  several  cumulant  matrices  which  contain 
cross-sections  of  m-th  order  output  cumulants.  MIMO  ma¬ 
trix  impulse  response  can  be  identified  up  to  a  non-singular 
matrix.  This  linear  HOS  method  is  simple  and  admits  a 
closed-form  solution.  A  less  stringent,  sufficient  identifiabil¬ 
ity  condition  is  determined  for  this  method.  Exact  knowl¬ 
edge  of  channel  order  is  not  required  and  only  an  upper 
bound  is  needed. 


Supported  by  NSF  grant  CCR-9996206. 


Given a discrete N-input/p-output FIR-MIMO system, the
i-th channel output signal $x_{i,n}$ is given by

$$x_{i,n} = \sum_{u=1}^{N} \sum_{k=0}^{q} s_{n-k,u}\, h_{i,u}(k) + w_{i,n}, \quad i = 1, \cdots, p \qquad (2.1)$$

Mutually independent input sequences $\{s_{n,u}\}$ are i.i.d. non-Gaussian
stationary processes with zero mean. $h_{ij}(n)$ is
the impulse response from input j to output i. The maximum
time span of $h_{ij}(n)$ is $q + 1$. Noises $w_{i,n}$ are zero-mean
stationary Gaussian processes and are independent
of $\{s_{n,u}\}$. Let $x_n = [x_{1,n} \cdots x_{p,n}]^T$, $s_n = [s_{n,1} \cdots s_{n,N}]^T$,
and $w_n = [w_{1,n} \cdots w_{p,n}]^T$. It then follows that
$x_n = \sum_{k=0}^{q} H_k s_{n-k} + w_n$, where the $p \times N$ channel response
matrix $H_n = [h_{i,j}(n)]$.
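The input/output relation $x_n = \sum_k H_k s_{n-k} + w_n$ can be simulated directly. The sketch below makes several assumptions that are not in the text: BPSK inputs (one example of symmetric, zero-mean, non-Gaussian sources), randomly drawn channel taps, and $s_n = 0$ for $n < 0$. The function name `mimo_fir_output` is illustrative.

```python
import numpy as np

def mimo_fir_output(H, s, noise_std=0.0, rng=None):
    """Outputs x_n = sum_k H_k s_{n-k} + w_n.

    H has shape (q+1, p, N); s has shape (T, N); the result has shape (T, p).
    Inputs before time 0 are taken to be zero.
    """
    q1, p, _ = H.shape
    T = s.shape[0]
    x = np.zeros((T, p), dtype=complex)
    for k in range(q1):
        x[k:] += s[: T - k] @ H[k].T          # row n receives H_k s_{n-k}
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        w = rng.standard_normal((T, p)) + 1j * rng.standard_normal((T, p))
        x += noise_std / np.sqrt(2) * w       # circular complex Gaussian noise
    return x

rng = np.random.default_rng(1)
p, N, q, T = 2, 2, 2, 50                      # equal number of outputs and users
H = rng.standard_normal((q + 1, p, N)) + 1j * rng.standard_normal((q + 1, p, N))
s = rng.choice([-1.0, 1.0], size=(T, N))      # BPSK sources (assumption)
x_clean = mimo_fir_output(H, s)
x_noisy = mimo_fir_output(H, s, noise_std=0.1, rng=rng)

# Cross-check one output sample against the scalar form (2.1).
n0, i0 = 5, 1
direct = sum(s[n0 - k, u] * H[k, i0, u] for u in range(N) for k in range(q + 1))
```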

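Define a vector $x[n] = [x_n^T, \cdots, x_{n-L}^T]^T$. The linear system
can be described by

$$x[n] = \mathcal{H}\, s[n] + w[n], \qquad (2.2)$$

where $s[n] = [s_n^T, \cdots, s_{n-L-q}^T]^T$, $w[n] = [w_n^T, \cdots, w_{n-L}^T]^T$,
and the convolution matrix $\mathcal{H}$ is a $(L+1)p \times (L+q+1)N$
block Toeplitz matrix

$$\mathcal{H} = \begin{bmatrix} H_0 & H_1 & \cdots & H_q & 0 & \cdots & 0 \\ 0 & H_0 & H_1 & \cdots & H_q & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \ddots & 0 \\ 0 & \cdots & 0 & H_0 & H_1 & \cdots & H_q \end{bmatrix}. \qquad (2.3)$$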

Denote $\mathrm{cum}(x_1, \cdots, x_m)$ as the m-th order joint cumulant. Then

$$\mathrm{cum}(x_{i_1,n-n_1}, \cdots, x_{i_m,n-n_m}) = \sum_{u=1}^{N} \gamma_{m,u} \sum_{k} h_{i_1,u}(k-n_1) \cdots h_{i_m,u}(k-n_m), \qquad (2.4)$$

where $\gamma_{m,u}$ is the m-th order kurtosis of $s_{n,u}$. We consider
the case with symmetric signal $s_{n,u}$ and non-zero $\gamma_{m,u}$ for
even m. Define a cumulant matrix containing cross-sections
of the m-th order output signal cumulant:

$$C_m^{(i,k)} = \mathrm{cum}\big(x[n],\, x[n]^H,\, \underbrace{x_{i,n-k},\, x_{i,n-k}^*,\, \cdots,\, x_{i,n-k}^*}_{m-2\ \text{(even)}}\big) \qquad (2.5)$$
From (2.2), we find that

$$C_m^{(i,k)} = \mathcal{H}\, \Lambda_m^{(i,k)}\, \mathcal{H}^H \qquad (2.6)$$

$$\Lambda_m^{(i,k)} = \mathrm{diag}(\underbrace{0, \cdots, 0}_{k\ \text{blocks}},\, D_{i,0}, \cdots, D_{i,q},\, \underbrace{0, \cdots, 0}_{(L-k)\ \text{blocks}}) \qquad (2.7)$$

$$D_{i,j} = \mathrm{diag}(\gamma_{m,1} |h_{i,1}(j)|^{m-2}, \cdots, \gamma_{m,N} |h_{i,N}(j)|^{m-2}). \qquad (2.8)$$

Notice that $\Lambda_m^{(i,k)}$ is block diagonal, determined by $D_{i,j}$
($j = 0, 1, \cdots, q$), each of which is an $N \times N$ diagonal matrix.
Observe that $C_m^{(i,k)}$ and $\mathcal{H}$ have well-defined structures.
Our algorithm takes advantage of such a priori structural
knowledge in parameter estimation. If we choose $k \ge q$ and
$L \ge k + q$, (2.6) can be rewritten as


Matrix $\Lambda(z)$ is the Smith form of $H(z)$, which is uniquely
determined by monic polynomials $\{\lambda_i(z)\}$ ($i = 1, \cdots, r$). r
is the (normal) rank of $H(z)$ ([9], p. 390, 373). If there exists
a non-zero $z_0$ such that $H(z_0)$ is full column rank, then $r = N$.
$\{\lambda_i(z)\}$ obey a division property: $\lambda_i(z)$ divides $\lambda_{i+1}(z)$
for $i = 1, \cdots, r - 1$. $\lambda_i(z)$ is determined by $\lambda_i(z) = \Delta_i(z)/\Delta_{i-1}(z)$
($\Delta_0(z) = 1$), where $\Delta_i(z)$ is the greatest common divisor of all
$i \times i$ minors of $H(z)$. From the division property, we can
prove that $\{\lambda_i(z)\}_{i=1}^{N}$ have at most $qN$ distinct zeros. This
means that $\Lambda(z)$ is rank deficient at finitely many points.
Thus, one can select $d + 1$ different $z_s$, such that $\Lambda(z_s)$ is
full column rank. Since a unimodular matrix is nonsingular
for all z, we have

$$\Lambda(z_s)\, W^{-1}(z_s)\, v(z_s) = 0 \iff v(z_s) = 0 \qquad (3.3)$$


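$$C_m^{(i,k)} = \mathcal{H}_s\, \Sigma^{(i,k)}\, \mathcal{H}_s^H, \qquad (2.9)$$

where $\mathcal{H}_s$ is a $(L+1)p \times (L+1-q)N$ block Toeplitz matrix

$$\mathcal{H}_s = \begin{bmatrix} H_q & 0 & \cdots & 0 \\ \vdots & H_q & \ddots & \vdots \\ H_0 & \vdots & \ddots & 0 \\ 0 & H_0 & \ddots & H_q \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & H_0 \end{bmatrix} \qquad (2.10)$$

$$\Sigma^{(i,k)} = \mathrm{diag}(\underbrace{0, \cdots, 0}_{k-q\ \text{blocks}},\, D_{i,0}, \cdots, D_{i,q},\, \underbrace{0, \cdots, 0}_{L-k-q\ \text{blocks}}) \qquad (2.11)$$

We define the $(q+1)p \times N$ channel parameter matrix H as
$H = [H_q^T, \cdots, H_0^T]^T$. The channel transfer function is given by
$H(z) = \sum_{k=0}^{q} H_k z^{-k}$.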

3.  IDENTIFIABILITY  CONDITIONS 

Most existing SOS blind identification algorithms require
the convolution matrix $\mathcal{H}$ to be full column rank. Consequently,
it is required that (a) $H(z)$ be full column rank for
all non-zero z (i.e. $H(z)$ is irreducible); (b) $p > N$.

In our HOS approach, the full column rank of $\mathcal{H}_s$ is
necessary in order to extract the nullspace of $\mathcal{H}_s^H$ from $C_m^{(i,k)}$.
We first establish a sufficient condition for $\mathcal{H}_s$ to have full
column rank.

Theorem 3.1 Consider $\mathcal{H}_s$ to be a $(d+q+1)p \times (d+1)N$
block Toeplitz matrix given by (2.10) with $d \ge 0$. $\mathcal{H}_s$ is full
column rank if there exists a non-zero $z_0 \in \mathbb{C}$ (including
$\infty$), such that $H(z_0)$ has full column rank.

Proof: It is clear that $p \ge N$ is an implicit assumption for
$H(z_0)$ to be full rank. Define vector $v = [v_d^T, \cdots, v_0^T]^T$,
where $v_i$ are $N \times 1$ vectors, and $v(z) = \sum_{i=0}^{d} v_i z^{-i}$. Then

$$\mathcal{H}_s v = 0 \iff H(z)\, v(z) = 0 \ \text{for all } z \qquad (3.1)$$

For any $p \times N$ polynomial matrix $H(z)$, we can find unimodular
matrices $\{U(z), W(z)\}$, such that

$$U(z)\, H(z)\, W(z) = \Lambda(z) \qquad (3.2)$$


for all $z_s$. Write this relation as a set of linear equations
and we can show that $v = 0$. In other words, $\mathcal{H}_s v = 0$
implies that $v = 0$. $\mathcal{H}_s$ is thus full column rank. ■

In order for $\mathcal{H}_s$ to be full rank, it is certainly necessary
that H also has full rank. If an FIR-MIMO channel satisfies
this sufficient condition, H can be uniquely identified up to
an ambiguity matrix Q as stated in Theorem 3.2.
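Theorem 3.1 can be illustrated numerically. The sketch below builds the block Toeplitz matrix of (2.10) for a randomly drawn square channel ($p = N$, a hypothetical example) and checks its column rank; it then repeats the check for a channel with two identical columns, for which $H(z)$ is rank deficient at every z so that the hypothesis of the theorem fails. The function names `block_toeplitz_Hs` and `transfer_function` are illustrative.

```python
import numpy as np

def block_toeplitz_Hs(H, d):
    """(d+q+1)p x (d+1)N matrix of (2.10); H has shape (q+1, p, N).

    Block column c holds H_q, ..., H_0 stacked from block row c downwards.
    """
    q1, p, N = H.shape
    q = q1 - 1
    Hs = np.zeros(((d + q + 1) * p, (d + 1) * N), dtype=complex)
    for c in range(d + 1):
        for j in range(q + 1):
            r = c + j                          # block row that holds H_{q-j}
            Hs[r * p:(r + 1) * p, c * N:(c + 1) * N] = H[q - j]
    return Hs

def transfer_function(H, z0):
    """H(z0) = sum_k H_k z0^{-k}."""
    return sum(Hk * z0 ** (-k) for k, Hk in enumerate(H))

rng = np.random.default_rng(2)
p, N, q, d = 2, 2, 2, 3
H = rng.standard_normal((q + 1, p, N)) + 1j * rng.standard_normal((q + 1, p, N))

rank_at_z0 = np.linalg.matrix_rank(transfer_function(H, 1.3))
rank_Hs = np.linalg.matrix_rank(block_toeplitz_Hs(H, d))

# Channel with identical columns: H(z) has rank 1 at every z.
H_bad = H.copy()
H_bad[:, :, 1] = H_bad[:, :, 0]
rank_Hs_bad = np.linalg.matrix_rank(block_toeplitz_Hs(H_bad, d))
```

For the random channel the matrix reaches full column rank $(d+1)N$ even though the number of outputs equals the number of users; for the degenerate channel it does not.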

Theorem 3.2 Suppose there exists a non-zero $z_0$ (including
$\infty$) such that $H(z_0)$ has full column rank. Let $\mathcal{G}_s$ be
a block Toeplitz matrix with the same structure and dimension
as $\mathcal{H}_s$. G is the channel parameter matrix of $\mathcal{G}_s$. Then
G is full column rank and $\mathrm{Range}(\mathcal{G}_s) \subset \mathrm{Range}(\mathcal{H}_s)$ iff
$\mathcal{G}_s = \mathcal{H}_s A$, where A is a block diagonal matrix given by
$A = \mathrm{diag}(Q, \cdots, Q)$ with Q an $N \times N$ non-singular
matrix.

Proof: The sufficient part is rather obvious. Now we consider
the necessary part. If $\mathrm{Range}(\mathcal{G}_s) \subset \mathrm{Range}(\mathcal{H}_s)$, then
there exists a square matrix A such that $\mathcal{G}_s = \mathcal{H}_s A$. $A = [A_{ij}]$
($i = d, \cdots, 0$) ($j = 0, \cdots, d$), where $A_{ij}$ is an $N \times N$ square
matrix. Define $A_j(z) = \sum_{i=0}^{d} A_{ij} z^{-i}$, $G(z) = \sum_{i=0}^{q} G_i z^{-i}$.
Taking advantage of the "shift invariant" feature of the
Toeplitz matrices $\mathcal{G}_s$ and $\mathcal{H}_s$, the relation $\mathcal{G}_s = \mathcal{H}_s A$ is conveniently
rewritten as polynomial matrix equations
$H(z) \cdot A_j(z) = G(z) \cdot z^{-(d-j)}$ for all z and $j = 0, \cdots, d$. Equivalently,
we have

$$H(z) \cdot [A_j(z) \cdot z^{d-j} - A_d(z)] = 0_{p \times N} \qquad (3.4)$$

for $j < d$. Let E denote the set of points where $H(z)$ is
full rank. By our assumption, E includes all non-zero complex
numbers except at most $qN$ distinct elements. Thus,
$A_j(z) - A_d(z) \cdot z^{-(d-j)} = 0_{N \times N}$ for all $z \in E$ and $j < d$. Naturally,
this relation can be reduced to a set of polynomial
matrix equations

$$\sum_{k=0}^{d-j-1} A_{kj}\, z^{-k} + \sum_{k=0}^{j} (A_{(d-j+k)j} - A_{kd})\, z^{-(d-j+k)} + \sum_{k=j+1}^{d} (-A_{kd})\, z^{-(d-j+k)} = 0, \quad j < d. \qquad (3.5)$$

These polynomial equations hold true for all $z \in E$ if and
only if their coefficients are identically zero. Based on this
result, we summarize the relationship between $A_{kj}$ and $A_{kd}$
as follows: (1) $A_{kj} = 0$, for $k = 0, 1, \cdots, (d - j - 1)$;
(2) $A_{(d-j+k)j} = A_{kd}$, for $k = 0, 1, \cdots, j$; (3) $A_{(j+1)d} =
A_{(j+2)d} = \cdots = A_{dd} = 0$. Applying this relation for
$j = 0, \cdots, d - 1$, it is easy to verify that A is a block diagonal
matrix with the same diagonal blocks $A_{(d-j)j}$ ($j = 0, \cdots, d$). A
can be written as $A = \mathrm{diag}(Q, \cdots, Q)$ with Q an $N \times N$
square matrix. Thus, $G = HQ$. Since G and H have full
column rank, Q must be non-singular. ■

4.  MIMO  CUMULANT  SUBSPACE 
(MCS)  ALGORITHM 

Theorem 3.2 establishes the fundamentals of the MCS algorithm.
If the nullspace of $\mathcal{H}_s^H$ is estimated, $\mathcal{H}_s$ can be
uniquely identified by solving linear equation (4.1).

$$N_s^H \hat{\mathcal{H}}_s = 0, \quad \text{where } N_s = \mathrm{Null}(\mathcal{H}_s^H) \qquad (4.1)$$

$\hat{\mathcal{H}}_s$ is a block Toeplitz matrix with the same structure and
dimension as $\mathcal{H}_s$.

Our approach needs to extract $N_s$ from the cumulant matrix
$C_m^{(i,k)}$. In order for this subspace method to work, $\mathcal{H}_s$
must be full rank, or as we have established: (1) $p \ge N$.
(2) $H(z_0)$ is full column rank for at least one non-zero $z_0$.
However, equation (2.11) reveals that $\Sigma^{(i,k)}$ is rank deficient
when subchannels have different channel order or some
channel coefficients are zero, even if we select $L = 2q$. In
this case, $\mathrm{Null}(C_m^{(i,k)}) \supset N_s$. Without the knowledge of
the positions of zero channel coefficients and the channel
order of each user, it is quite difficult to determine the dimension
of $\mathrm{Null}(C_m^{(i,k)})$ and to make assumptions about
the relationship between $\mathrm{Null}(C_m^{(i,k)})$ and $N_s$.

As a result, we propose two partial nullspace algorithms using cumulant matrices. Let $H_s(i)$ denote the submatrix of $H_s$ formed by its first $i$ block columns. In these two methods, we obtain the partial nullspace of $H_s^H(1)$ (or $H_s^H(K)$) from the nullspace of cumulant matrices to generate the estimate of the channel parameter matrix $H$. Recall that our definition of $H_s$ in Theorem 3.1 is a block Toeplitz matrix with adjustable size $(d+q+1)p \times (d+1)N$ ($d > 0$). Thus, Theorems 3.1 and 3.2 can be adapted to the current cases without modification.

Here we assume that there exists at least one non-zero element in each column of $H_0$. This assumption is easily met by re-indexing the time index $n$ of the different source sequences, and it considerably simplifies the development of our algorithms. We first present our algorithms assuming the channel order $q$ is exactly known and consider channel order over-estimation later.


4.1. Single-lag Partial Nullspace Method (SLP)

In this method, we use $p$ cumulant matrices $C_m^{(l,k)}$ with a single delay lag $k$ such that $k \ge q$ and $L \ge k + q$. Selecting $k = q$ without loss of generality, we stack these cumulant matrices together and obtain their common nullspace. Define

\[ C_m(k) = \begin{bmatrix} C_m^{(1,k)} \\ \vdots \\ C_m^{(p,k)} \end{bmatrix} = M_s\, \Sigma(k)\, H_s^H(q+1) \tag{4.2} \]

\[ M_s = \begin{bmatrix} H_s(q+1) & & 0 \\ & \ddots & \\ 0 & & H_s(q+1) \end{bmatrix} \tag{4.3} \]

\[ \Sigma(k) = \begin{bmatrix} \Sigma^{(1,k)} \\ \vdots \\ \Sigma^{(p,k)} \end{bmatrix}, \quad \Sigma^{(l,k)} = \mathrm{diag}(D_{l,0}, \dots, D_{l,q}) \tag{4.4} \]

Here we can re-write $\Sigma^{(l,k)}$ by removing zero blocks. $M_s$ is full column rank as $H_s(q+1)$ is full rank. $\mathrm{Null}(C_m(k)) \supseteq \mathrm{Null}(H_s^H(q+1))$ since $\Sigma(k)$ is possibly rank deficient. However, under the non-zero column assumption on $H_0$, the first $N$ columns of the matrix $\Sigma(k)$ are full rank and independent of the other columns. Thus, $\mathrm{Null}(C_m(k))$ must be orthogonal to $H_s(1)$. Although we do not know the dimension of $\mathrm{Null}(C_m(k))$, a lower bound is $d_s = (L+1)p - (q+1)N$. Define $N_c(1)$ to be the subspace of $\mathrm{Null}(C_m(k))$ corresponding to the $d_s$ smallest eigenvalues of $C_m(k)$. It follows that

\[ N_c(1)^H H_s(1) = 0 \tag{4.5} \]

Evidently, $N_c(1)$ only contains partial information about the entire nullspace since $\mathrm{Null}(H_s^H(1))$ has dimension $(L+1)p - N$.

Before moving on, we propose a simplified single-lag method for even-order cumulants. Instead of stacking $p$ cumulant matrices, we define

\[ S_m(k) = \sum_{l=1}^{p} C_m^{(l,k)} = H_s(q+1)\, \bar{\Sigma}(k)\, H_s^H(q+1) \tag{4.6} \]

where $\bar{\Sigma}(k) = \sum_{l=1}^{p} \Sigma^{(l,k)}$. Since there is no cancellation in the summation of entries for even $m$, the first $N$ columns of $\bar{\Sigma}(k)$ are full rank. Thus, we follow the same idea presented above. In this case, relationship (4.5) and the value of $d_s$ remain the same except that $N_c(1)$ is estimated from $\mathrm{Null}(S_m(k))$.

These two methods are labeled SLPC and SLPS, respectively, based on the use of $C_m(k)$ or $S_m(k)$ to obtain the partial nullspace $N_c(1)$.

$H$ is the solution of the over-determined linear equation (4.5) when the window length $L$ is large enough. To allow a unique solution, the first $(q+1)p$ rows of $N_c(1)$ need to have rank no less than $(q+1)p - N$. Such a requirement cannot be guaranteed in all cases when only single-lag cumulant matrices are utilized. By introducing multiple-lag cumulant matrices, we are able to collect more statistical information and reduce the probability of pathological cases in which (4.5) is under-determined.


4.2.  Multiple-lag  Partial  Nullspace  Method  (MLP) 

We stack matrices $C_m(k)$ with multiple delay lags $k = k_1, k_1+1, \dots, k_2$ and estimate their common nullspace. The choice of delay lags and $L$ should satisfy the conditions $k_1 = q$, $k_2 > k_1$ and $L = k_2 + q$. Let $K$ denote the number of different delay lags used, $K = k_2 - k_1 + 1$. Define

\[ C_m(k_1,k_2) = \begin{bmatrix} C_m(k_1) \\ \vdots \\ C_m(k_2) \end{bmatrix} = M\, \Sigma(k_1,k_2)\, H_s^H \tag{4.7} \]

\[ M = \begin{bmatrix} M_m & & 0 \\ & \ddots & \\ 0 & & M_m \end{bmatrix} \tag{4.8} \]

\[ \Sigma(k_1,k_2) = \begin{bmatrix} \Sigma(k_1) \\ \vdots \\ \Sigma(k_2) \end{bmatrix} \tag{4.9} \]

\[ \Sigma(k) = \Big( \underbrace{0 \;\cdots\; 0}_{k-q \text{ blocks}} \;\; D_0^k \;\cdots\; D_q^k \;\; \underbrace{0 \;\cdots\; 0}_{K+q-k-1 \text{ blocks}} \Big) \tag{4.10} \]

$M_m$ is defined by replacing $H_s(q+1)$ with $H_s$ in (4.3). We re-write $\Sigma(k)$ in terms of its block columns, where $D_i^k$ ($i = 0, \dots, q$) contains the diagonal matrices $\{D_{1,i}, \dots, D_{p,i}\}$.

Since $D_0^k$ is full rank for each $k$, it is easy to show that the first $KN$ columns of the matrix $\Sigma(k_1,k_2)$ are full rank. Then, $\mathrm{Null}(C_m(k_1,k_2))$ is orthogonal to $H_s(K)$. As in the SLP method, we pick a subspace $N_c(K)$ of $\mathrm{Null}(C_m(k_1,k_2))$ associated with the $d_m$ smallest eigenvalues, with $d_m = (L+1)p - (K+q)N$. So,

\[ N_c(K)^H H_s(K) = 0 \tag{4.11} \]

Again, $N_c(K)$ is a partial nullspace of $H_s^H(K)$.

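To obtain the estimate of $H$, we solve the linear equation (4.11) as in [1]. Define the vector $v = [u_0^T, u_1^T, \dots, u_L^T]^T$, where the $u_i$'s are $p \times 1$ vectors. Then,

\[ v^H H_s(K) = 0 \;\Leftrightarrow\; \mathrm{Hank}(v)^H H = 0 \tag{4.12} \]

\[ \mathrm{Hank}(v) = \begin{bmatrix} u_0 & u_1 & \cdots & u_{K-1} \\ \vdots & \vdots & & \vdots \\ u_q & u_{q+1} & \cdots & u_{K+q-1} \end{bmatrix} \tag{4.13} \]

We take the solution that minimizes $\sum_i \|\mathrm{Hank}(v_i)^H H\|_F^2$ over the columns $v_i$ of $N_c(K)$, subject to the constraint $H^H H = I$, where $\|\cdot\|_F$ stands for the Frobenius norm. The estimate $\hat{H}$ consists of the unit-norm eigenvectors associated with the smallest $N$ eigenvalues of the matrix $G$, where $G = \sum_i \mathrm{Hank}(v_i)\,\mathrm{Hank}(v_i)^H$. This also guarantees that $\hat{H}$ has full column rank.

A similar derivation also applies for $S_m(k)$. We label the two variants MLPC and MLPS, respectively.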
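
The Hankel-based solution step of (4.11)-(4.13) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the left nullspace is computed directly from a noise-free, randomly drawn channel by an SVD (an assumption standing in for the cumulant-based partial nullspace $N_c(K)$), only the $K+q$ non-zero block rows of $H_s(K)$ are kept, and the helper names `block_toeplitz`, `hank`, `estimate_channel` as well as the sizes `p`, `N`, `q`, `K` are hypothetical.

```python
import numpy as np

def block_toeplitz(blocks, K):
    """First K block columns of the block Toeplitz matrix built from H_0..H_q."""
    q = len(blocks) - 1
    p, N = blocks[0].shape
    T = np.zeros(((K + q) * p, K * N), dtype=complex)
    for c in range(K):
        for i, Hi in enumerate(blocks):
            T[(c + i) * p:(c + i + 1) * p, c * N:(c + 1) * N] = Hi
    return T

def hank(v, p, q, K):
    """Block Hankel matrix of (4.13): column c stacks u_c, ..., u_{c+q}."""
    u = v.reshape(K + q, p)
    return np.column_stack([u[c:c + q + 1].ravel() for c in range(K)])

def estimate_channel(null_vectors, p, q, K, N):
    """Eigenvectors of G = sum_i Hank(v_i) Hank(v_i)^H for the N smallest eigenvalues."""
    dim = (q + 1) * p
    G = np.zeros((dim, dim), dtype=complex)
    for v in null_vectors.T:
        Hk = hank(v, p, q, K)
        G += Hk @ Hk.conj().T
    _, V = np.linalg.eigh(G)          # eigenvalues returned in ascending order
    return V[:, :N]

# Toy setting (assumed values): p outputs, N inputs, channel order q, K lags.
rng = np.random.default_rng(0)
p, N, q, K = 3, 2, 2, 8
blocks = [rng.standard_normal((p, N)) + 1j * rng.standard_normal((p, N))
          for _ in range(q + 1)]
T = block_toeplitz(blocks, K)
U, _, _ = np.linalg.svd(T)            # full U; its trailing columns span the left nullspace
null_vectors = U[:, K * N:]

H_true = np.vstack(blocks)            # stacked [H_0; ...; H_q]
H_est = estimate_channel(null_vectors, p, q, K, N)

# H is recovered only up to an N x N matrix Q, so compare column spaces.
Q, *_ = np.linalg.lstsq(H_est, H_true, rcond=None)
residual = np.linalg.norm(H_est @ Q - H_true) / np.linalg.norm(H_true)
print(residual)
```

The comparison through a least-squares fit reflects the ambiguity $G = HQ$ of Theorem 3.2: the estimate spans the same column space as the true stacked channel rather than matching it entry by entry.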

Remarks: 

Note that our nullspace methods could encounter pathological cases in which the partial nullspace leads to non-unique solutions of the linear equations. The probability of such an occurrence, however, is zero. Our approach of using a partial subspace was developed for second order statistics in [1] and was further investigated in [10]. To guarantee unimodality, we have developed a method to retrieve the full nullspace $N_s$ under the strong conditions that the channel order of each user is known and the matrix $H_0$ has enough non-zero rows. We will not present the details here because of the page limit.

4.3.  Channel  Order  Overestimation 

In practice, the true channel order $q_0$ is usually unknown and only an upper bound $q \ge q_0$ is available. Both Theorem 3.1 and Theorem 3.2 still hold true in this situation. The subspaces $N_c(1)$ and $N_c(K)$ obtained have smaller dimension. The Toeplitz structure of $H_s(1)$ and $H_s(K)$ is maintained although $H_{q_0+1}, \dots, H_q$ are zero matrices, so the MCS algorithms still operate properly with parameter $q$. The estimation performance will be degraded because the estimates of $H_{q_0+1}, \dots, H_q$ are generally non-zero. Nevertheless, the MCS algorithms will not "collapse" when the channel order is over-estimated.


5.  SIMULATION  EXAMPLES 

Now we present simulation results to illustrate the performance of our algorithms. The channel inputs are two mutually independent i.i.d. QPSK signals. Mutually independent zero-mean complex white Gaussian noise is added to each output. The noisy output signals have the same SNR. We use the MLPS method to estimate the unknown parameters and choose $k_1 = q$, $k_2 = 2q$ and $L = 3q$. Only 4-th order cumulants are used. The performance is measured by the Overall Normalized Mean Square Error (ONMSE), which is obtained by averaging the NMSE of all subchannels. Results are averaged over 200 Monte Carlo runs.

\[ \mathrm{NMSE}_{ij} = \frac{\sum_{n=0}^{q} |h_{ij}(n) - \hat{h}_{ij}(n)|^2}{\sum_{n=0}^{q} |h_{ij}(n)|^2} \tag{5.1} \]

\[ \mathrm{ONMSE} = \frac{1}{pN} \sum_{i=1}^{p} \sum_{j=1}^{N} \mathrm{NMSE}_{ij} \tag{5.2} \]

Example 1. We consider a 2-input/2-output system with transfer function $H(z)$ given by

\[ H(z) = \begin{bmatrix} 1 + 0.5z^{-1} - 0.5z^{-2} - z^{-3} & 1.6 - 0.64z^{-1} + 0.388z^{-2} \\ 0.4 + 0.6z^{-1} - z^{-2} & 0.7263z^{-1} - 0.9078z^{-2} \end{bmatrix} \tag{5.3} \]

Since $H_{11}(z)$ and $H_{21}(z)$ have a common zero $z_0 = 1$, $H(z)$ is not irreducible. But $H(z)$ satisfies the identifiability conditions of our algorithms. In Figure 1, we show the performance of the MLPS method at different SNR levels. As SNR increases, performance improvement is evident if more data samples are used in estimation. This example verifies the identifiability condition stated in Theorem 3.1. It also demonstrates the performance of the cumulant algorithms for ill-conditioned channels even with only partial knowledge of the nullspace.
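
The common zero claimed in Example 1 is easy to check numerically. The short sketch below evaluates the first column of $H(z)$ at $z_0 = 1$; the coefficient lists are transcribed from (5.3) as read here, and the helper `eval_fir` is a hypothetical name introduced only for this illustration.

```python
def eval_fir(coeffs, z):
    """Evaluate sum_k coeffs[k] * z**(-k) for an FIR transfer function."""
    return sum(c * z ** (-k) for k, c in enumerate(coeffs))

# First column of H(z) in Example 1 (coefficients of z^0, z^-1, z^-2, ...).
h11 = [1.0, 0.5, -0.5, -1.0]
h21 = [0.4, 0.6, -1.0]

z0 = 1.0
first_column_at_z0 = [eval_fir(h11, z0), eval_fir(h21, z0)]
print(first_column_at_z0)
```

Both entries vanish at $z_0 = 1$, so the first column of $H(z_0)$ is zero and $H(z_0)$ cannot have full column rank there, which is why $H(z)$ is not irreducible.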

Example 2. In this example, we consider a case in which the channel order is over-estimated. The channel is a 2-input/2-output system with $H(z)$ given by

\[ H(z) = \begin{bmatrix} 0.7 + z^{-1} + 0.7z^{-2} & 1.4 - 1.82z^{-1} + 0.6593z^{-2} \\ 2.7 - 0.8z^{-2} & 0.5 + 1.2z^{-1} + 0.7426z^{-2} \end{bmatrix} \tag{5.4} \]

Figure 2 shows the performance of the MLPS method when the channel order is exactly known. Compared with Example 1, reliable estimates are generated with a much smaller number of data samples for this ordinary channel. In Figure 3, we show the results when the channel order $q$ is over-estimated by 1 and 2 for 6400 data samples. Clearly, the performance degradation is mild and non-abrupt as the order difference $q - q_0$ increases. These results establish the robustness of the MCS algorithm under channel order over-estimation.

6. CONCLUSIONS


In this paper, a new linear HOS approach for blind identification of FIR-MIMO channels is presented. Our algorithms are based on nullspace decomposition of multiple cumulant matrices. A sufficient identifiability condition for this approach is derived. These partial nullspace algorithms are capable of identifying a wide class of channels, including ones that some existing methods cannot identify because of ill-conditioning. Finally, our approach is less sensitive to channel order over-estimation.

Figure 1: Example 1: Performance of MLPS at different SNR levels


Figure  2:  Example  2:  Performance  of  MLPS  at  different 
SNR  levels 


Figure 3: Example 2: Performance comparison for channel order over-estimation with 6400 samples

ABSTRACT

This paper introduces a new linear algebraic method for blind identification of a nonminimum phase FIR system. The proposed approach relies only on stationary fourth-order statistics and is based on the 'joint diagonalization' of a set of fourth-order cumulant matrices. Its performance is illustrated via some numerical examples. Furthermore, this method turns out to overcome the problem of having some zero taps in the system impulse response.

1.  INTRODUCTION 

The  blind  identification  problem  of  linear  system  is  of  great 
interest  in  diverse  fields  including  speech  processing,  data 
communication,  and  geophysical  or  biological  data  process¬ 
ing  due  to  its  relevance  to  blind  deconvolution.  The  general 
purpose  of  blind  identification  is  to  identify  an  unknown  sys¬ 
tem  driven  by  unobservable  input  based  only  on  its  output. 
This  paper  deals  with  a  particular  case  which  is  the  identi¬ 
fication  of  a  finite  impulse  response  (FIR)  system  driven  by 
a  non-Gaussian  white  input.  It  is  well  known  that  second- 
order  statistics  are  sufficient  to  identify  any  minimum  phase 
system  [1]  and  that  a  nonminimum  phase  system  cannot  be 
uniquely identified using only second-order statistics. On the other hand, it is shown for a non-Gaussian input that consistent estimates of the parameters of any FIR system can be obtained by using higher-order statistics or cumulants of the observed data [2]. This is because the higher-order statistics preserve the phase characteristics (up to a linear phase shift), unlike the second-order statistics.

Several solutions exist for the identification of the nonminimum phase FIR system using higher-order statistics. Some of them are closed-form solutions [3], others are linear algebraic methods [4] or nonlinear optimization approaches [5]. The linear algebraic approaches do not perform as well as the nonlinear optimization methods, but they are computationally attractive and can be used as a good initial guess for the nonlinear optimization methods, which usually suffer from local minima and poor convergence.

Recently, a new algebraic tool referred to as 'Joint Diagonalization' has been successfully applied to some signal processing problems such as blind source separation [6, 8, 9], blind identification of linear-quadratic models [10] and source localization [11, 12, 13]. Herein, our aim is to apply this tool to the blind identification of FIR systems.


Hence,  we  propose  a  new  linear  algebraic  approach  based  on 
a  joint-diagonalization  of  a  set  of  fourth-order  cumulant  ma¬ 
trices.  The  recovery  (identification)  of  the  system  impulse 
response  from  this  process  is  made  possible  by  the  existing 
relationships  between  its  taps  and  those  cumulants.  More¬ 
over,  the  identification  procedure  described  herein  utilizes 
the  sample  fourth-order  cumulant  matrices  which  are  esti¬ 
mated  from  a  finite  sequence  of  the  (possibly  noisy)  output 
signal. 

2.  PROBLEM  FORMULATION 

Consider a linear time invariant (LTI) FIR system described by

\[ y(n) = \sum_{i=0}^{q} h(i)\, s(n-i) + u(n) \tag{1} \]

where

y(n)  output sequence,
h(n)  impulse response of the FIR LTI system that is allowed to be nonminimum phase,
s(n)  input to the system,
u(n)  additive noise.

The assumptions made about the data model are as follows.

A1) The input process {s(n)} is an i.i.d. zero-mean non-Gaussian stationary process.

A2) The noise process {u(n)} is a Gaussian, perhaps colored, zero-mean stationary process independent of {s(n)}.

A3) $h(n) = 0$ for $n < 0$, $h(n) = 0$ for $n > q$, $h(q) \neq 0$, and $h(0) = 1$, which fixes the inherent scale ambiguity.

Our task in this paper is to identify the parameters of the FIR system ($h(n)$, $n = 1, \dots, q$) using the fourth order cumulants of the measured output process $y(n)$, assuming that the FIR system order $q$ is known. Our motivation in using the fourth order cumulants for the blind identification problem under assumptions A1) and A2) comes from the fact that Gaussian processes have identically zero cumulants of any order greater than two. Moreover, third-order cumulants vanish even for non-Gaussian random processes with symmetric distributions, but fourth order cumulants generally do not for practically useful random processes. Hence, fourth order cumulants have two advantages: one is that additive Gaussian noise does not affect 4-th order cumulants, and the other is that 4-th order cumulants can be used in many more situations than third order cumulants.


3.  THE  FOURTH-ORDER  CUMULANT 
MATRICES 

Let’s  start  by  giving  the  definition  of  the  fourth  order  cu¬ 
mulants  of  a  stationary  random  process. 

Definition 1  Let $y(n)$ be a stationary random process. Then, the fourth-order cumulant $c_y(k_1,k_2,k_3)$ of $y(n)$ for fixed integers $k_1$, $k_2$ and $k_3$ is defined by

\[ c_y(k_1,k_2,k_3) = \mathrm{cum}[y(n),\, y(n+k_1),\, y(n+k_2),\, y(n+k_3)] \]

In particular, when $y(n)$ is zero-mean, we have the following expression:

\[ \begin{aligned} c_y(k_1,k_2,k_3) ={}& E[y(n)y(n+k_1)y(n+k_2)y(n+k_3)] \\ &- E[y(n)y(n+k_1)]\,E[y(n+k_2)y(n+k_3)] \\ &- E[y(n)y(n+k_2)]\,E[y(n+k_1)y(n+k_3)] \\ &- E[y(n)y(n+k_3)]\,E[y(n+k_1)y(n+k_2)] \end{aligned} \tag{2} \]

Now, we define the fourth-order cumulant matrices.

Definition 2  Let $y(n)$ be a stationary random process. Then, the $(2q+1) \times (2q+1)$ fourth-order cumulant matrix $S_y(k)$ of $y(n)$ for a fixed integer $k$ is defined by

\[ S_y(k) = \begin{bmatrix} c_y(-q,-q,k) & \cdots & c_y(-q,0,k) & \cdots & c_y(-q,q,k) \\ \vdots & & \vdots & & \vdots \\ c_y(0,-q,k) & \cdots & c_y(0,0,k) & \cdots & c_y(0,q,k) \\ \vdots & & \vdots & & \vdots \\ c_y(q,-q,k) & \cdots & c_y(q,0,k) & \cdots & c_y(q,q,k) \end{bmatrix} \tag{3} \]

where $c_y(i,j,k)$, $-q \le i, j \le q$, are the 4-th order cumulants of $y(n)$.

Next, we examine the relations between the cumulants of the input and those of the output. According to (1) and assumptions A1) and A2), the 4-th order cumulant of the system output is given by

\[ c_y(i,j,k) = \gamma_s \sum_{l=0}^{q} h(l)\, h(l+i)\, h(l+j)\, h(l+k) \tag{4} \]

where $\gamma_s = c_s(0,0,0)$. In the above relation, we have taken into account the fact that the 4-th order cumulant of the additive Gaussian noise vanishes. Substituting (4) into (3), and after some algebraic manipulations, the 4-th order cumulant matrices are shown to have the following structure:

\[ S_y(k) = \mathrm{sign}(\gamma_s)\, H \Lambda(k) H^T \tag{5} \]

where the superscript $T$ denotes the transpose,

\[ H = \sqrt{|\gamma_s|}\, \big[\, h(l_1)\, a_{l_1}, \;\dots,\; h(l_m)\, a_{l_m} \,\big], \quad a_l = [h(l-q), \dots, h(l), \dots, h(l+q)]^T \tag{6} \]

with $l_1 < \dots < l_m$ the indices of the non-zero taps, and $\Lambda(k) = \mathrm{diag}[h(l_1+k)/h(l_1), \dots, h(l_m+k)/h(l_m)]$ for $k = -q, \dots, q$. Note that $H$ is a $(2q+1) \times m$ matrix and $\Lambda(k)$ is an $m \times m$ matrix, where $m$ is equal to the number of non-zero FIR taps. According to assumption A3), we have $2 \le m \le q+1$.

Next, we propose a new procedure for the estimation of the FIR system parameters ($h(n)$, $0 \le n \le q$) that exploits the structure (5) of the 4-th order cumulant matrices.
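
For zero-mean data, the moment expression (2) suggests a direct sample estimator of the fourth-order cumulant. The sketch below is one possible way to write it, assuming a real-valued record and replacing each expectation by a time average over the samples for which all four lags are available; the function name `cum4` is hypothetical.

```python
import numpy as np

def cum4(y, k1, k2, k3):
    """Sample estimate of c_y(k1, k2, k3) for a real, zero-mean sequence y."""
    y = np.asarray(y, dtype=float)
    lags = (0, k1, k2, k3)
    start = -min(lags)               # first n with every n + lag >= 0
    stop = len(y) - max(lags)        # one past the last n with every n + lag < len(y)
    n = np.arange(start, stop)
    a, b, c, d = (y[n + k] for k in lags)
    m2 = lambda u, v: np.mean(u * v)  # second-order sample moment
    return (np.mean(a * b * c * d)
            - m2(a, b) * m2(c, d)
            - m2(a, c) * m2(b, d)
            - m2(a, d) * m2(b, c))

# A zero-mean binary (+1/-1) input has E[s^4] = E[s^2] = 1, hence kurtosis 1 - 3 = -2.
rng = np.random.default_rng(1)
s = rng.choice([-1.0, 1.0], size=10_000)
gamma_s = cum4(s, 0, 0, 0)
print(gamma_s)
```

For the binary sequence the zero-lag value is exactly $-2$ whatever the realization, because every sample has unit magnitude; this is the negative-kurtosis case in which $-S_y(0)$ has to be used in the orthonormalization step.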


4.  THE  PROPOSED  IDENTIFICATION 
APPROACH 

Orthonormalizing: The first step of the proposed estimation procedure consists of orthonormalizing the fourth-order cumulant matrices. This is achieved using an orthonormalizing matrix $W$, i.e. an $m \times (2q+1)$ matrix such that $I = W[\mathrm{sign}(\gamma_s) S_y(0)]W^T$. Substituting the expression $\mathrm{sign}(\gamma_s) S_y(0) = H\Lambda(0)H^T = HH^T$ into the latter expression shows that

\[ I = (WH)(WH)^T \tag{7} \]

so that $WH$ is an $m \times m$ unitary matrix. For any whitening matrix $W$, there thus exists an $m \times m$ unitary matrix $U$ such that

\[ WH = U \quad \text{or} \quad H = W^{\#} U \tag{8} \]

where the superscript $\#$ denotes the Moore-Penrose pseudoinverse: $W^{\#} = W^T (WW^T)^{-1}$. The orthonormalizing matrix $W$ can be determined from the eigendecomposition of the fourth order cumulant matrix $S_y(0)$ provided that $S_y(0)$ is positive semidefinite, which is the case when the kurtosis of the input data is positive. If the kurtosis of the input data is negative, $-S_y(0)$ should be used instead. The sign of the input data kurtosis can be deduced from the sign of the eigenvalues of $S_y(0)$.

Fourth-order identification principle: Now consider the orthonormalized fourth order cumulant matrices $\underline{S}_y(k)$ defined as

\[ \forall k \neq 0 \quad \underline{S}_y(k) = W[\mathrm{sign}(\gamma_s) S_y(k)]W^T. \tag{9} \]

Plugging (5) and (8) into (9) gives

\[ \forall k \neq 0 \quad \underline{S}_y(k) = (WH)\Lambda(k)(WH)^T = U\Lambda(k)U^T. \tag{10} \]

Since the matrix $U$ is unitary and the matrix $\Lambda(k)$ is diagonal, the latter equation shows that any orthonormalized fourth-order cumulant matrix is diagonal in the basis of the columns of the matrix $U$ (the eigenvalues of $\underline{S}_y(k)$ being the diagonal entries of $\Lambda(k)$).

If, for some $k \neq 0$, the diagonal elements of $\Lambda(k)$ are all distinct, the unitary matrix $U$ may be 'uniquely' (i.e. up to permutation and phase shifts) retrieved by computing the eigendecomposition of $\underline{S}_y(k)$. Indeterminacy occurs in the case of degenerate eigenvalues, i.e. when $[\Lambda(k)]_{ii} = [\Lambda(k)]_{jj}$ for some $i \neq j$.

The situation is more favorable when considering simultaneous diagonalization of the set $\{\underline{S}_y(k) \mid k = -q, \dots, -1, 1, \dots, q\}$ of $2q$ orthonormalized fourth-order cumulant matrices. This set is simultaneously diagonalizable by the unitary matrix $U$ as in (10).

The matrix $U$ is unique (up to a permutation matrix and phase factors) if, and only if, for any pair $(i,j)$ with $i \neq j$, there exists an integer $k$ such that $[\Lambda(k)]_{ii} \neq [\Lambda(k)]_{jj}$. Of course, the simultaneous diagonalization holds only for the exact statistics; sample statistics may only be approximately simultaneously diagonalized under the same unitary transformation. This calls for the definition of an approximate simultaneous diagonalization.

Joint approximate diagonalization: The joint approximate diagonalization can be explained by first noting that the problem of the diagonalization of a single $n \times n$ symmetric matrix $M$ is equivalent to the minimization of the criterion [14]:

\[ C(M, V) = -\sum_{i} |v_i^T M v_i|^2 \tag{11} \]

over the set of unitary matrices $V = [v_1, \dots, v_n]$. Hence, the joint approximate diagonalization of a set $\mathcal{M} = \{M_k \mid k = 1, \dots, K\}$ of $K$ arbitrary $n \times n$ matrices is naturally defined as the minimization of the following criterion:

\[ C(V, \mathcal{M}) \stackrel{\mathrm{def}}{=} \sum_{k} C(M_k, V) = -\sum_{k,i} |v_i^T M_k v_i|^2 \tag{12} \]

under the same unitary constraint. An efficient joint approximate diagonalization algorithm can be found in [6], which is a generalization of the Jacobi technique for the exact diagonalization of a single symmetric matrix [14].
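
A small numerical check may help to see why minimizing (12) amounts to joint diagonalization. The sketch below, restricted to real symmetric matrices and real orthogonal $V$ (an assumption), builds a set that is exactly jointly diagonalizable by a known `U` and compares the criterion at `U` with its value at a random orthogonal matrix. The function name `jad_criterion` and the sizes are hypothetical, and the Jacobi-type algorithm of [6] is not implemented here.

```python
import numpy as np

def jad_criterion(matrices, V):
    """Criterion (12): minus the sum over k and i of |v_i^T M_k v_i|^2."""
    return -sum(np.sum(np.diag(V.T @ M @ V) ** 2) for M in matrices)

rng = np.random.default_rng(2)
n, K = 4, 5

# Common orthogonal diagonalizer U and K matrices of the form U diag(.) U^T.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
matrices = [U @ np.diag(rng.standard_normal(n)) @ U.T for _ in range(K)]

# An unrelated orthogonal matrix for comparison.
V_rand, _ = np.linalg.qr(rng.standard_normal((n, n)))

# For orthogonal V the diagonal energy cannot exceed the Frobenius energy,
# so the criterion is bounded below by minus the sum of squared Frobenius norms.
lower_bound = -sum(np.linalg.norm(M, "fro") ** 2 for M in matrices)
c_true = jad_criterion(matrices, U)
c_rand = jad_criterion(matrices, V_rand)
print(lower_bound, c_true, c_rand)
```

The bound is attained only when every $V^T M_k V$ is diagonal, which is why the true diagonalizer reaches it while a generic orthogonal matrix does not.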

Fourth-Order System Identification (FOSI): We now have at hand all the necessary ingredients to derive the main identification procedure; it comprises the following steps:

•  From the eigendecomposition of the sample estimate of the fourth-order cumulant matrix $S_y(0)$, estimate an orthonormalizing matrix $W$ (by computing a square root of the pseudo-inverse of $S_y(0)$).

•  Determine the unitary matrix $U$ by minimizing criterion (12) for the set of the orthonormalized sample 4-th order cumulant matrices $\{\underline{S}_y(k),\ k = -q, \dots, -1, 1, \dots, q\}$.

•  Obtain an estimate of the matrix $H$ as $\hat{H} = W^{\#} U$.

•  Select the column of $\hat{H}$ corresponding to the largest absolute value of the diagonal entries of $\Lambda(q)$, and save the $q+1$ bottom elements of this column into the vector $f_1$.

•  Obtain an estimate $\hat{h}_{(q)}$ of the FIR system from $f_1$.

•  From the $q+1$ top elements of the column of $\hat{H}$ corresponding to the largest absolute value of the diagonal entries of $\Lambda(-q)$, saved into the vector $f_2$, obtain another estimate $\hat{h}_{(-q)}$ of the FIR system.

•  A third estimate of the FIR system can be obtained by averaging the two previous estimates, i.e. $\hat{h}_{(\mathrm{average})} = (\hat{h}_{(q)} + \hat{h}_{(-q)})/2$.

The steps that provide the estimates $\hat{h}_{(q)}$, $\hat{h}_{(-q)}$ and $\hat{h}_{(\mathrm{average})}$ are referred to as FOSI$_{(q)}$, FOSI$_{(-q)}$ and FOSI$_{(\mathrm{average})}$, respectively.
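
The steps above can be traced on exact statistics with a compact NumPy sketch. Several choices here are assumptions rather than the paper's method: the cumulant matrices are computed from the true taps through (3)-(4) instead of being estimated from data, the joint approximate diagonalization of [6] is replaced by an ordinary eigendecomposition of a random linear combination of the orthonormalized matrices (adequate only when the statistics are exact), and the final estimate is obtained by normalizing $f_1$ so that its first entry equals $h(0) = 1$. The names `cumulant_matrix` and `fosi_exact` are hypothetical.

```python
import numpy as np

def cumulant_matrix(h, gamma_s, k):
    """Exact S_y(k): entry (i, j) is c_y(i, j, k) from (4), with -q <= i, j <= q."""
    q = len(h) - 1
    hp = np.concatenate([np.zeros(2 * q), h, np.zeros(2 * q)])  # hp[n + 2q] = h(n)
    S = np.zeros((2 * q + 1, 2 * q + 1))
    for l in range(q + 1):
        a = hp[l + q:l + 3 * q + 1]              # h(l - q), ..., h(l + q)
        S += gamma_s * h[l] * hp[l + k + 2 * q] * np.outer(a, a)
    return S

def fosi_exact(h, gamma_s, seed=0):
    h = np.asarray(h, dtype=float)
    q = len(h) - 1
    m = np.count_nonzero(h)                      # number of non-zero taps
    sgn = np.sign(gamma_s)

    # Orthonormalizing matrix W from the m dominant eigenpairs of sign(gamma_s) S_y(0).
    lam, E = np.linalg.eigh(sgn * cumulant_matrix(h, gamma_s, 0))
    W = (E[:, -m:] / np.sqrt(lam[-m:])).T

    # Orthonormalized cumulant matrices for k != 0, as in (9).
    lags = [k for k in range(-q, q + 1) if k != 0]
    S_white = {k: W @ (sgn * cumulant_matrix(h, gamma_s, k)) @ W.T for k in lags}

    # Stand-in for JAD: eigenvectors of a random combination of the whitened set.
    weights = np.random.default_rng(seed).standard_normal(len(lags))
    _, U = np.linalg.eigh(sum(w * S_white[k] for w, k in zip(weights, lags)))

    H_est = np.linalg.pinv(W) @ U                # H = W^# U, up to column order and sign
    lam_q = np.diag(U.T @ S_white[q] @ U)        # estimated diagonal of Lambda(q)
    f1 = H_est[:, np.argmax(np.abs(lam_q))][q:]  # q + 1 bottom elements
    return f1 / f1[0]                            # assumed normalization: h(0) = 1

h_true = np.array([1.0, -2.0, 2.0, 4.0])         # system of Example 1
h_est = fosi_exact(h_true, gamma_s=-2.0)         # binary input: kurtosis -2
h_est_zero_tap = fosi_exact([1.0, -2.0, 0.0, 1.0], gamma_s=-2.0)  # system of Example 2
print(np.round(h_est, 6), np.round(h_est_zero_tap, 6))
```

With sample cumulants the whitened matrices are only approximately jointly diagonalizable, so a single eigendecomposition is no longer a sound substitute and a proper joint approximate diagonalization should be used.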

5.  SIMULATION  RESULTS 

In the simulated environment, an FIR LTI system is considered. The input sequence is a zero-mean uniform binary process with unit variance. The system output is corrupted by stationary Gaussian noise. The mean square error (MSE) of the estimated FIR system coefficients is obtained by averaging the results of 500 independent trials. All curves are labeled with the steps used for the identification process. On all the plots, the Cramer-Rao lower bound (CRLB) is provided to serve as a reference.

Example 1: In this example, we consider the following FIR system: h = [1, -2, 2, 4]. In Figure 1, the MSE is plotted in dB as a function of the noise level for a sample size T = 50000. This figure shows that the performance is constant versus the noise level, which suggests that the proposed approach is robust to the measurement noise. In Figure 2, the noise level is kept constant at -20 dB and the MSE in dB is plotted against the sample size. The plot shows a significant improvement in performance when a large sample size is used in the estimation of the sample fourth-order statistics.


[Plot: FOSI performance for example h = [1, -2, 2, 4]]

Figure 1: Mean Square Error versus noise level for example 1.

[Plot: FOSI performance for example h = [1, -2, 2, 4]]


Example 2: In this example, we consider the following FIR system: h = [1, -2, 0, 1]. In Figure 3, the MSE is plotted in dB as a function of the sample size for a noise level of -20 dB. This figure shows that the proposed identification method performs well when the FIR system presents zero taps.

Example 3: Here, we consider the following FIR system: h = [1, -1.87, 3.02, -1.435, 0.49, -0.8]. This example is taken from [4]. The results obtained in this section cannot be compared with those of [4] because this reference considers third-order statistics for the cumulant estimation, while we consider 4-th order statistics. Hence the comparison would not be fair. In Figure 4, the MSE is plotted in dB as a function of the sample size for a noise level of -20 dB. The figure shows the increase in performance allowed by the better accuracy of the sample 4-th order cumulants when a large number of samples is included in the identification procedure.

Through all these examples and other extensive experiments not reported here, one notices that FOSI$_{(q)}$ performs better than FOSI$_{(-q)}$ when the energy of $h(q)$ is greater than the energy of $h(0) = 1$. When the energy of $h(q)$ is lower than the energy of $h(0) = 1$, the situation is reversed, i.e. FOSI$_{(-q)}$ performs better than FOSI$_{(q)}$. These facts can be explained by resorting to the expressions of $\Lambda(q)$ and $\Lambda(-q)$.


6.  CONCLUSION 

In this paper, the problem of blind identification of an FIR system based only on fourth-order statistics has been investigated. An algebraic solution based on the joint diagonalization of a set of orthonormalized 4-th order cumulant matrices has been proposed. Numerical simulations have been performed to assess the performance of the proposed method. These show robustness of the proposed approach with respect to the measurement noise. Moreover, the method turns out to overcome the problem of having some zero taps in the system impulse response.

Figure 3: MSE versus sample size for example 2.

Figure 4: MSE versus sample size for example 3.

ABSTRACT

In this paper, a unity-gain cumulant-based adaptive line enhancer (UGCBALE) is presented. This enhancer is formulated by adaptively filtering the output of the CBALE to adjust the overall gain to unity. Owing to the unity-gain feature, this enhancer can be utilized, for example, as a sinusoidal interference canceller by subtracting the enhanced output from the noisy input. The UGCBALE is insensitive to Gaussian noise, either white or colored, and its performance in the case of colored non-Gaussian noise has been extensively investigated. Simulation results are presented to show the effective performance of the UGCBALE in comparison with the conventional adaptive line enhancer (ALE) when the noise is a colored uniformly distributed random process (UDRP).

Keywords-  Higher-order  Statistics,  non-Gaussian  noise, 
Adaptive  Line  Enhancer. 

1.  INTRODUCTION 

The conventional adaptive line enhancer (ALE) proposed by Widrow [1] has been successfully employed for the enhancement of a sinusoidal signal in uncorrelated (white) noise. The ALE also provides cancellation of a sinusoidal signal interfering with a broadband signal. The cumulant-based adaptive line enhancer (CBALE), proposed in [4], [2], is capable of enhancing a sinusoidal signal in correlated (colored) Gaussian noise. Although the CBALE outperforms the conventional ALE in the colored Gaussian noise case, it provides an unknown gain, and its application as a sinusoidal interference canceller is therefore limited by this unknown gain. It has been shown that the gain of the CBALE is only known in the case of a single sinusoid [4]. Therefore, a sinusoidal interference canceller is available only in this case.

In this paper, a unity-gain cumulant-based adaptive line enhancer (UGCBALE) is presented. This enhancer is a modified version of the non-unity-gain CBALE described in [4], [2]. This modification makes the presented enhancer perform as an adaptive sinusoidal interference canceller by subtracting its output from its input. Owing to recent analysis and results described in [6], [7], which have proved that employing higher-order cumulants is an effective approach to handling a sinusoidal signal corrupted by additive colored non-Gaussian noise, the performance of the presented enhancer in the case of colored non-Gaussian noise is investigated. Section 2 gives background, including the signal model and a brief review of both the ALE and the non-unity-gain CBALE. Section 3 presents the novel UGCBALE. Section 4 presents illustrative simulation results, which are concerned with enhancing sinusoids in colored uniformly distributed random process (UDRP) noise. Finally, Section 5 gives the conclusion.

2.  BACKGROUND 

In  this  section,  the  signal  model  is  described  and  a  brief 
overview  of  both  the  conventional  ALE  and  the  non-unity-gain 
CBALE  is  presented. 

2.1  The  Signal  Model 

The observed signal $x(n)$ is modeled as a sum of multiple sinusoids $s(n)$ plus zero-mean additive colored noise $v(n)$, i.e.,

\[ x(n) = s(n) + v(n) = \sum_{m=1}^{p} A_m \cos(2\pi f_m n + \varphi_m) + v(n) \tag{1} \]

where the amplitudes $A_m$ and phases $\varphi_m$ are deterministic constants. The frequencies $0 < f_m < 0.5$ are unknown, either constant or time-varying, and obey the constraints described in [11]. The additive noise $v(n)$ is a zero-mean colored random process with unknown spectral density. It is assumed that $v(n)$ is the output of a stable, linear shift-invariant (LSI) filter driven by a white, either Gaussian or non-Gaussian, random process with bounded eighth-order moment. A local signal-to-noise ratio ($\mathrm{SNR}_m$) is defined as $10\log_{10}(A_m^2 / 2\sigma^2)$, where $\sigma^2$ is the noise variance. The objective is to restore the sinusoidal signal $s(n)$ given a single record of the observed noisy signal $x(n)$. If $v(n)$ is of interest, then another objective is to cancel the sinusoidal signal from the observed signal $x(n)$.

2.2  The  ALE 

Because the results of the presented UGCBALE will be compared
with those of the conventional ALE, the ALE is briefly reviewed
here. The output of the adaptive filter working as a linear
predictor is computed by

$$y(n) = \sum_{i=0}^{2M} w_i(n)\, x(n-i) \tag{2}$$

The error signal is given by

$$e(n) = x(n-\Delta) - y(n) \tag{3}$$

where $w_i(n)$ are the adaptive filter coefficients, x(n) is the
reference signal, $x(n-\Delta)$ is the primary signal and $\Delta$
is the decorrelation delay, which in the case of white noise is
enough to decorrelate the noise of the reference and primary
inputs. The value of $\Delta$ is chosen equal to M in order to keep
the causality of the adaptive filter. The normalized least mean
square (NLMS) update equation is given by

$$w_i(n+1) = w_i(n) + \mu\, e(n)\, x(n-i)/\gamma \tag{4}$$

where $\mu$ is a positive step size and
$\gamma = \sum_{i=0}^{2M} x^2(n-i)$.

2.3  The  Non-Unity-Gain  CBALE 

The non-unity-gain CBALE shown in Figure 1 is an FIR
adaptive filter whose input is the noisy signal x(n) and whose
output is the enhanced sinusoidal signal z(n). That is, the
output signal z(n) is given by

$$z(n) = \sum_{i=0}^{2L} h_i(n)\, x(n-i) \tag{5}$$

Figure 1. The cumulant-based adaptive line enhancer
(CBALE).

The adaptive filter impulse response $h_i(n)$ is computed
recursively using [4], [2]

$$c_x(i \mid n) = \alpha_1 c_x(i \mid n-1) + (1-\alpha_1)\, x(n-i)\,[x^3(n) - 3 p_x(n) x(n)], \quad |i| \le L \tag{6}$$

$$h_i(n) = c_x(i-L \mid n)/c_x(0 \mid n), \quad i = 0, 1, \ldots, 2L \tag{7}$$

where L is the maximum lag, $0 \ll \alpha_1 < 1$ is the so-called
smoothing factor and $p_x(n)$ is the power of x(n), which is
recursively estimated using

$$p_x(n) = \alpha_2 p_x(n-1) + (1-\alpha_2)\, x^2(n) \tag{8}$$

where $0 \ll \alpha_2 < 1$ is another smoothing factor. In the case
of Gaussian noise, the steady state of (7) is given by [4], [2]

$$h_i(\infty) = \sum_{m=1}^{p} \bar{A}_m \cos(2\pi f_m (L-i)), \quad i = 0, 1, \ldots, 2L \tag{9}$$

where $\bar{A}_m$ are positive and unknown constants. Therefore, the
adaptive filter in the steady state is a narrow bandpass FIR
filter whose center frequencies are equal to the frequencies of
the input sinusoidal signal.
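
The recursions (6)-(8) can be sketched in Python as follows. This
is only an illustrative sketch, not the authors' implementation:
the function name cbale_impulse_response is hypothetical, all
estimates are assumed to start from zero, and the signal is
processed offline so that the negative lags in (6) can read
samples ahead of n.

```python
import math

def cbale_impulse_response(x, L, alpha1=0.99, alpha2=0.999):
    """Run recursions (6)-(8) over x and return h_i, i = 0..2L, per (7).

    c[k] stores the cumulant-slice estimate c_x(k - L | n), i.e. lags -L..L.
    Offline processing is assumed so that x[n - lag] exists for lag < 0.
    """
    c = [0.0] * (2 * L + 1)
    p = 0.0  # recursive power estimate p_x(n), assumed to start at zero
    for n in range(L, len(x) - L):
        p = alpha2 * p + (1.0 - alpha2) * x[n] ** 2                 # (8)
        g = x[n] ** 3 - 3.0 * p * x[n]
        for k in range(2 * L + 1):
            lag = k - L
            c[k] = alpha1 * c[k] + (1.0 - alpha1) * x[n - lag] * g  # (6)
    c0 = c[L]  # c_x(0 | n)
    return [ck / c0 for ck in c]                                    # (7)

# Noise-free single sinusoid at f = 0.1: h_i should approach cos(2*pi*f*(L - i)).
L = 4
x = [math.cos(0.2 * math.pi * n) for n in range(5000)]
h = cbale_impulse_response(x, L)
```

For a single noise-free sinusoid the returned response is close to
the cosine shape predicted by (9), with the center tap equal to one
because of the normalization in (7).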

3.  THE  UNITY-GAIN  CBALE  AND  ITS 
PERFORMANCE  IN  COLORED  NON- 
GAUSSIAN  NOISE  CASE 

3.1  The  Unity-Gain  CBALE 

Figure 2 shows the novel UGCBALE. It is composed of the
non-unity-gain CBALE followed by an adaptive noise canceller
(ANC). The basic idea arises from the fact that the output of
the non-unity-gain CBALE is given by

$$z(n) = \xi(z)\, s(n) \tag{10}$$

where $\xi(z)$ is an unknown frequency-dependent gain, and the
noisy signal x(n) is given by

$$x(n) = s(n) + v(n) \tag{11}$$

Therefore, z(n) and x(n) can be taken respectively as the
reference and the primary inputs of an ANC. The ANC removes from
its output e(n) the signal component that is correlated between
the reference and the primary inputs. To achieve this, the
adaptive filter associated with the ANC will provide a gain
$1/\xi(z)$. This in turn implies that the output of this
adaptive filter will be an estimate of the sinusoidal signal
s(n), i.e., $y(n) \approx s(n)$, and the output error of the ANC
will be an estimate of the noise v(n).


Figure  2.  The  unity-gain  cumulant-based  adaptive  line 
enhancer  (UGCBALE). 

The adaptive filter associated with the ANC can be
implemented as an FIR filter updated using the NLMS
algorithm. That is, the output of the UGCBALE is computed
by

$$y(n) = \sum_{i=0}^{K} w_i(n)\, z(n-i) \tag{12}$$

where z(n) is the output of the non-unity-gain CBALE given
by (5). The adaptive coefficients $w_i(n)$ can be updated using
the NLMS algorithm written as

$$w_i(n+1) = w_i(n) + \mu\, e(n)\, z(n-i)/\gamma \tag{13}$$

where $\mu$ is a positive step size,
$\gamma = \sum_{i=0}^{K} z^2(n-i)$ and e(n) is the output error of
the ANC given by

$$e(n) = x(n) - y(n) \tag{14}$$
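
The ANC stage of (12)-(14) can be sketched in Python as below. This
is an illustrative sketch under stated assumptions rather than the
authors' code: the function name anc_nlms is hypothetical, the
weights start at zero, and the small constant eps that guards the
division is an addition not present in (13). In the usage lines
the CBALE output is replaced by a stand-in, a copy of the sinusoid
scaled by an "unknown" gain of 0.5.

```python
import math

def anc_nlms(x, z, K, mu, eps=1e-12):
    """Second stage of the UGCBALE per (12)-(14).

    x: primary input (noisy signal); z: reference input (CBALE output).
    Returns (y, e): the enhanced-signal estimate and the ANC error.
    eps only guards the division and is an assumption, not part of (13).
    """
    w = [0.0] * (K + 1)
    y = [0.0] * len(x)
    e = [0.0] * len(x)
    for n in range(K, len(x)):
        window = [z[n - i] for i in range(K + 1)]          # z(n), ..., z(n-K)
        y[n] = sum(wi * zi for wi, zi in zip(w, window))   # (12)
        e[n] = x[n] - y[n]                                 # (14)
        gamma = sum(zi * zi for zi in window) + eps
        w = [wi + mu * e[n] * zi / gamma
             for wi, zi in zip(w, window)]                 # (13)
    return y, e

# Stand-in for the first stage: z(n) is the sinusoid scaled by a gain of 0.5.
x = [math.cos(0.2 * math.pi * n) for n in range(600)]
z = [0.5 * v for v in x]
y, e = anc_nlms(x, z, K=3, mu=0.5)
```

In this noise-free stand-in the adaptive filter learns to undo the
gain, so y(n) approaches x(n) and the error approaches zero.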

3.2  The  Performance  in  Colored  Non-Gaussian  Noise  Case 

The suggested UGCBALE is composed of two cascade-connected
FIR adaptive filters; the first FIR filter is updated based on
higher-order statistics while the second FIR filter is updated
based on second-order statistics. Therefore, the type of noise
(Gaussian or non-Gaussian) affects the first filter alone. This
implies that to investigate the performance of the UGCBALE in
the non-Gaussian noise case, one needs to examine the
performance of the first adaptive filter. To this end, assume
v(n) to be colored non-Gaussian noise; then it can be shown that
the steady state of (7) is given by

$$h_i(\infty) = \sum_{m=1}^{p} \bar{A}_m \cos(2\pi f_m (L-i)) + \beta\, c_{4v}(L-i), \quad i = 0, 1, \ldots, 2L \tag{15}$$

where $\beta$ is the reciprocal of the kurtosis of x(n) and
$c_{4v}(\cdot)$ is a one-dimensional slice of the fourth-order
cumulant of the noise v(n). That is, the steady-state impulse
response in this case is a sinusoidal signal corrupted by the
fourth-order cumulant of the noise normalized to the kurtosis of
x(n).

In [7], it has been proved that a signal composed of sinusoids
plus noise possesses two SNRs. The first is the conventional
(original) SNR associated with second-order statistics of the
signal and the second is a new SNR associated with the
employed one-dimensional slice of the fourth-order cumulant
of the signal. From (15) and after some manipulations, the new
local SNR, defined as the ratio of the mth sinusoidal amplitude
$\bar{A}_m$ to the value $\beta c_{4v}(0)$ and termed the
signal-to-noise kurtosis ratio ($SNKR_m$), is given by [6], [7]

$$SNKR_m = (12/8)\, |\sigma^4/\gamma|\, \lambda\, SNR_m^2 \tag{16}$$

where the ratio $|\sigma^4/\gamma|$, characterizing the white noise
generating the additive colored noise, is a constant for each
particular non-Gaussian noise (it is, for example, 5/6 for
uniformly distributed random process (UDRP) noise [7]);
$\lambda$ is a new measure of the noise spectrum distribution,
which is equal to one when the noise is white and increases as
the noise spectrum bandwidth decreases [7]; and $SNR_m$ is the
conventional local SNR. This implies that in the case of UDRP
noise and $SNR_m = 2$, $SNKR_m$ is equal to $5\lambda$. Then, if
the noise is white, $SNKR_m$ is 2.5 times $SNR_m$. If the noise is
colored and $\lambda = 10$, for example, $SNKR_m$ is 25 times
$SNR_m$. This shows that updating the first filter using
higher-order cumulants eliminates completely the effect of
Gaussian noise and reduces the effect of colored non-Gaussian
noise.
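
The arithmetic behind these figures can be checked with a few
lines of Python. This is a minimal sketch: the function name snkr
and the constant name UDRP_RATIO are hypothetical, and all
quantities are treated as linear ratios (not dB), as in the text.

```python
def snkr(snr, sigma4_over_gamma, lam):
    """Signal-to-noise kurtosis ratio per (16); linear quantities, not dB."""
    return (12.0 / 8.0) * abs(sigma4_over_gamma) * lam * snr ** 2

UDRP_RATIO = 5.0 / 6.0  # |sigma^4 / gamma| quoted in the text for UDRP noise [7]

# SNR_m = 2 with white (lambda = 1) and colored (lambda = 10) UDRP noise.
white = snkr(2.0, UDRP_RATIO, 1.0)
colored = snkr(2.0, UDRP_RATIO, 10.0)
print(white, white / 2.0)      # SNKR and its ratio to the SNR, white case
print(colored, colored / 2.0)  # SNKR and its ratio to the SNR, colored case
```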

In the white non-Gaussian noise case, the UGCBALE outperforms
the conventional ALE provided that $SNKR_m$ is equal to or
greater than $SNR_m$. In this case, it follows from (16) with
$\lambda = 1$ that the conventional $SNR_m$ must satisfy

$$SNR_m \ge (8/12)\, |\gamma/\sigma^4| \tag{17}$$

This implies that there is a minimum $SNR_m$ characterizing
each particular white non-Gaussian noise, which ensures that
the UGCBALE outperforms the conventional ALE. This
minimum $SNR_m$ is equal to 0.8 for uniformly distributed noise.


4. SIMULATION RESULTS


To examine the performance of the UGCBALE in comparison
with the conventional ALE, the following simulation examples
are conducted. In these examples, the results are averaged over
20 trials, each consisting of 2048 iterations.

Example 1- In this example the input signal x(n) is given by

$$x(n) = \cos(0.2\pi n) + v(n) \tag{18}$$

where v(n) is zero-mean colored uniformly distributed random
process (UDRP) noise generated by passing zero-mean white
UDRP noise through the following coloring filter:

$$G(z) = 0.138\, \frac{1 + 2z^{-1} + z^{-2}}{(1 - 0.98 e^{j0.5\pi} z^{-1})(1 - 0.98 e^{-j0.5\pi} z^{-1})} \tag{19}$$


Fig. 3. Spectrum of the input signal x(n) used for Example 1.


Figure 4. Spectrum of the output of the UGCBALE.


Figure 5. Spectrum of the output of the ALE.
That is, the spectrum of the colored noise has two strong peaks
at frequencies $f = \pm 0.25$. The autocorrelation function of
this noise is an exponentially damped sinusoid with a damping
factor of 0.98 and an oscillation frequency of 0.25, which
implies that the noise is highly correlated. The variance of v(n)
is adjusted to achieve an SNR of 2.0. Using [6], [7], the factor
$\lambda$ given in (16) can be computed for the noise spectrum
provided by the coloring filter (19). It is equal to 25.5, which
means that the SNKR is about 64 times the original SNR. This
SNKR gives a preliminary indication that the UGCBALE
outperforms the ALE by about 18.0 dB. The parameters
characterizing the UGCBALE, $\alpha_1, \alpha_2, L, \mu, K$, are
taken as 0.99, 0.999, 24, 0.001 and 23, respectively. The
parameters characterizing the conventional ALE,
$M, \Delta, \mu$, are taken as 36, 36 and 0.001, respectively. The
step sizes are selected experimentally to obtain maximum noise
attenuation and a stable adaptation process. Both enhancers are
started with zero initial weights. Figures 3, 4 and 5 show the
spectra of the input, the UGCBALE output and the ALE output,
respectively. From these spectra, the UGCBALE attenuates the
noise spectral peak at $f = 0.25$ by about 18.0 dB while the
ALE attenuates this peak by about 8.0 dB. This implies that
the UGCBALE outperforms the ALE by about 20.0 dB, 10.0
dB for each noise spectral peak.
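
For readers who wish to reproduce the noise of Example 1, the
coloring filter (19) can be written as a difference equation.
Because $e^{\pm j0.5\pi} = \pm j$, the denominator expands to
$1 + 0.98^2 z^{-2}$, so v(n) = 0.138[u(n) + 2u(n-1) + u(n-2)]
- 0.9604 v(n-2). The Python sketch below is illustrative only: the
function names, the zero initial conditions, the unit-variance
scaling of the white noise and the seed are assumptions, and the
scaling of v(n) to the stated SNR is left out.

```python
import math
import random

def coloring_filter(u):
    """Apply G(z) of (19) to the sequence u, with zero initial conditions."""
    v = []
    for n in range(len(u)):
        u1 = u[n - 1] if n >= 1 else 0.0
        u2 = u[n - 2] if n >= 2 else 0.0
        v2 = v[n - 2] if n >= 2 else 0.0
        # Numerator 0.138 (1 + 2 z^-1 + z^-2); denominator 1 + 0.98^2 z^-2.
        v.append(0.138 * (u[n] + 2.0 * u1 + u2) - 0.98 ** 2 * v2)
    return v

def white_udrp(n_samples, seed=0):
    """Zero-mean, unit-variance white uniform noise (seed is an assumption)."""
    rng = random.Random(seed)
    a = math.sqrt(3.0)  # uniform on [-a, a] has variance a^2 / 3 = 1
    return [rng.uniform(-a, a) for _ in range(n_samples)]

v = coloring_filter(white_udrp(2048))
```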

Example 2- In this example the input signal x(n) is given by

$$x(n) = \cos(0.2\pi n) + \cos(0.6\pi n) + v(n) \tag{20}$$

where v(n) is the noise described in Example 1. The local
SNRs are $SNR_1 = SNR_2 = 2.0$. The parameters characterizing
both  enhancers  are  taken  as  in  Example  1.  Figures  6,  7  and  8 
show  the  spectra  of  the  input,  the  outputs  of  both  the 
UGCBALE  and  the  ALE,  respectively.  It  is  apparent  that  the 
UGCBALE  still  outperforms  the  ALE  by  about  16.0  dB.  The 
impulse  responses  of  both  the  ANC  filter  associated  with  the 
UGCBALE  and  the  ALE  are  investigated.  The  predictor  filter 
of  the  ALE  has  two  jobs,  the  first  is  to  remove  the  noise  and 
the  second  is  to  keep  the  gain  equal  to  one  at  the  frequencies  of 
the  sinusoidal  signal.  The  adaptive  filter  of  the  ANC 
associated  with  the  UGCBALE  has  only  one  job.  It  tries  to 
adjust  the  gain  equal  to  one  at  the  frequencies  of  the  sinusoidal 
signal.  This  is  because  the  noise  was  removed  by  the  CBALE. 
This  single  job  facilitates  the  adaptation  process  of  this 
adaptive filter. Owing to space limitations, the impulse
responses of all adaptive filters associated with both enhancers
and the error signals are omitted. The spectral estimate of each
signal is obtained by applying a 256-point FFT to the last 256
points of the signal before the end of the simulation (before
the final iteration).


Figure  6.  Spectrum  of  the  input  signal  for  Example  2. 


Figure  7.  Spectrum  of  the  output  of  the  UGCBALE. 


Figure  8.  Spectrum  of  the  output  of  the  ALE. 

5.  CONCLUSION 

In this paper, a unity-gain cumulant-based adaptive line
enhancer (UGCBALE) has been presented. It is composed of
two cascade-connected FIR adaptive filters. The first one is
updated using higher-order cumulants of the input signal. It is
therefore insensitive to Gaussian noise (white or colored), and
it has been shown theoretically and experimentally that it
performs well in the colored non-Gaussian noise case. The second
one, updated using the second-order-statistics-based NLMS
algorithm, adjusts the overall gain to be approximately one. An
adaptive sinusoidal interference canceller is thus available.
Simulation  results  have  shown  that  the  UGCBALE 
outperforms  the  conventional  ALE  having  the  same  number  of 
coefficients  in  the  case  of  colored  uniformly  distributed 
random  process  (UDRP)  noise. 
ABSTRACT

A  cumulant-based  adaptive  approach  for  the  detection  and 
extraction  of  sparse  signal  embedded  in  colored  Gaussian 
noise  is  presented.  In  this  approach,  the  extracted  signal  is 
obtained  by  adaptive  FIR  filtering  of  the  noisy  signal. 
Coefficients  of  the  adaptive  filter  are  updated  using  a 
recursive  algorithm  based  on  a  sum  of  cumulants  of  orders 
$k \ge 3$ of the input signal. This is to ensure super sufficient
detection  of  different  sparse  signals  and  to  ensure  efficient 
removal  of  colored  Gaussian  noise.  It  is  shown  that  when  the 
sparse  pulse  is  absent,  the  coefficients  of  the  adaptive  filter 
converge  to  zero.  However,  when  the  sparse  pulse  exists  the 
FIR  adaptive  filter  converges  to  a  type  of  signal-matched 
filters.  Simulation  and  experimental  results  are  included  to 
show  the  high  efficiency  of  the  presented  approach  in 
comparison  with  the  adaptive  short-term  correlation 
counterpart. 

Keywords: Higher-order Statistics, Sparse Signals,
Biomedical signals, Gaussian noise, Detection.

1.  INTRODUCTION 

Extraction  of  short-time  extent  (sparse)  signal  embedded  in 
additive  noise  is  a  problem  frequently  encountered  in  a  variety 
of  fields  such  as  biomedical  signal  processing,  radar, 
communications,  etc.  Various  adaptive  filters  based  on  the 
correlation  of  the  input  noisy  signal  may  be  satisfactory  when 
the additive noise is white. However, in the highly correlated
noise case, the impulse response of the adaptive filter
converges to the autocorrelation function of the signal plus
that of the additive colored noise. This implies that the
passband of the filter will lie in the same band as both the
signal and the noise. Therefore, the adaptive filter output will
be a version of the input noisy signal [1], [2].

Recently, higher-order statistics or cumulants have been
successfully employed for the detection and classification of
non-Gaussian signals in Gaussian noise. This is because
higher-order cumulants of Gaussian noise, either white or
colored, are identically zero [3]-[12]. Various fourth-order
cumulant slices of the noisy signal have been used for the
retrieval of harmonic signals in colored Gaussian noise. It has
also been explained that employing fourth-order cumulant
slices is an efficient approach to handling colored
non-Gaussian noise corrupting a sinusoidal signal [4], [8].
Various fourth-order cumulant-based filtering techniques (fixed
or adaptive) have been described for the enhancement of a
sinusoidal signal in colored Gaussian or non-Gaussian
noise. These techniques have been developed with the
assumption that the signal is stationary [3]-[5]. In [3], the
coefficients of an FIR adaptive line enhancer are recursively
updated using a one-dimensional slice of the fourth-order
cumulant of the input signal. Computation of this slice
needs the power of the input signal to be recursively estimated.
Therefore any small error in the estimated power of the input
signal will influence the performance of the algorithm,
especially when the signal is nonstationary. In [5], a
cumulant-based IIR adaptive notch filter has been described for
the enhancement and tracking of a single sinusoid in noise.

In this paper, a new approach for the detection and extraction
of sparse signals embedded in colored Gaussian noise is
presented. In this approach, the extracted signal is obtained by
passing the noisy signal through an FIR adaptive filter whose
coefficients are updated using a proposed algorithm based on a
sum of cumulants of orders $k \ge 3$ of the input signal. This is
to ensure reliable and efficient detection of sparse signals and
to ensure the removal of Gaussian noise. Another important
motivation for employing a weighted sum of cumulants is to
avoid the problem that may arise when a specific higher-order
cumulant is zero.

2.  SIGNAL  MODEL 

In this paper we are concerned with a class of signals that can
be modeled as a sum of short-extent pulses (sparse signals),
i.e.,

$$s(n) = \sum_{i=0}^{I} A_i\, \delta(n - M_i) \exp(-\alpha_i n^2) \cos(\omega_i n + \varphi_i) \tag{1}$$

where $A_i$, $\alpha_i$, $\omega_i$ and $\varphi_i$ are unknown
parameters: the amplitude, the damping factor, the frequency and
the phase of the ith cosine pulse, respectively, with
$0 < \omega_i < \pi$. The time $M_i$ represents the position of
the center of the ith pulse on the time axis n and
$\delta(\cdot)$ is the delta function.

Due to the presence of noise, one observes a contaminated
version of s(n), namely

$$x(n) = s(n) + v(n) \tag{2}$$

where v(n) is assumed to be zero-mean additive Gaussian
noise of unknown covariance. Additionally, v(n) is considered
to be the output of a stable, linear time-invariant (LTI) filter
driven by i.i.d. Gaussian noise with bounded higher-order
moments. A local signal-to-noise ratio (SNR) is defined as the
ratio of the maximum amplitude of the pulse signal to the noise
power, i.e.,

$$SNR_i = 10\log_{10}(|A_i| / \sigma^2) \tag{3}$$

The objective is to detect and to extract s(n) given only the
noisy signal x(n).

3. CUMULANT-BASED FIR ADAPTIVE
FILTERING

3.1 FIR Fixed Filtering

Our idea is to pass the noisy signal through an FIR filter whose
impulse response is a sum of higher-order cumulants of x(n).
The noise-free signal is a deterministic signal consisting of a
sum of pulses (sparse signal). This sparse signal is embedded
in colored Gaussian noise. In such a case, mixed cumulants [10]
computed as time averages are adopted. The second- and
third-order cumulants are given by [9]

$$c_{2x}(\tau) = \langle u(n) u(n+\tau) \rangle = \frac{1}{N} \sum_{n=0}^{N-1} u(n) u(n+\tau) \tag{4}$$

$$c_{3x}(\tau_1, \tau_2) = \langle u(n) u(n+\tau_1) u(n+\tau_2) \rangle = \frac{1}{N} \sum_{n=0}^{N-1} u(n) u(n+\tau_1) u(n+\tau_2) \tag{5}$$

where $u(n) = x(n) - \langle x(n) \rangle$.

For convenience, let $I$, $M_i$ and $\varphi_i$ in (1) be equal to
zero, i.e., the noise-free signal is only one pulse at the time
origin with zero phase. In this simple case we use the
third-order mixed cumulant of x(n). The impulse response of
the FIR adaptive filter is then suggested to be:

$$h(\tau) = c_{3x}(0, \tau) = c_{3s}(0, \tau) + c_{3v}(0, \tau) \tag{6}$$

where $c_{3v}(\cdot)$ is the ensemble third-order cumulant of the
noise v(n). Because the noise is Gaussian, $h(\tau)$ in (6)
reduces to

$$h(\tau) = c_{3s}(0, \tau) \tag{7}$$

Using (1), (5) and (7), $h(\tau)$ can be written as

$$h(\tau) = \frac{1}{N} \sum_{n=0}^{N-1} A^3 e^{-2\alpha n^2} e^{-\alpha (n+\tau)^2} \cos^2(\omega_0 n) \cos(\omega_0 (n+\tau)) \tag{8}$$

After simple manipulations the impulse response can be given
by

$$h(\tau) = \gamma A^3 e^{-\alpha \tau^2} \cos(\omega_0 \tau) \tag{9}$$

where $\gamma$ is a constant, fixed or slowly changing with the
time shift $\tau$. Then, when the pulse exists, the impulse
response of the FIR filter is a type of pulse-matched filter,
which means that the filter passband coincides with the band of
the pulse signal. If the pulse is absent, the impulse response
is identically zero, which implies that in this case the output
is zero.

3.2 FIR Adaptive Filtering

Because fixed filtering is not satisfactory for nonstationary
signals, an adaptive filter is required. Therefore, for tracking
ability, it is desired to formulate an FIR adaptive filter based
on the idea described in subsection 3.1. This adaptive filter is
required to detect the existence or the absence of each pulse.
When a pulse exists, the adaptive filter impulse response is
required to converge to the pulse-matched function. When the
pulse is absent, the adaptive filter is required to forget the
previously computed values of the impulse response. We propose
that the impulse response be recursively computed as follows

$$h(\tau \mid n) = \beta\, h(\tau \mid n-1) + (1-\beta)\, F[x^2(n)]\, x(n-P+\tau), \quad \tau = 0, 1, \ldots, 2P \tag{10}$$

where $0 \ll \beta < 1$ is the so-called forgetting factor,
$F[x^2(n)]$ is a nonlinear function of $x^2(n)$ and x(n) is the
input signal with the mean removed. Due to this nonlinear
function, the impulse response of the adaptive filter converges
to a sum of cumulants of orders $k \ge 3$.

The output of the adaptive filter is given by

$$y(n) = \sum_{\tau=0}^{2P} \mathrm{Sign}(h(P \mid n))\, h(\tau \mid n)\, x(n-P-\tau) \tag{11}$$

where $\mathrm{Sign}(h(P \mid n))$ is the sign of $h(P \mid n)$
given in (10). This sign is included to avoid the negative sign
that may appear with higher-order cumulants (i.e., skewness,
kurtosis, etc.).

Figure 1 shows a block diagram of the cumulant-based
adaptive filter while Figure 2 shows an illustrative
implementation of the cumulant-based recursive algorithm
given in (10). The nonlinear function makes it possible to use
the adaptive short-term correlation estimator for the
computation of a sum of cumulants of orders $k \ge 3$ of the
input signal [2].

It is worth noting that the rate of the recursive algorithm
depends on the choice of the factor $\beta$. Small values cause
fast forgetting, but they may also cause insufficient smoothing,
i.e., not enough convergence to the pulse signal shape.
Therefore, selecting $\beta$ depends on trial and error and on
the spread of the signal pulses.

For convenience, the counterpart of the presented approach,
which is based on second-order statistics and termed the ASC
algorithm, can be summarized as follows [2]

$$h(\tau \mid n) = \beta\, h(\tau \mid n-1) + (1-\beta)\, x(n)\, x(n-P+\tau), \quad \tau = 0, 1, \ldots, 2P \tag{12}$$

This implies that the impulse response of the adaptive filter
converges to

$$h(\tau) = r_{2s}(\tau) + r_{2v}(\tau) \tag{13}$$

Because we assume that the noise is colored (especially highly
colored), the impulse response is equal to the autocorrelation
function of the noise when the pulse signal is absent and is
equal to the autocorrelation of the pulse signal plus the
autocorrelation of the noise when the pulse signal exists. This
implies that the ASC algorithm will not be able to reject the
colored noise whether the signal pulse is absent or present.
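
The two recursions, (10) for the cumulant-based algorithm and (12)
for the ASC algorithm, differ only in the term that multiplies
x(n-P+τ). The Python sketch below makes this explicit. It is an
illustration under stated assumptions and not the authors' code:
the function names are hypothetical, the previous estimate is
taken to be h(τ|n-1), the nonlinearity is the one quoted in the
simulation section, and samples up to x(n+P) are assumed to be
available (offline processing).

```python
import math

def F(x2):
    """Nonlinear function of x^2(n); the form quoted in the simulation section."""
    return 1.0 / (1.0 + math.exp(-0.5 * x2))

def update_impulse_response(h, x, n, P, beta, cumulant_based=True):
    """One step of (10) (cumulant-based) or of (12) (ASC).

    h holds the previous estimate, indexed by tau = 0..2P.
    Requires P <= n < len(x) - P so that x[n - P + tau] exists.
    """
    drive = F(x[n] ** 2) if cumulant_based else x[n]
    return [beta * h[tau] + (1.0 - beta) * drive * x[n - P + tau]
            for tau in range(2 * P + 1)]

# Constant unit input: each recursion settles at its own driving term.
P, beta = 2, 0.95
x = [1.0] * 600
h_cum = [0.0] * (2 * P + 1)
h_asc = [0.0] * (2 * P + 1)
for n in range(P, len(x) - P):
    h_cum = update_impulse_response(h_cum, x, n, P, beta, cumulant_based=True)
    h_asc = update_impulse_response(h_asc, x, n, P, beta, cumulant_based=False)
```

With a smaller $\beta$ the estimates follow changes in the input
faster but are smoothed less, which is the trade-off described
above.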
4. SIMULATION RESULTS

To  examine  the  presented  adaptive  filtering  techniques  for  the 
enhancement  and  detection  of  sparse  signal  in  colored 
Gaussian  noise,  the  following  examples  have  been  conducted. 
Results  of  the  presented  algorithm  are  compared  with  the 
adaptive  short-term  correlation  algorithm.  In  all  examples  the 
additive  colored  Gaussian  noise  v(n)  is  generated  by  passing 
zero-mean,  i.i.d.  Gaussian  noise  through  the  following  coloring 
filter: 

0.1°°7  - 1*2*  - Tfiroli  (14) 

(1  -  0.98  e12”02  )(1  -  0.98<?-'/20'2 ) 

The autocovariance of the noise produced by this coloring filter
is a damped sinusoid with a damping factor of 0.98 and a
normalized oscillation frequency of 0.2. This implies that the
autocorrelation (second-order statistics) of v(n) has
considerable values over long time shifts $\tau$. The noisy
signal x(n) of length 2000 is obtained by adding the colored
noise v(n) to the sparse signal that is specified with each
example. The order of the FIR adaptive filters for both
algorithms is taken as P = 8 and the forgetting factor for both
algorithms is $\beta = 0.95$. The nonlinear function is taken as

$$F(x^2) = 1/(1 + e^{-0.5 x^2})$$

Example 1: In this example the noise-free sparse signal
(amplitude  versus  time)  shown  in  Figure  3  (a)  is  investigated. 
The  power  of  the  noise  v(n)  is  adjusted  to  achieve  0.0  dB  SNR. 
The  signal  embedded  in  colored  noise  is  shown  in  Figure  3  (b). 
Figures  3  (c)  and  (d)  show  the  results  of  estimated  signals 
using  both  algorithms  obtained  from  20  Monte  Carlo  runs.  It 
is  obvious  that  the  presented  algorithm  performs  better  than  the 
one  based  on  correlations. 

Example  2:  In  this  example  the  noise-free  sparse  rectangular 
signal  shown  in  Figure  4  (a)  is  investigated.  The  power  of  the 
noise v(n) is adjusted to achieve 0.0 dB SNR. The sparse signal
embedded  in  noise  is  shown  in  Figure  4  (b).  Figures  4  (c)  and 
(d)  show  the  results  of  enhancement  using  both  algorithms 
obtained  from  20  Monte  Carlo  runs.  It  is  obvious  that  the 
presented  algorithm  still  performs  better  than  the  one  based  on 
correlations. 

Example 3: In this example we have used an ECG
(electrocardiogram) artifact recorded by an MEG machine. Only
one  channel  is  shown  in  Fig.  5  (a).  Figures  5  (b)  and  (c)  show 
the  results  of  both  algorithms  obtained  from  20  channels.  It  is 
apparent  that  the  presented  algorithm  outperforms  the  one 
based  on  correlations. 

5. CONCLUSION

A  cumulant-based  adaptive  approach  for  the  detection  and 
extraction  of  sparse  signal  embedded  in  colored  Gaussian 
noise  has  been  presented.  In  this  approach,  the  noisy  signal  is 
passed  through  an  FIR  adaptive  matched  filter  whose 
coefficients  are  updated  using  a  recursive  algorithm  based  on 
a sum of cumulants of orders $k \ge 3$ of the input signal. This is
to ensure super sufficient classification of various signals and
to ensure the removal of Gaussian noise. It has been shown
that in the absence of the sparse signal, the coefficients of the
adaptive filter converge to zero. When the sparse signal is
present, however, the adaptive filter converges to a type of
sparse-matched filter over the sparse time window. Simulation
and experimental results have
shown  the  efficiency  of  the  presented  approach  in  comparison 
with  the  adaptive  short-term  correlation  counterpart. 
Figure 1. Block diagram of adaptive filtering based on the
suggested cumulant-based adaptive algorithm.

Figure 2. A scheme using a nonlinear function and a bank of
smoothing filters for the recursive computation of the adaptive
filter coefficients using a sum of cumulants of orders $k \ge 3$
of the input signal.

Figure 3. Results of Example 1: (a), noise-free sparse signal;
(b), observed signal with additive noise; (c), reconstructed
signal using the proposed technique; and (d), reconstructed
signal using the conventional ASC algorithm.

Figure 4. Results of Example 2: (a), noise-free rectangular
signal; (b), observed noisy signal; (c), the enhanced output
using the proposed technique; and (d), the output of the
conventional ASC algorithm.

Figure 5. Results of Example 3 (ECG): (a), recorded ECG
signal by using MEG machine; (b), reconstructed signal using
the proposed technique; (c), reconstructed signal using the
conventional ASC algorithm.
Abstract

This  paper  introduces  a  cross-relation  (CR)  based  higher  order 
matched  field  (MF)  processing  technique  for  estimating  the 
location  of  a  random  source  in  shallow  water.  It  is  known  that 
the probability density function (PDF) of signals emitted from
marine vessels has higher-order components. Using the
higher-order MF processor can cancel the effect of either
white or non-white Gaussian random interferences, since the
third and higher odd moments and the third- and higher-order
cumulants of Gaussian random interferences are zero. We
have examined the higher-order content of experimental ship
data as it affects the MF processor for estimation of location
using different frequency bands. The ship data were recorded
during a sea trial conducted in September 1993 in a region
close to Vancouver Island, BC, Canada.

1.  Introduction 

Parametric  estimators  based  on  matching  between  the 
measured  signals  and  replicas  based  on  the  environmental 
model  are  widely  used  in  various  signal  processing 
applications including source location and environmental or
channel parameter estimation [1,2]. The development of these
techniques, well known as matched field processing (MFP), is
indebted to reliable numerical models that model the field with
high  precision.  In  underwater  acoustics,  several  software 
packages  have  been  released,  such  as  SAFARI,  OASES  and 
ORCA  [3,4]  for  modeling  the  acoustic  field. 

The  higher-order  statistics  have  shown  wide  applicability  in 
many  diverse  fields  such  as  sonar,  radar,  seismic  signal 
processing, data analysis and system identification [5-7].
Specifically,  cumulants  and  their  associated  Fourier 
transforms,  known  as  polyspectra,  reveal  not  only  amplitude 
information  but  also  phase  information.  This  is  important 
because,  as  is  well  known,  second-order  statistics  (such  as  the 
auto-correlation)  are  phase  blind.  Cumulants,  on  the  other 
hand,  are  blind  to  any  kind  of  Gaussian  process;  thus  they  can 
handle  colored  Gaussian  measurement  noise  automatically, 
whereas  correlation-based  methods  do  not.  Cumulant-based 
methods  boost  signal-to-noise  ratio  when  signals  are  corrupted 
by  Gaussian  interference.  Ship  data  has  a  complex  distribution 
with  statistics  higher  than  second  order,  so  a  matched  field 
processor  based  on  higher  order  statistics  will  let  us  use  more 
of  the  information  in  the  data.  The  greatest  drawbacks  to  the 
use  of  higher-order  statistics  are  that  they  require  longer  data 
records  and  much  more  computation  than  do  correlation-based 
methods.  Longer  data  lengths  are  needed  in  order  to  reduce  the 
variance  associated  with  estimating  the  higher-order  statistics 
from  real  data  using  sample  averaging  techniques. 
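The Gaussian blindness of cumulants described above can be illustrated with a small numerical sketch. The sample size, the seed and the exponential distribution used as a non-Gaussian stand-in are arbitrary assumptions made for illustration; they are not taken from the ship data.

```python
import random

def third_cumulant(samples):
    """Sample third-order cumulant (third central moment) of a real sequence."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 3 for x in samples) / n

rng = random.Random(0)   # fixed seed, chosen arbitrarily
n = 200_000              # long record: higher-order estimates have high variance

gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
skewed = [rng.expovariate(1.0) for _ in range(n)]   # non-Gaussian stand-in

c3_gaussian = third_cumulant(gaussian)   # theoretical value: 0
c3_skewed = third_cumulant(skewed)       # theoretical value: 2 for Exp(1)
print(c3_gaussian, c3_skewed)
```

The Gaussian estimate stays near zero while the skewed process keeps a clearly non-zero third-order cumulant. The long record used here reflects the data-length drawback mentioned in the text.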


The kth-order cumulant is defined [8] in terms of its joint
moments of orders up to k and vice versa. The moment-to-cumulant
formula is

$$C_x(I) = \sum_{\cup_{p=1}^{q} I_p = I} (-1)^{q-1} (q-1)! \prod_{p=1}^{q} m_x(I_p) \qquad (1)$$

where $\sum_{\cup_{p=1}^{q} I_p = I}$ denotes summation over all partitions of
set $I$. Set $I$ contains the indices of the components of vector $x$,
where $x = [x_1, x_2, \ldots, x_k]^T$ denotes a collection of random
variables. A partition of the set $I$ is an unordered collection
of nonintersecting nonempty sets $I_p$ such that
$\cup_{p=1}^{q} I_p = I$, where $q$ is the number of sets $I_p$ in the partition.
$m_x(I_p)$ indicates the moment of the components of $x$
corresponding to set $I_p$, i.e., $m_x(I_p) = E[\prod_{i \in I_p} x_i]$.
The cumulant-to-moment formula is:

$$m_x(I) = \sum_{\cup_{p=1}^{q} I_p = I} \prod_{p=1}^{q} C_x(I_p) \qquad (2)$$
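As an illustration of the moment-to-cumulant formula (1), the following sketch enumerates all partitions of an index set and combines sample moments accordingly. The function names and the toy record are hypothetical and introduced only for this example; expectations are replaced by sample averages.

```python
from math import factorial

def set_partitions(items):
    """Yield every partition of a list into unordered nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition

def joint_moment(data, block):
    """Sample estimate of E[prod_{i in block} x_i]; data[i] is the record of x_i."""
    n = len(data[block[0]])
    total = 0.0
    for t in range(n):
        prod = 1.0
        for i in block:
            prod *= data[i][t]
        total += prod
    return total / n

def joint_cumulant(data, index_set):
    """Sum over partitions of (-1)^(q-1) (q-1)! times the product of block moments."""
    total = 0.0
    for partition in set_partitions(list(index_set)):
        q = len(partition)
        term = (-1) ** (q - 1) * factorial(q - 1)
        for block in partition:
            term *= joint_moment(data, block)
        total += term
    return total

x = [1.0, 2.0, 4.0, 7.0]                      # arbitrary toy record
variance = joint_cumulant([x], [0, 0])        # 2nd-order cumulant of (x, x)
third = joint_cumulant([x], [0, 0, 0])        # 3rd-order cumulant of (x, x, x)
```

For a repeated index the result reduces to the familiar central moments, which gives a quick sanity check of the partition sum. The number of partitions grows rapidly with the order, which is one source of the computational cost noted earlier.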

2.  Higher-order  MFP 

Let  us  consider  the  geometry  of  the  measurement  system  for 
ship  localization  in  shallow  water  using  a  vertical  linear  array 
with  N  sensors  as  shown  in  Fig.l.  This  system  can  be  modeled 
by  a  multi-channel  system  shown  in  Fig.  2,  consisting  of  N 
linear transfer functions. The transfer function $h_i$ corresponds
to the paths traveled by acoustic waves from the ship to the $i$th
sensor, including interactions with ocean bottom and surface.
It  is  assumed  that  the  noise  is  additive  and  is  spatially  and 
temporally  white,  Gaussian  and  uncorrelated  with  the  input 
signal.  The  cross-relation  in  equation  (3)  follows  from  the 
linearity  of  the  transfer  functions: 

$$y_p(n) = h_p(n;\alpha) * S(n)$$
$$y_q(n) = h_q(n;\alpha) * S(n)$$
$$\Rightarrow\; h_p(n;\alpha) * y_q(n) = h_q(n;\alpha) * y_p(n), \qquad p,q = 1,2,\ldots,N;\; p \neq q \qquad (3)$$

Now, we derive a matched field processor based on higher-order
statistics by multiplying both sides of the DFT of
equation (3) (in frequency variable F) by a subset of
$Y_k(F)$, $k = 1, \ldots, N$, $k \neq p, q$. Taking expectation of
this product produces a T-order cross-relation equation

$$H_p(F;\alpha)\, E\big(Y_q(F) Y_m(F) \cdots Y_{m+T-2}(F)\big) = H_q(F;\alpha)\, E\big(Y_p(F) Y_m(F) \cdots Y_{m+T-2}(F)\big) \qquad (4)$$

where $p, q, m = 1, 2, \ldots, N$; $p \neq q$,
$I_q = \{q, m, m+1, \ldots, m+T-2\}$,
$I_p = \{p, m, m+1, \ldots, m+T-2\}$, and the MFP order is T.
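The cross-relation identity of equation (3) can be checked numerically in the noise-free case. The sketch below uses short, made-up impulse responses and a made-up source sequence; none of these values come from the paper.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

# Hypothetical source signal and channel impulse responses (illustration only).
source = [1.0, -0.5, 2.0, 0.25, -1.0]
h_p = [1.0, 0.4, 0.1]
h_q = [0.8, -0.3, 0.05]

y_p = convolve(h_p, source)   # noise-free output of sensor p
y_q = convolve(h_q, source)   # noise-free output of sensor q

lhs = convolve(h_p, y_q)      # h_p * y_q
rhs = convolve(h_q, y_p)      # h_q * y_p
residual = max(abs(a - b) for a, b in zip(lhs, rhs))

# A mismatched replica for channel p breaks the identity.
h_p_wrong = [1.0, 0.1, 0.4]
mismatch = max(abs(a - b) for a, b in zip(convolve(h_p_wrong, y_q), rhs))
print(residual, mismatch)
```

The residual vanishes (up to rounding) only when the replicas match the true channels, which is the property a cross-relation based processor exploits when scanning candidate parameters.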


Fig.  1.  The  benchmark  of  a  system  for  ship  localization  using 
a  vertical  linear  array  in  shallow  water 


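If we replace moments by cumulants in equation (4)
we obtain

$$H_p(F;\alpha) \sum_{\cup_{s=1}^{r} I_s = I_q} \prod_{s=1}^{r} C_Y(I_s) = H_q(F;\alpha) \sum_{\cup_{s=1}^{r} I_s = I_p} \prod_{s=1}^{r} C_Y(I_s) \qquad (5)$$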


The  above  equations  can  be  written  in  the  following  matrix 
form  to  solve  for  all  channel  responses  simultaneously: 

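$$\mathrm{CUM}_Y\, H = 0 \qquad (6)$$

where

$$H = [H_1^T, H_2^T, \ldots, H_N^T]^T,$$
$$H_i = [H_i(0), H_i(F), \ldots, H_i((L-1)F)]^T, \qquad i = 1, 2, \ldots, N,$$

and $\mathrm{CUM}_Y$ is a block matrix with one block row for each index
combination $(p, q, k_1, \ldots, k_{T-2})$. Each block row has $NL$
columns and is zero except for two $L \times L$ diagonal blocks:
$\mathrm{cum}_{Y_q Y_{k_1} \cdots Y_{k_{T-2}}}$ in block column $p$ and
$-\mathrm{cum}_{Y_p Y_{k_1} \cdots Y_{k_{T-2}}}$ in block column $q$. Each
diagonal block holds the corresponding cumulant evaluated at the
frequencies $0, F, \ldots, (L-1)F$.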


The identifiability condition is that the null space dimension of
matrix $\mathrm{CUM}_Y$ should be one and $H_i$, $i = 1, 2, \ldots, N$,
should not be zero. To give more explicit expressions and
provide more insight into the characteristics of the channels
and the source signal, the following conditions are given
(based on the theorem):

1. For all frequencies $f_i$, $i = 1, 2, \ldots, M$, the transfer
functions $H_l$, $l = 1, 2, \ldots, N$, should not be zero.

2. In order to have the null space dimension
of $\mathrm{CUM}_Y$ equal to one, and assuming the condition
mentioned above is satisfied, the source T-order moment
should be non-zero for all frequencies.

For  the  case  where  channels  are  corrupted  by  noise,  the  least- 
square  estimator,  referred  to  as  the  high-order  cross-relation 
based  MFP,  is 

$$P(\alpha) = \frac{1}{\lVert \mathrm{CUM}_Y H \rVert^{2}} \qquad (7)$$

The  above  can  be  rewritten  in  the  following  form  to  give  a 
more  explicit  expression  of  the  processors: 
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where  p,  q,  m,  r  are  not  equal  numbers  chosen  from  set 

$\{1, 2, \ldots, N\}$.
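A minimal sketch of a 2nd-order (T = 2) cross-relation processor is given below. It rests on strong simplifying assumptions that are not from the paper: a phase-only free-space replica in place of a normal-mode model such as ORCA, a single noise-free snapshot in place of the expectations, and a hypothetical array geometry, frequency set and source spectrum. The function names `transfer` and `cr_processor` are illustrative only.

```python
import cmath
import math

SOUND_SPEED = 1500.0                      # m/s, nominal value (assumption)
SENSOR_DEPTHS = [20.0, 30.0, 40.0, 50.0]  # hypothetical vertical array, metres
SOURCE_DEPTH = 5.0                        # assumed known in this toy search
FREQUENCIES = [75.0, 85.0, 95.0, 105.0]   # Hz, hypothetical analysis bins

def transfer(freq, depth, source_range):
    """Toy phase-only free-space replica H_i(F; alpha) for one sensor."""
    dist = math.hypot(source_range, depth - SOURCE_DEPTH)
    return cmath.exp(-2j * math.pi * freq * dist / SOUND_SPEED)

def cr_processor(measured, candidate_range, eps=1e-12):
    """Inverse of the summed squared residuals |H_p Y_q Y_m - H_q Y_p Y_m|^2."""
    n = len(SENSOR_DEPTHS)
    total = 0.0
    for f_idx, freq in enumerate(FREQUENCIES):
        h = [transfer(freq, z, candidate_range) for z in SENSOR_DEPTHS]
        y = measured[f_idx]
        for p in range(n):
            for q in range(p + 1, n):
                for m in range(n):
                    if m in (p, q):
                        continue
                    cr = h[p] * y[q] * y[m] - h[q] * y[p] * y[m]
                    total += abs(cr) ** 2
    return 1.0 / (total + eps)   # eps keeps the noise-free peak finite

true_range = 3330.0
source_spectrum = [1.0 + 0.5j, -0.3 + 1.2j, 0.8 - 0.4j, 1.1 + 0.2j]  # arbitrary
measured = [[transfer(f, z, true_range) * s for z in SENSOR_DEPTHS]
            for f, s in zip(FREQUENCIES, source_spectrum)]

candidates = [3000.0 + 10.0 * i for i in range(67)]   # 3000 m to 3660 m
surface = [cr_processor(measured, r) for r in candidates]
best_range = candidates[surface.index(max(surface))]
print(best_range)
```

The processor output peaks where the cross-relation residuals vanish, so scanning the candidate parameter produces a one-dimensional slice of an ambiguity surface. With measured data the products would be averaged over snapshots, and for higher orders replaced by cumulant estimates.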


3.  Sensitivity  analysis 


Let  us  assume  that  a  deviation  in  the  true  source  location  or 
environmental  parameter  has  occurred.  The  cross-relation  term 
(equation  (8))  takes  the  form 

$$CR_{pq} = E(S^T(F))\, H_m(F) \cdots H_{m+T-2}(F)\, \rho_{pq}$$
$$\rho_{pq} = H_p(F;\alpha) H_q(F) - H_q(F;\alpha) H_p(F) \qquad (9)$$

For  parameters  with  low  sensitivity  to  the  pressure  field,  there 
is  no  considerable  change  in  the  amplitude  of  the  transfer 
function.  In  this  case  we  mainly  focus  on  the  transfer 
functions’  phase.  Moreover,  let  us  assume  that  the  array  length 
is  small  enough  in  comparison  to  the  water  depth  so  with  good 
approximation  we  can  assume  that  the  amplitudes  of  the 
transfer  functions  appearing  in  the  formulation  are  the  same. 
Equation  (9)  can  be  simplified  to 

$$|CR_{pq}| \approx |E(S^T(F))|\, |H(F)|^{T-1}\, |\rho_{pq}| \qquad (10)$$

By  substituting  equation  (10)  in  the  MFP  formulation 
(equation  (8))  we  have: 

p  </ 


where $M(T)$ is a constant multiplier equalling the number of
CR terms in the MFP formulation. For an array with N sensors


we  have  M (T)  — 
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To  obtain  a  simpler  equation,  let  us  assume  that  the  deviation 
value  due  to  the  mismatch  is  independent  of  the  sensors  p  and 
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Now,  let  us  define  the  MFP  sensitivity.  The  sensitivity 
function  S  is  defined  as 


S  —  S2,. 


(13) 


where 
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The  sensitivity  function  from  equation  (12)  becomes 


S!  — 


9/i 


M(T)\H(Ff{TA)  |£(5r(jF)) 


(14) 


(15) 


To see how the MFP sensitivity changes when the order is
increased from T to T+1 we obtain
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The  transfer  functions  norms  represent  the  transmission  loss 
from  the  source  to  the  vertical  array  sensors  that  are  relatively 
small because of the high ocean attenuation. This gives
the sensitivity function a potentially large value for
higher order MFP; however, in order to calculate the
sensitivity function we need to know the relative value of the
moments.

4.  Experimental  results 


We have applied the 2nd, 3rd, and 4th order cross-relation based
MF processor to experimental ship data to examine the
existence and effects of higher order content for source
localization in two different frequency bands. The ship
position from GPS data was at a range of 3.33 km with a
bearing of 153.27 degrees to the vertical linear array location.
The analysis has been carried out over 73-133Hz with a
resolution of 2Hz and over 150-270Hz with a resolution of 4Hz.
The replica or modeled fields used in the analysis are calculated
using ORCA [3]. A towed acoustic beacon at a lower depth
emits tones outside these bands, but some of its harmonics show in
our results at the lower depth.


For the frequency band 73-133Hz, the 2nd, 3rd and 4th order MF
processors are shown in Figs 3, 4, and 5 respectively. (Note:
for color figures, email the authors.) In these grayscale
images the darkest regions indicate modes of energy, possible
targets. The 2nd and 3rd order MF processors show strong
peaks at both the ship and towed beacon positions; there are
more sidelobes around the CW source position for 3rd order.
The 4th order MF processor shows a weak value at the ship
position. This fact suggests that the 4th order component of
ship noise is not as strong as its 2nd and 3rd order in the
frequency band 73-133Hz in the MEVA3 trial.

For the higher frequency band of 150-270Hz, the 2nd, 3rd and
4th order MF processors are shown in Figs 6, 7 and 8,
respectively. All figures show a strong peak at the ship
position, but only the higher order shows significant relative
energy from the deeper towed source. Again, we have a higher
sidelobe level when the MFP order is increased. We note that
the ship has strong components for not only second order but
also third and fourth order in this band.

Fig. 3. Ambiguity surface for the 2nd order cross-CR processor
(73-133Hz)

Fig. 4. Ambiguity surface for the 3rd order cross-CR processor
(73-133Hz)

Fig. 5. Ambiguity surface for the 4th order cross-CR processor
(73-133Hz)

Fig. 6. Ambiguity surface for the 2nd order cross-CR processor
(150-270Hz)

Fig. 7. Ambiguity surface for the 3rd order cross-CR processor
(150-270Hz)

Fig. 8. Ambiguity surface for the 4th order cross-CR processor
(150-270Hz)


5. Conclusions


We have introduced a cross-relation (CR) based higher order
matched field (MF) processing technique for estimating the
location of a random source (ship) in shallow water, and in the
process we have found information with regard to its higher
order features which may be useful for detection or
classification. It has been verified that the probability density
function (PDF) of signals emitted from marine vessels has
higher order components.

Use of the higher order MF processor can cancel the effect of
either white or non-white Gaussian random interferences
since the third and higher odd moments and third and higher
order cumulants of Gaussian random interferences are zero.

We have examined the higher order content of experimental
ship and towed beacon data as it affects the MF processor for
estimation of location using different frequency bands. For the
frequency band 73-133Hz the 2nd and 3rd order MF processors
show strong peaks at both the ship and towed beacon
positions, but there are also more sidelobes around the CW
source position for 3rd order than for lower order.

The 4th order MF processor at 73-133Hz shows a weak value
at both the ship and towed beacon positions. This fact
suggests that the 4th order component of ship noise is not as
strong as its 2nd and 3rd order in the frequency band 73-133Hz.

For the higher frequency band of 150-270Hz, the 2nd, 3rd and
4th order MF processors all show strong peaks at the ship
position; however, we again have higher sidelobes than for the
lower order. This fact indicates that the ship has a strong
component for not only second order but also third and fourth
order in the frequency band 150-270 Hz in comparison with
other sources in the environment. We note particularly that
in this band, the lower source is more clearly identified by
the higher order statistics, as it is relatively undetected by the
2nd order MFP compared to the primary target.
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ABSTRACT 


The bispectral density provides crucial information about
non-Gaussian and/or non-linear properties of stochastic processes. In
practice, however, bispectral estimators are prone to be inaccurate
and statistically inconsistent. In this paper we discuss the
statistical properties of non-parametric direct bispectral estimators.
Several multitaper based bispectral estimators are presented,
including a recently developed approach giving better frequency
resolution. We show that the bias and variance of these estimators
are governed mainly by a quantity we call the total bispectral
window. Our conclusion is that classical bispectral estimation
using the biperiodogram in combination with tapering and/or
bifrequency smoothing is outperformed by the multitaper based
bispectral estimators presented in this paper.


1.  INTRODUCTION 

It is well known that estimates of bispectra and other polyspectra
are prone to be noisy and statistically inconsistent. The problems
are particularly severe when only small data sets are available, or
when the data come from a nonstationary process. In particular, the
naive bispectral estimator (the so-called biperiodogram) is
anti-consistent.

In 1989, Thomson suggested [1] a multitaper estimator for
bispectral densities, by extending his established power-spectral
estimation technique. Thomson gave a useful approximation of
the variance of his multitaper bispectral estimator assuming
Gaussian processes. Numerical verification of Thomson's approximate
variance expression and numerical examples for non-Gaussian
processes were provided by the present authors recently [2].

In this paper, we will discuss the statistical properties of
non-parametric direct bispectral estimators, with special emphasis on
multitaper estimators. In particular, we show that the bias of any
non-parametric bispectral estimator is governed by a quantity we
call the total bispectral window, and that the variance may be
approximated by a term depending on the same quantity. We will also
generalize the multitaper approach and present a recently developed
bispectral estimator with better bias properties for rapidly
varying bispectral densities [3]. We will briefly discuss leakage
effects in bispectral estimation, and introduce the use of data
adaptive weight functions to control these effects in the multitaper
approach [4]. Finally, we discuss the applicability of multitaper
based bispectral estimators.


2.  SPECTRAL  REPRESENTATION 

In this paper we will assume that N samples are available, equally
spaced in time with $\Delta t = 1$, from a real valued, zero-mean,
stationary and ergodic stochastic process. Then there exists a Cramér
spectral representation [5, 6]

$$x[n] = \int_{-1/2}^{1/2} \exp(j 2\pi f n)\, dX(f) \qquad (1)$$

where $x[n]$ for $n = 0, 1, \ldots, N-1$ are the data samples and
$dX(f)$ is the increment process at frequency $f$. The relationship
between the available data, represented by the standard Fourier
transformed data $X(f) = \sum_{n=0}^{N-1} x[n] \exp(-j 2\pi f n)$, and the
increment process $dX(f)$ can be written as [7]

$$X(f) = \int_{-1/2}^{1/2} \tilde{D}(f - f')\, dX(f'). \qquad (2)$$

Here $\tilde{D}(f) = D(f) \exp[-j(N-1)\pi f]$ is a phase-shifted version
of the Dirichlet kernel $D(f) = \sin(N\pi f)/\sin(\pi f)$. From the
properties of the Dirichlet kernel, it is easy to understand two
fundamental properties of Fourier based estimators: First, $D(f)$ is
zero for the harmonic frequencies $f = i/N$ for $i = 1, 2, \ldots, N-1$.
This implies that for white noise, we can obtain uncorrelated
estimates of $dX(f)$ at any two different harmonic frequencies.
Second, the Dirichlet kernel has a large sidelobe level which gives
rise to severe spectral leakage effects [8, 9].
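Both properties of the Dirichlet kernel can be verified with a few lines of code. The sketch below assumes N = 64 and a simple grid search for the peak sidelobe; both choices are illustrative and not taken from the paper.

```python
import math

def dirichlet(f, n):
    """Dirichlet kernel D(f) = sin(N pi f) / sin(pi f), with D(0) = N."""
    denom = math.sin(math.pi * f)
    if abs(denom) < 1e-15:
        return float(n)
    return math.sin(n * math.pi * f) / denom

N = 64
# Property 1: zeros at the harmonic frequencies f = i/N, i = 1, ..., N-1.
max_at_harmonics = max(abs(dirichlet(i / N, N)) for i in range(1, N))

# Property 2: peak sidelobe, i.e. largest |D(f)| for 1/N <= f < 1/2 on a fine grid.
step = 1e-5
grid = [1.0 / N + k * step for k in range(int((0.5 - 1.0 / N) / step))]
peak_sidelobe = max(abs(dirichlet(f, N)) for f in grid)
sidelobe_db = 20.0 * math.log10(peak_sidelobe / N)   # relative to D(0) = N
print(max_at_harmonics, sidelobe_db)
```

The first sidelobe lies only about 13 dB below the mainlobe, which is why strong spectral components can leak into distant frequencies when no taper is applied.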

3.  CONVENTIONAL  NON-PARAMETRIC  BISPECTRAL 
ESTIMATORS 

Using the spectral representation for the stochastic process, the
integrated bispectrum $B(f_1, f_2)\,df_1 df_2$ is defined by [1, 6]

$$B(f_1, f_2)\,df_1 df_2 = \mathrm{Cum}[dX(f_1), dX(f_2), dX(f_3)] \qquad (3)$$

where $B(f_1, f_2)$ is the bispectral density, and $f_1 + f_2 + f_3 = 0$.
Using the Fourier transform $X(f)$ to approximate the increment
process $dX(f)$, the resulting (naive) bispectral estimator is the
biperiodogram $\hat{B}_{per}(f_1, f_2)$ given by

$$\hat{B}_{per}(f_1, f_2) = \frac{1}{N} X(f_1) X(f_2) X^*(f_1 + f_2) \qquad (4)$$

where the asterisk denotes complex conjugation.
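A minimal sketch of the biperiodogram of eq. (4), evaluated on the FFT grid with NumPy, is shown below. The function name and the phase-coupled test signal are assumptions made for illustration.

```python
import numpy as np

def biperiodogram(x):
    """Raw biperiodogram X(f1) X(f2) X*(f1 + f2) / N on the FFT grid.

    Entry [k, l] corresponds to the harmonic bifrequency (k/N, l/N); the sum
    frequency index k + l is wrapped modulo N (periodicity of the DFT).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.fft(x)
    k = np.arange(n)
    sum_index = (k[:, None] + k[None, :]) % n
    return X[:, None] * X[None, :] * np.conj(X[sum_index]) / n

# Hypothetical test signal with components at bins 3, 5 and 3 + 5 = 8.
N = 32
n = np.arange(N)
x = (np.cos(2 * np.pi * 3 * n / N) + np.cos(2 * np.pi * 5 * n / N)
     + np.cos(2 * np.pi * 8 * n / N))
B = biperiodogram(x)
print(abs(B[3, 5]), abs(B[2, 7]))
```

The estimate is large at the bifrequency (3/N, 5/N), where all three Fourier coefficients are non-zero, and vanishes where any one of them is zero. The symmetry B[k, l] = B[l, k] follows directly from eq. (4).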

The statistical properties of the biperiodogram are described
in [10, 6]. Using the relationship between the Fourier transform
and the increment process, it is easy to show that the expected
value can be written as a two-dimensional convolution between
the true bispectrum of the process $B(f_1, f_2)$ and the rectangular
bispectral kernel $D(f_1, f_2)$

$$E[\hat{B}_{per}(f_1, f_2)] = \iint_{-1/2}^{1/2} D(f_1 - f_1', f_2 - f_2')\, B(f_1', f_2')\, df_1' df_2' \qquad (5)$$

where $D(f_1, f_2)$ can be expressed by means of the Dirichlet kernel
as $D(f_1, f_2) = \tilde{D}(f_1)\tilde{D}(f_2)\tilde{D}^*(f_1 + f_2)/N = D(f_1) D(f_2) D(f_1 + f_2)/N$.
Asymptotically ($N \to \infty$) the kernel approaches
a two-dimensional Dirac delta function, making the biperiodogram
asymptotically unbiased. For finite N, however, the rectangular
bispectral kernel implies a leakage in the bifrequency domain.

Assuming a Gaussian process, and that $X(f_1)$, $X(f_2)$ and
$X(f_1 + f_2)$ are uncorrelated, the variance has been approximated
by [10, 6]

$$\mathrm{Var}[\hat{B}_{per}(f_1, f_2)] \approx N S(f_1) S(f_2) S(f_1 + f_2), \qquad (6)$$

where $S(f)$ is the true power spectrum of the process, and
$f_1 \neq (0, \pm 1/2)$, $f_2 \neq (0, \pm 1/2)$ and $|f_1 + f_2| \neq (0, \pm 1/2)$. From
eq. (6) it is clear that the biperiodogram is anti-consistent, since
the variance increases as the number of data samples N increases.
This anti-consistency is certainly not acceptable for an estimator,
and the raw biperiodogram should therefore be avoided in general.
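The anti-consistency predicted by eq. (6) can be observed in a small Monte Carlo experiment. The sketch below assumes unit-variance white Gaussian noise (so S(f) = 1), arbitrarily chosen harmonic bins, and a fixed seed; it is an illustration, not a reproduction of any result in the paper.

```python
import numpy as np

def biperiodogram_variance(n_samples, k1, k2, trials, rng):
    """Monte Carlo variance of the biperiodogram at harmonic bins (k1, k2)."""
    x = rng.standard_normal((trials, n_samples))   # unit-variance white Gaussian noise
    X = np.fft.fft(x, axis=1)
    b = X[:, k1] * X[:, k2] * np.conj(X[:, k1 + k2]) / n_samples
    return np.mean(np.abs(b - b.mean()) ** 2)

rng = np.random.default_rng(0)   # arbitrary fixed seed
trials = 4000
var_small = biperiodogram_variance(64, 5, 9, trials, rng)
var_large = biperiodogram_variance(256, 20, 36, trials, rng)
ratio = var_large / var_small    # eq. (6) with S(f) = 1 predicts 256 / 64 = 4
print(var_small, var_large, ratio)
```

Quadrupling the record length roughly quadruples the variance instead of reducing it, which is the behaviour that motivates the smoothing and multitaper approaches discussed next.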

3.1.  Frequency  smoothing 

The variance of the biperiodogram can obviously be reduced by
frequency smoothing in the bispectral domain. The covariance of
the biperiodogram shows that different frequency pairs $(f_k, f_l) \neq
(f_m, f_n)$ are uncorrelated for harmonic frequencies $f_i = i/N$; $i =
0, \pm 1, \ldots, \pm(N-1)$ [10, 6]. Applying a discrete bispectral smoothing
window $G(f_1, f_2)$ to the biperiodogram can therefore reduce
the variance at the cost of poorer frequency resolution.

The so-called uniform smoothing window [11] has a constant
value within a hexagonal bifrequency region of support. The size
of the hexagon is user specified, and is controlled by a single
integer parameter describing the frequency smoothing bandwidth
(see [2] for more details). To simplify the discussion of
conventional non-parametric estimators, we will restrict ourselves to the
uniform smoothing window in the rest of this paper.

Using the assumption that the bispectral density of the process
is approximately constant within the smoothing bandwidth, the
uniform smoothing does not change the expectation value of the
biperiodogram. Since the biperiodogram values at pairs of harmonic
frequencies are uncorrelated, it is easy to show that uniform
smoothing reduces the variance approximately by the factor $1/C$,
where $C$ is the number of non-zero points in $G(f_1, f_2)$, for a
white Gaussian process.

3.2.  Tapering 

Tapering is the well known solution for reducing spectral leakage
in power spectral estimation. Denoting the data taper by $v[n]$ and
the available data by $x[n]$, the tapered data $y[n]$ is obtained by
$y[n] = x[n] v[n]$, for $n = 0, 1, \ldots, N-1$. The effect on the
expectation value can easily be seen using the relationship between
the Fourier transformed tapered data $Y(f)$ and the true increment
process $dX(f)$,

$$Y(f) = \int_{-1/2}^{1/2} V(f - f')\, dX(f'). \qquad (7)$$

Here the convolution kernel $V(f)$ is the discrete Fourier transform
of the data taper $v[n]$. With the use of standard data tapers such
as the Hanning taper, the kernel in eq. (7) will be modified to
have a broader mainlobe and lower sidelobe level than the Dirichlet
kernel in eq. (2) [8]. Leakage is thus reduced at the expense
of a poorer frequency resolution. If the taper is normalized by
$\sum_{n=0}^{N-1} v^3[n] = N$, a tapered biperiodogram can be obtained
using $Y(f)$ instead of $X(f)$ in eq. (4).

The statistical properties of tapered biperiodograms are closely
connected to those of the biperiodogram discussed above. The
expected value can be written as a convolution between the tapered
bispectral kernel $V(f_1, f_2)$ and the true bispectrum, as in eq. (5),
where $V(f_1, f_2) = V(f_1) V(f_2) V^*(f_1 + f_2)$. This means that
the bispectral leakage can be reduced because of the lower sidelobe
level in $V(f_1, f_2)$, but the use of tapering also reduces the
frequency resolution since the mainlobe of $V(f_1, f_2)$ is wider than
that of the rectangular bispectral kernel.

3.3.  Tapering  and  frequency  smoothing 

The use of tapering in combination with frequency smoothing
introduces some properties that are difficult to quantify in the
resulting bispectral estimate. Using the approach in [9] (pp. 243-246),
it is possible to show that the expected value of a frequency
smoothed tapered biperiodogram can be written as a convolution
between the true bispectrum and the total bispectral window
$W(f_1, f_2)$ given by

$$W(f_1, f_2) = \iint_{-1/2}^{1/2} G(f_1 - f_1', f_2 - f_2')\, V(f_1', f_2')\, df_1' df_2' \qquad (8)$$

where $G(f_1, f_2)$ is the bispectral smoothing window and $V(f_1, f_2)$
is the tapered bispectral kernel.

The variance properties of this tapered and smoothed estimator
are somewhat complicated. Use of tapers other than the rectangular
will introduce correlation between the bifrequency $(f_1, f_2)$
and its surroundings even if the $f_i$ are harmonic frequencies. This
implies less effective frequency smoothing regarding variance
reduction [12]. The total frequency smoothing area given by
$W(f_1, f_2)$ is, however, slightly broader with use of other tapers
than the rectangular. The bifrequency smoothing effect together
with the less effective use of data makes the variance of the tapered
and smoothed biperiodogram larger when tapers other than the
rectangular are used.

4.  OPTIMAL  DATA  TAPERS 

While conventional non-parametric bispectral estimation appears to
be an ad hoc combination of tapering and smoothing, the multitaper
approach is the result of a strict optimization criterion. Maximizing
the spectral concentration

$$\lambda = \frac{\int_{-f_B}^{f_B} |V(f)|^2\, df}{\int_{-1/2}^{1/2} |V(f)|^2\, df} \qquad (9)$$

where $V(f)$ is the Fourier transform of the taper $v[n]$ and $f_B$ is a
chosen bandwidth, leads to a set of orthonormal data tapers known
as the Slepian tapers, or Discrete Prolate Spheroidal Sequences
(DPSS). It is easy to show that the maximization of $\lambda$ in eq. (9)
leads to an Nth-order eigenvalue problem in the time domain [7]

$$A v = \lambda v, \qquad (10)$$

where $v = [v[0]\; v[1]\; \ldots\; v[N-1]]^T$ and the matrix $A$ has elements
$A_{n_1,n_2} = \sin[2\pi f_B (n_1 - n_2)]/\pi(n_1 - n_2)$. The eigenvectors of
this eigenvalue problem are the DPSS tapers, which we denote as
$v_k[n]$ and order by decreasing corresponding spectral concentration
$\lambda_k$. Note that the tapers are orthonormal and that their Fourier
transforms are doubly orthogonal [7].
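The eigenvalue problem of eq. (10) can be solved directly for moderate N. The sketch below builds the matrix A with NumPy and keeps the eigenvectors with the largest eigenvalues; the function name is hypothetical, and a dense symmetric eigensolver is assumed to be accurate enough for the leading tapers (production code usually relies on a better conditioned formulation).

```python
import numpy as np

def dpss_from_eigenproblem(n_samples, half_bandwidth, n_tapers):
    """Solve A v = lambda v with A[n1, n2] = sin(2 pi f_B (n1 - n2)) / (pi (n1 - n2))."""
    idx = np.arange(n_samples)
    diff = idx[:, None] - idx[None, :]
    # np.sinc(x) = sin(pi x) / (pi x), so this also sets the diagonal to 2 f_B.
    A = 2.0 * half_bandwidth * np.sinc(2.0 * half_bandwidth * diff)
    eigenvalues, eigenvectors = np.linalg.eigh(A)      # ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1][:n_tapers]   # keep the largest ones
    return eigenvalues[order], eigenvectors[:, order].T

N = 64
f_B = 2.0 / N
lambdas, tapers = dpss_from_eigenproblem(N, f_B, 4)
print(np.round(lambdas, 4))
```

The four leading eigenvalues obtained for N = 64 and f_B = 2/N can be compared with the DPSS row of Table 1. The rows of `tapers` are orthonormal because they are eigenvectors of a real symmetric matrix.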

Slepian showed that the spectral concentration $\lambda_k$ is close to
unity for tapers $v_k[n]$ of orders $k = 0, 1, \ldots, 2Nf_B - 1$, where
$K = 2Nf_B$ is known as the Shannon number, and that it falls
rapidly towards zero for orders beyond $K - 1$. The optimal number
of DPSS tapers to be used in a multitaper approach is therefore
$K = 2Nf_B$, and the resulting total spectral window

$$W(f) = \sum_{k=0}^{K-1} \lambda_k |V_k(f)|^2 \Big/ \sum_{k=0}^{K-1} \lambda_k \qquad (11)$$

will approximate an ideal band limited filter [9]. The highest order
DPSS taper used has the lowest spectral concentration, and is
therefore the taper with the highest sidelobe level. Reducing the
number $K$ of tapers in the multitaper approach results in lower
sidelobes in the total spectral window.

Assuming a predefined peaked spectrum prototype shape $S_x(f)$,
it is possible to obtain a set of orthonormal peak matched (PM)
tapers that have better frequency resolution than the ideal flat
smoothing of the DPSS tapers from eq. (10). With slightly different
notation than in [13], we will assume a logarithmic triangular
spectral peak with 0 dB at $f = 0$, $C$ dB at $f = \pm f_B$ and
$-\infty$ dB outside the half-bandwidth $f_B$, as in the case of DPSS.
The PM tapers are the solutions of the eigenvalue problem

$$P v = \lambda v, \qquad (12)$$

with $v_k[n]$ as eigenvectors and corresponding eigenvalues $\lambda_k$. As
for the DPSS case, we will order the eigenvalues in decreasing order
and use the $K = 2Nf_B$ lowest order PM tapers in our multitaper
approach. The Toeplitz covariance matrix $P$ has the elements
$P_{n_1,n_2} = r_x[n_1 - n_2] * \sin(2\pi f_B [n_1 - n_2])/\pi[n_1 - n_2]$, where
$r_x[n]$ is the covariance sequence corresponding to $S_x(f)$ and $*$
denotes a convolution. The resulting spectral window using PM
tapers will approximate the predefined spectrum prototype.

The Fourier transform $V_k(f)$ of the PM tapers has approximately
the same sidelobe level for any order $k$, so we cannot reduce
the sidelobes in the total spectral window by using fewer tapers
as in the DPSS taper case. To decrease the effect of leakage,
we therefore have to introduce a frequency selective penalty
spectrum $S_g(f)$ in the eigenvalue problem [13]

$$P v = \lambda M v. \qquad (13)$$

Here, the matrix $M$ has a Toeplitz structure with elements
$M_{n_1,n_2} = r_g[n_1 - n_2]$, where $r_g[n]$ is the covariance sequence
corresponding to the penalty spectrum $S_g(f)$. The penalty spectrum
has a flat response of 0 dB inside the chosen bandwidth $f_B$,
and a level of $G$ dB outside. The resulting PM tapers from the
generalized eigenvalue problem in eq. (13) have $G$ dB lower sidelobe
level at the cost of even faster decreasing eigenvalues and thereby
a smaller effective number of tapers used in the total spectral window.
Note that by choosing $G = 0$, the generalized eigenvalue problem
in eq. (13) is reduced to eq. (12), so ordinary PM tapers are actually
PM tapers without sidelobe suppression. Note also that the DPSS
tapers can be obtained from eq. (13) by choosing $C = G = 0$
[13].

Figure 1: Total spectral window for three different sets of orthogonal
tapers in the case of N = 64, f_B = 2/N. Left: DPSS tapers;
Middle: PM tapers with C = -20 and G = 0; Right: PM tapers
with C = -20 and G = 30.


To summarize, the use of tapers that are solutions of the
generalized eigenvalue problem in eq. (13) gives a controlled frequency
smoothing effect in the spectral domain. In Fig. 1 we show the
total spectral window for three different sets of tapers for the case
N = 64 and f_B = 2/N. The left panel shows the DPSS tapers,
the middle panel shows the PM tapers with C = -20 dB and
G = 0, and the right panel shows the PM tapers with C = -20
dB and G = 30 dB. The total spectral window is plotted as a
function of the number of tapers used, increasing from K = 1 to K = 4
from top to bottom. The corresponding eigenvalues for the DPSS
tapers and PM tapers with and without sidelobe suppression are
shown in Table 1. The variance reduction can be connected to
the effective number of orthonormal tapers actually in use, and is
therefore closely connected to the corresponding set of eigenvalues.
The DPSS tapers have the best variance properties since all
eigenvalues are close to unity, while the peak matched tapers offer
a frequency selective multitaper approach at the expense of
variance reduction.


Order k        0        1        2        3
DPSS           0.9999   0.9976   0.9596   0.7220
PM, G = 0      0.5363   0.2057   0.0792   0.0297
PM, G = 30     0.4218   0.0483   0.0020   0.0001

Table 1: Eigenvalues $\lambda_k$ for DPSS tapers and PM tapers with and
without sidelobe suppression. The four lowest order eigenvalues
are shown for the case N = 64, f_B = 2/N and C = -20.


4.1.  Adaptive  weight  functions 

To reduce spectral leakage in the multitaper power spectral
estimator, Thomson introduced weight functions $d_k(f)$ for each order
of taper to obtain improved estimates of $dX_k(f)$ [7]. Using a
robust adaptive approach, DPSS tapers with high sidelobe level are
down-weighted in frequency regions where leakage can influence
the estimate. Since each taper is considered individually for
leakage, the adaptive approach also eliminates the need for choosing
the optimal number of tapers $K < 2Nf_B$ in the multitaper
estimate. More details concerning the adaptive approach for
determining these weight functions can be found in [7, 4].

The use of data adaptive weight functions $d_k(f)$ in bispectral
estimation has been thoroughly discussed in [4]. The effects of
leakage in bispectral estimation are more complicated than for the
power spectral estimation case. In brief, leakage can strongly
influence the estimator variance while the bias can be negligible.

To detect frequencies where leakage can influence the estimation
of $dX(f)$, the adaptive approach requires tapers where leakage
is not present. The lowest order DPSS taper has the lowest
possible sidelobe level for the chosen bandwidth, providing leakage
free estimates of the true increment process. In the DPSS taper
case, these weight functions therefore effectively reduce the
leakage. For the peak matched tapers the adaptive approach is
useless since tapers of all orders have approximately the same
sidelobe level.

5.  MULTITAPER  BISPECTRAL  ESTIMATORS 

Let $x[n]$ be the data available for $n = 0, 1, \ldots, N-1$. Using any orthogonal set of basis functions $v_k[n]$ and corresponding eigenvalues $\lambda_k$ for $k = 0, 1, \ldots, K-1$, we obtain a set of tapered data as $y_k[n] = x[n]v_k[n]$ with corresponding Fourier transform $Y_k(f)$.

The multitaper approach has been applied to bispectral estimators in [7, 1, 2, 4]. A general approach for multitaper bispectral estimation (MBE) can be written as a weighted sum of all combinations of individually tapered biperiodograms

$$\hat{B}(f_1, f_2) = \frac{1}{U_3} \sum_{k,l,m=0}^{K-1} Q(k,l,m)\, B_{(k,l,m)}(f_1, f_2) \qquad (14)$$

where  the  tapered  biperiodogram  of  order  (k,  l,  m)  is 

$$B_{(k,l,m)}(f_1, f_2) = Y_k(f_1)\, Y_l(f_2)\, Y_m^*(f_1 + f_2) \qquad (15)$$

and  the  three-dimensional  weighting  function  given  by 

$$Q(k,l,m) = \sqrt{\lambda_k \lambda_l \lambda_m} \sum_{n=0}^{N-1} v_k[n]\, v_l[n]\, v_m[n] \qquad (16)$$

The normalization constant $U_3$ is defined by

$$U_3 = \sum_{k,l,m=0}^{K-1} Q^2(k,l,m) / \sqrt{\lambda_k \lambda_l \lambda_m} \qquad (17)$$

to  ensure  that  the  bispectral  estimator  is  unbiased  for  white  noise. 
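
As an illustration of eqs. (14) to (17), the following is a minimal sketch of the MBE evaluated on the FFT frequency grid. It assumes DPSS tapers obtained as eigenvectors of the sinc-kernel matrix, a weighting $Q(k,l,m)$ built from the sum of taper triple products, and bifrequencies wrapped modulo the sampling frequency; the function names are hypothetical, and $K$ should not exceed $2Nf_B$ so that all eigenvalues used are positive.

```python
import numpy as np

def dpss_tapers(N, W, K):
    """K leading eigenvectors and eigenvalues of the sinc kernel (DPSS tapers)."""
    n = np.arange(N)
    A = 2.0 * W * np.sinc(2.0 * W * (n[:, None] - n[None, :]))
    lam, vec = np.linalg.eigh(A)                  # ascending eigenvalues
    return vec[:, ::-1][:, :K].T, lam[::-1][:K]   # tapers have shape (K, N)

def multitaper_bispectrum(x, tapers, lam):
    """Weighted sum of tapered biperiodograms on the FFT grid."""
    K, N = tapers.shape
    Y = np.fft.fft(tapers * x, axis=1)            # Y_k(f) for each taper order
    i = np.arange(N)
    sum_idx = (i[:, None] + i[None, :]) % N       # grid index of f1 + f2
    B = np.zeros((N, N), dtype=complex)
    U3 = 0.0
    for k in range(K):
        for l in range(K):
            for m in range(K):
                root = np.sqrt(lam[k] * lam[l] * lam[m])
                # Three-dimensional weight (assumed form of eq. (16)).
                Q = root * np.sum(tapers[k] * tapers[l] * tapers[m])
                U3 += Q**2 / root                 # normalization, eq. (17)
                # Tapered biperiodogram of order (k, l, m), eq. (15).
                B += Q * np.outer(Y[k], Y[l]) * np.conj(Y[m][sum_idx])
    return B / U3
```

The triple loop costs $K^3$ outer products of size $N \times N$, which is acceptable for the small $K$ used here; a production implementation would exploit the symmetries of $Q(k,l,m)$.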

The use of data adaptive weight functions $d_k(f)$ in bispectral estimation modifies the three-dimensional weight function in eq. (16) to be bifrequency selective

$$Q_{(k,l,m)}(f_1, f_2) = Q(k,l,m)\, d_k(f_1)\, d_l(f_2)\, d_m(f_1 + f_2). \qquad (18)$$

To obtain an unbiased bispectral estimate, the new normalization constant $U_3$ also depends on the bifrequencies $(f_1, f_2)$

$$U_3(f_1, f_2) = \sum_{k,l,m=0}^{K-1} Q(k,l,m)\, d_k(f_1)\, d_l(f_2)\, d_m(f_1 + f_2). \qquad (19)$$

The modification caused by $d_k(f)$ in eq. (18) and eq. (19) is sufficient to make the MBE in eq. (14) resistant against leakage.


5.1.  Statistical  properties 

We have examined the statistical properties of the MBE based estimators in detail. In the following, we summarize our main findings.

The expectation value of the general multitaper bispectral estimator in eq. (14) can be shown to be

$$E[\hat{B}(f_1, f_2)] = \iint_{-1/2}^{1/2} W(f_1' - f_1, f_2' - f_2)\, B(f_1', f_2')\, df_1'\, df_2' \qquad (20)$$

where  the  total  bispectral  window  [2]  is  given  by 

$$W(f_1, f_2) = \frac{1}{U_3} \sum_{k,l,m=0}^{K-1} Q(k,l,m)\, W_{(k,l,m)}(f_1, f_2) \qquad (21)$$

and  the  bispectral  window  of  order  (k,  l,  m)  is  given  by 

$$W_{(k,l,m)}(f_1, f_2) = [V_k(f_1)\, V_l(f_2)\, V_m^*(f_1 + f_2)]^* \qquad (22)$$

Assuming Gaussian data and distinct frequencies $f_1$, $f_2$ and $f_1 + f_2$, the smoothing of the true bispectral density in the MBE leads to a variance decrease. Considering only this smoothing effect, the variance of the MBE can be approximated by

$$\mathrm{var}[\hat{B}(f_1, f_2)] \approx \iint W^2(f_1' - f_1, f_2' - f_2)\, \mathrm{var}\{B_{per}(f_1', f_2')\}\, df_1'\, df_2'. \qquad (23)$$

Here $\mathrm{var}\{B_{per}(f_1', f_2')\} = N S(f_1') S(f_2') S(f_1' + f_2')$ is the asymptotic variance of the biperiodogram [10]. This implies that the bispectral estimate is consistent for fixed $f_B$, since asymptotically $\mathrm{var}[\hat{B}(f_1, f_2)] = 0$. For conventional bispectral estimation, tapering in combination with bifrequency smoothing implies a variance increase compared to smoothing of the biperiodogram alone, since tapering implies less effective use of data. For multitapering the effective use of data is better than for a single taper [9], so the difference between the approximation in eq. (23) and the true variance is small.

Statistical properties of the adaptive MBE are hard to obtain in general, since the calculation of the weight functions depends on the process under study. Processes with small dynamical range in the true increment process $dX(f)$ have no long range leakage, and the adaptive MBE is therefore close to the MBE with all $K = 2Nf_B$ tapers used. For processes with large dynamical range in $dX(f)$, the total bispectral window $W(f_1, f_2)$ must be redefined to also depend on the actual bifrequency $(f_1, f_2)$ under study,

$$W(f_1', f_2', f_1, f_2) = \sum_{k,l,m=0}^{K-1} Q_{(k,l,m)}(f_1, f_2)\, W_{(k,l,m)}(f_1', f_2') \qquad (24)$$

where $W_{(k,l,m)}(f_1, f_2) = [V_k(f_1)\, V_l(f_2)\, V_m^*(f_1 + f_2)]^*$ as in eq. (22). For bispectral regions where the magnitude is low, the weight function will down-weight the tapers with high sidelobe levels so that leakage is avoided. This down-weighting of tapers will reduce the variance of the adaptive MBE in lower bispectral parts, since leakage from these tapers contributes to the variance.

Extensive Monte Carlo simulations of the non-parametric estimators discussed in this paper have verified our results on the statistical properties [2, 3, 4].

Figure 2: Total bispectral window for three classes of bispectral estimators. Upper left: Uniformly smoothed biperiodogram. Upper right: Hanning tapered and uniformly smoothed biperiodogram. Middle left: MBE using $K = 2Nf_B$ DPSS tapers. Middle right: MBE using $K = 2Nf_B - 3$ DPSS tapers. Lower left: MBE using peak matched tapers. Lower right: MBE using peak matched tapers with suppressed sidelobes. All bispectral windows are for the case: $N = 64$, $Nf_B = a = 4$, $C = -20$ dB and $G = 30$ dB.


5.2.  Total  bispectral  windows 

The total bispectral windows are plotted in Fig. 2 for some of the non-parametric bispectral estimators discussed in this paper. These examples have approximately the same hexagonal region of support.

Use of a Hanning taper in combination with uniform smoothing (upper right) lowers the sidelobe level significantly compared to the use of a rectangular taper (upper left), but the support of the total bispectral window is also rounded and slightly wider.

The MBE using $K = 2Nf_B$ DPSS tapers (middle left) has approximately the same sidelobe level as the biperiodogram, but the “edges” going out from the support are lower. Using fewer DPSS tapers in the MBE (middle right) will lower the sidelobe level without destroying the flat support.

The “pyramidal” support in the peak matched taper case without (lower left) and with (lower right) sidelobe suppression clearly differs from the other total bispectral windows. The linear decay (in dB) to -20 dB at the edge of the hexagonal support means that these tapers provide better frequency resolution. This is achieved, as usual in spectral estimation, at the cost of higher variance.

6.  CONCLUSION 

While the expected value and bias are completely described by the total bispectral window alone, we also have to consider the effective use of data to describe the variance properties.

The statistical performance of a specific estimator will of course depend on the particular process under study. For slowly varying bispectra, Thomson’s original multitaper approach using DPSS tapers is the best choice. If the process has large dynamical range in the true increment process, we have to reduce the leakage to get satisfactory results. For non-parametric estimation this means data tapering, where the use of frequency selective weight functions in combination with DPSS tapers seems to be an obvious choice. Conventional techniques with tapered and smoothed biperiodograms imply less effective use of data, and thus have higher variance.

The  use  of  peak  matched  tapers  provides  a  good  combination 
of  low  bias  and  variance  reduction  in  the  MBE  for  rapidly  varying 
bispectra.  In  cases  of  large  dynamical  range  in  the  true  increment 
process,  we  conclude  that  peak  matched  tapers  with  suppressed 
sidelobes  should  be  applied. 
ABSTRACT

We propose a modification to the Constant Modulus Criterion for real valued sources processed with a complex valued receiver. Our modification is called the Single-Axis Constant Modulus Criterion (SA-CM) because it operates solely on the real component of the complex equalizer output. We show that under idealized conditions, a finite length, baud-spaced, complex valued equalizer minimizing the SA-CM criterion admits only desirable global minima settings that are ISI-free. A single-axis receiver architecture is compared to other receiver architectures for real valued sources and staggered modulation schemes. Simulation examples using vestigial sideband (VSB) signaling verify our methods.

1.  INTRODUCTION 

Modern digital receivers often rely on blind equalization techniques to mitigate unknown channel distortions. Blind methods are desirable because they do not rely on a periodic transmission of a training sequence, thus increasing data throughput and allowing for equalizer adaptation at every symbol instance. The Constant Modulus Algorithm (CMA) is a popular blind equalization technique used in high data-rate applications due to its robustness under practical signaling conditions [4].

It is often the case that a digital receiver uses complex valued signal processing even though the data source is real valued or encodes original information into only one dimension. Complex signal processing may be required by the receiver since synchronization and equalization functions operate on passband data that is not precisely downconverted to baseband. Most treatments of CMA assume either real valued signal processing for a real valued source (such as PAM) or complex valued signal processing for complex valued sources (such as QAM); [7] is an exception. In [7], Papadias shows that because a BPSK source is not circularly symmetric, i.e. $E\{s^2\} \neq 0$, the Constant Modulus (CM) cost function admits global minima settings that result in a closed-eye combined channel-equalizer response.

We present a modification to the CM criterion appropriate for real valued sources that are processed by complex valued receivers, in which equalizer coefficients are updated using real-part extraction of the equalizer filter result. We show that a finite length, baud-spaced, complex valued equalizer minimizing the SA-CM criterion admits only minima that are global and result in open-eye settings, thus excluding the undesirable settings described in [7]. As fractionally-spaced equalizers exploit temporal diversity, single-axis equalizers exploit phase diversity in complex valued channels.

Tu [10] applies a similar concept for staggered modulation formats (such as staggered-QAM and vestigial sideband modulation) using minimum mean square error (MMSE) equalization. In staggered modulation, information is encoded independently onto in-phase (I) and quadrature-phase (Q) carriers, with the carriers staggered in time by typically half the symbol period relative to standard QAM. Reference [10] shows that alternately minimizing the Mean Square Error (MSE) over I and Q samples results in a lower MSE than minimizing the complex valued error term. Through example, we show that SA-CMA is additionally applicable to staggered modulation formats.

The next section describes a communication model using real valued data sources with complex valued signal processing and provides motivation for employing a complex valued receiver. Section 3 introduces the SA-CM criterion and shows its perfect symbol recovery properties. Section 4 provides simulation examples and applies SA-CMA for staggered modulations. Section 5 provides concluding remarks. Section 6 provides directions of future work.


2.  RECEIVER  ARCHITECTURES  FOR  REAL 
VALUED  SOURCES 

We begin with a communication model of a real valued, sub-Gaussian (i.e. $E\{s^4(n)\} < 3E^2\{s^2(n)\}$), zero-mean, i.i.d. source $\{s(n)\}$, $n \in \mathbb{Z}$, $s(n) \in \mathbb{R}$. This real valued source is filtered through a complex valued FIR channel described in matrix notation as


Figure  1:  Receiver  A:  real  valued  equalizer  f(z)  updated 
with  real  valued  estimates  y(n).  Receiver  B:  complex  valued 
equalizer  f(z)  updated  with  complex  valued  estimates  y(n). 
Receiver  C:  complex  valued  equalizer  f(z)  updated  with  real 
valued  estimates  y(n). 


with $a \in \mathbb{C}$. We present for examination three receiver architectures, shown in Figure 1. Fat arrows denote complex valued signals; thin arrows denote real valued signals.

Each receiver consists of carrier phase correction of complex valued received samples followed by an adaptive FIR equalizer. Receiver A has a real valued equalizer operating on real valued data. Receiver B shows a complex valued equalizer that is updated according to complex valued estimates of the source symbols. As noted in [7], this receiver can result in a closed-eye combined channel-equalizer setting when the CM criterion is minimized.

Receiver C reflects the architecture we have termed single-axis equalization, which employs a baseband, complex valued, baud-spaced FIR equalizer $\mathbf{f} = (f_0, \ldots, f_{N_f})^T$, $f_i \in \mathbb{C}$, to generate real valued estimates $y(n)$ of the source symbols. These estimates are given by

$$y(n) = \mathrm{Re}\{\mathbf{r}^T(n)\mathbf{f}\}$$

where the received signal is given by $\mathbf{r}(n) = \mathbf{C}^T\mathbf{s}(n) + \mathbf{w}(n)$, and $\mathbf{s}(n) = (s(n), \ldots, s(n - N_h))^T$, with $N_h = N_c + N_f$, and $\theta$ is assumed to be zero. (The effect of a non-zero phase offset, $\theta$, will be discussed in Section 4.2.) Note that $\mathbf{w}(n) = (w(n), \ldots, w(n - N_f))^T$, where $w(n)$ is an additive white Gaussian noise process.

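Because $\mathrm{Im}\{s(n)\} = 0$, we can rewrite the single-axis equalizer output as

$$y(n) = \mathrm{Re}\{\mathbf{r}^T(n)\}\,\mathrm{Re}\{\mathbf{f}\} - \mathrm{Im}\{\mathbf{r}^T(n)\}\,\mathrm{Im}\{\mathbf{f}\}$$
$$\phantom{y(n)} = \mathbf{s}^T(n)\left(\mathrm{Re}\{\mathbf{C}\}\mathrm{Re}\{\mathbf{f}\} - \mathrm{Im}\{\mathbf{C}\}\mathrm{Im}\{\mathbf{f}\}\right) + \mathrm{Re}\{\mathbf{w}^T(n)\mathbf{f}\}$$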

In this form, the single-axis equalizer is a linear, real valued, multi-channel receiver with sub-channels $\mathrm{Re}\{\mathbf{C}\}$, $-\mathrm{Im}\{\mathbf{C}\}$ and corresponding sub-filters $\mathrm{Re}\{\mathbf{f}\}$, $\mathrm{Im}\{\mathbf{f}\}$. This channel-equalizer model is mathematically equivalent to the oversampled channel-equalizer system in [2] or the antenna-array scheme proposed in [8]. Single-axis equalization exploits the channel phase diversity inherent in complex valued communication models using real valued sources. In the next section, we apply the properties of multi-rate and multi-channel systems to single-axis equalization and show the globally convergent behavior of SA-CMA.
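
The equivalence between real-part extraction and a real valued two-channel filter can be checked numerically. The sketch below is only an illustration: the function names and the random test vectors are assumptions, not part of the original model.

```python
import numpy as np

def single_axis_output(r, f):
    """Real-part extraction of the complex equalizer output, y = Re{r^T f}."""
    return np.real(r @ f)

def stacked_real_output(r, f):
    """Same output from the real two-channel form with stacked sub-filters."""
    r_bar = np.concatenate([r.real, -r.imag])   # sub-channel regressors
    f_bar = np.concatenate([f.real, f.imag])    # sub-filters Re{f}, Im{f}
    return r_bar @ f_bar

rng = np.random.default_rng(1)
r = rng.normal(size=4) + 1j * rng.normal(size=4)   # arbitrary received vector
f = rng.normal(size=4) + 1j * rng.normal(size=4)   # arbitrary complex taps
print(single_axis_output(r, f), stacked_real_output(r, f))
```

Both expressions use only real arithmetic once the stacked vectors are formed, which is the sense in which the single-axis equalizer behaves as a two-channel real receiver.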


3. SINGLE-AXIS CM CRITERION


We now show that a baud-spaced, finite length equalizer employing an adaptation strategy based on the SA-CM criterion can achieve perfect symbol recovery. To simplify notation, we use the superscripts $(I)$ and $(Q)$ to indicate real and imaginary components, respectively. Representing the channel with a matrix that isolates real and imaginary sub-channels, we have


In the absence of noise, the equalizer output can be written as $y(n) = \mathbf{s}^T(n)\mathbf{C}\mathbf{f}$, where

$$\mathbf{f} = (f_0^{(I)}, f_1^{(I)}, \ldots, f_{N_f}^{(I)}, f_0^{(Q)}, f_1^{(Q)}, \ldots, f_{N_f}^{(Q)})^T.$$

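The equalizer coefficients are obtained by minimizing the SA-CM criterion

$$J_{SA\text{-}CM}(\mathbf{f}) = E\{(y^2(n) - \gamma)^2\}, \qquad \gamma = \frac{E\{s^4(n)\}}{E\{s^2(n)\}}.$$

Equalizer coefficients are thus updated according to

$$\mathbf{f}(n+1) = \mathbf{f}(n) - \mu\, \mathbf{r}(n)\,(y^2(n) - \gamma)\, y(n)$$

where $\mathbf{r} = (r_0^{(I)}, r_1^{(I)}, \ldots, r_{N_f}^{(I)}, -r_0^{(Q)}, -r_1^{(Q)}, \ldots, -r_{N_f}^{(Q)})^T$.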
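
A minimal sketch of the stochastic gradient update is given below in complex form, where subtracting a multiple of the conjugated regressor is equivalent to the stacked real update with negated quadrature entries. The function name `sa_cma`, the step size, the noise-free two-tap test channel and the BPSK source with $\gamma = 1$ are all assumptions chosen for illustration.

```python
import numpy as np

def sa_cma(r, f_init, mu, gamma=1.0):
    """Run SA-CMA over received samples r; return final taps and outputs."""
    f = np.array(f_init, dtype=complex)
    n_taps = f.size
    y = np.zeros(len(r) - n_taps + 1)
    for n in range(n_taps - 1, len(r)):
        # Regressor (r(n), r(n-1), ..., r(n-Nf)).
        reg = r[n - n_taps + 1:n + 1][::-1]
        y_n = np.real(reg @ f)                   # real-part extraction
        # Gradient step on (y^2 - gamma)^2; conj(reg) carries the sign
        # change on the quadrature part of the stacked regressor.
        f -= mu * (y_n**2 - gamma) * y_n * np.conj(reg)
        y[n - n_taps + 1] = y_n
    return f, y

# Assumed test setup: BPSK source through a noise-free two-tap complex channel.
rng = np.random.default_rng(0)
s = rng.choice([-1.0, 1.0], size=20000)
c = np.array([0.3 + 0.4j, -0.1 + 0.2j])
r = np.convolve(s, c)[:len(s)]
f_final, y = sa_cma(r, [2.2 - 0.9j], mu=0.02)
print(f_final)
```

In the noise-free case the error term $y^2(n) - \gamma$ vanishes at a perfect-recovery setting, so the taps stop moving once such a setting is reached.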

References [2], for a real valued source, and [6], for a complex valued source, show that under a certain set of conditions, the CM criterion for multi-channel systems exhibits only global minima, and that these minima achieve perfect equalization (i.e. the combined channel-equalizer impulse response is a pure delay within a phase shift). The perfect equalizability conditions for a real valued source and channel-equalizer are restated here.

(C1) No additive channel noise (i.e. $w(n) = 0$, $\forall n$).

(C2) Full row-rank channel matrix $\mathbf{C}$. This necessitates the absence of zeros common to all sub-channel polynomials and sufficient filter length.

(C3) Zero-mean, independent, and identically distributed source.

(C4) Sub-Gaussian source (i.e. $E\{s^4(n)\} < 3E^2\{s^2(n)\}$ in the real valued case).

Conditions (C1), (C3), (C4) are satisfied by our communication model. Condition (C2) requires that the I and Q sub-channels are coprime, and that $N_f \geq N_c - 1$. Assuming these conditions are satisfied, the SA-CM cost surface exhibits the same global minima as an equivalent real valued dual-channel CM criterion.

Our communication system model with the SA-CM criterion satisfies conditions (C1) to (C4), so that the single-axis CM criterion admits only desirable global minima. Hence, the equalizability condition for complex valued communication models, i.e. $E\{s^2(n)\} = 0$, does not apply to SA-CMA, and the undesirable equalizer settings described by Papadias [7] are not admitted due to real-part extraction. Note that the SA-CM criterion is globally convergent with a finite length, baud-spaced equalizer by exploiting the phase diversity in the complex valued channel. This result is analogous to the global convergence result of the CM criterion in [2] and [6] using a finite length fractionally-spaced equalizer which exploits temporal diversity.

4.  EXAMPLES 
4.1.  Two-tap  channel 

Consider transmission of a real valued source over a two-tap, complex valued channel $c(z) = c_0 + c_1 z^{-1}$. The output of our single-axis equalizer, using a single complex valued scalar filter $f_0$, is given by

$$y(n) = \mathrm{Re}\{f_0 (c_0 + c_1 z^{-1}) s(n)\}$$
$$\phantom{y(n)} = (c_0^{(I)} f_0^{(I)} - c_0^{(Q)} f_0^{(Q)})\, s(n) + (c_1^{(I)} f_0^{(I)} - c_1^{(Q)} f_0^{(Q)})\, s(n-1)$$

It is possible to design $f_0$ to recover a delayed version of the source, $y(n) = s(n - \delta)$, $\delta = 0, 1$, by solving the following set of equations


Figure 2 shows the SA-CM cost function for a BPSK source, i.e. $s(n) \in \{\pm 1\}$, and $c(z) = (0.3 + 0.4j) + (-0.1 + 0.2j)z^{-1}$ in (I,Q)-parameter space. Notice that the SA-CM cost function in this example has four global minima: $f_0 = \pm(2 - j)$, yielding $y(n) = \pm s(n)$, and $f_0 = \pm(-4 - 3j)$, yielding $y(n) = \pm s(n - 1)$.
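
The four minima can be verified directly by computing the combined real response $\mathrm{Re}\{f_0 c_k\}$ for each tap of the channel. The helper name `combined_response` is illustrative.

```python
import numpy as np

# Two-tap channel from the example above.
c = np.array([0.3 + 0.4j, -0.1 + 0.2j])

def combined_response(f0, c):
    """Real combined channel-equalizer response seen by s(n) and s(n-1)."""
    return np.real(f0 * c)

for f0 in (2 - 1j, -(2 - 1j), -4 - 3j, 4 + 3j):
    print(f0, combined_response(f0, c))
```

A response with a single entry of magnitude one and zero elsewhere corresponds to a pure delay, possibly with a sign change.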

4.2.  Carrier  Phase  Offset  Tolerance 

Practical demodulators downconvert passband signals to near baseband signals using carrier recovery circuitry. However, this downconversion process is rarely exact and results in small carrier frequency and arbitrary phase offsets. We study the effect of static phase offsets on the MMSE performance of the three receivers in Figure 1.

Figure 2: SA-CM cost function for example in Section 4.1.

For purposes of comparison, we hold the amount of hardware required to implement them constant. Since Receiver A is a real equalizer operating on real data, there is only one multiply required per tap. For Receiver B, we have a complex equalizer operating on complex data which will require four multiplies per tap. Receiver C uses a complex equalizer on real data, requiring two multiplies per tap. Hence, for a given level of hardware complexity, Receiver A allows the longest equalizer, and Receiver B the shortest. In the examples to follow, we choose filter lengths consistent with this constraint.
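
The constraint amounts to dividing a fixed multiplier budget by the per-tap cost of each receiver. A small sketch, with a hypothetical helper name and budget value:

```python
# Real multiplies per tap for each receiver, as stated above.
MULTIPLIES_PER_TAP = {"A": 1, "B": 4, "C": 2}

def taps_for_budget(budget):
    """Number of equalizer taps each receiver can afford for a multiply budget."""
    return {name: budget // cost for name, cost in MULTIPLIES_PER_TAP.items()}

print(taps_for_budget(96))
```

With a budget of 96 multiplies this gives 96, 24 and 48 taps for receivers A, B and C, which are the lengths used in Section 4.3.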

The MMSE performance of receiver A is a function of carrier phase error, $\theta$. This dependence can be seen in the example of Figure 3, where the channel impulse response coefficients are

(-0.25 - 0.14j, -0.3477 - 1.10j, -0.41 + 0.31j, -0.33 + 1.18j, 0.09 + 1.17j)

and 20 dB SNR white Gaussian noise is considered. Receiver A outperforms receivers B and C for some phase offsets $\theta$, possibly due to its longer filter length. However, the MMSE performance of receivers B and C is independent of $\theta$ since they employ equalizers with complex coefficients. As indicated above, receiver C can have twice as many taps as receiver B for a given hardware complexity. Furthermore, as noted in [10], the MSE criterion of receiver C is less restrictive than that of receiver B, since only the real part of the mean squared recovery error is minimized. Hence, receiver C has the lowest MMSE of the three receivers for nearly all phases.

Note that the real-part extraction used in the SA-CM criterion relies on precise carrier frequency offset estimation. Unfortunately, the decoupling of equalization and carrier frequency recovery provided by CMA for complex sources [9] is lost in the SA-CM criterion. In this case, when carrier frequency recovery is imprecise, convergence of SA-CMA is not guaranteed.

Figure 3: MMSE performance of receivers A, B and C with respect to carrier phase offset (- = 16-tap receiver A, x = 4-tap receiver B, * = 6-tap receiver C).


4.3.  Vestigial  Sideband  Modulation 

Vestigial Sideband Modulation (VSB) has a long history in analog communications and is particularly relevant to modern digital communications because the Advanced Television Systems Committee has adopted and upheld VSB as the modulation for high definition television (HDTV) in the United States. Reference [1] concludes that for VSB, a complex valued equalizer is unnecessary on the basis that only the in-phase carrier (I) is modulated with unique data. However, a real valued equalizer solution similar to receiver A will not take advantage of the channel phase diversity inherent in VSB.

One model for VSB modulation transmits real valued data through a complex valued pulse shaping filter where the quadrature component of the filter is roughly the Hilbert transform of the in-phase component [3]. For example, an FIR model of the VSB pulse shaping filter is shown in Figure 4. Notice the Hilbert transform impulse response in the quadrature axis and a single spike in the in-phase axis.

The complex valued VSB pulse shaping filter applied to a real valued data source is a communication model amenable to receiver C in Figure 1. For our simulation comparison, we use a 2-VSB source with a combined channel-pulse shape filter whose frequency response is shown in Figure 5, with 40 dB SNR additive Gaussian noise. Receivers A, B, and C are all baud-spaced, with 96 taps, 24 taps, and 48 taps, respectively. The length of the simulation is 50,000 iterations, with receivers A and B updated using CMA, while receiver C is updated using SA-CMA.

Figure 6 shows that receiver C demonstrates the lowest MSE for this set of channel conditions. The resulting channel-equalizer response for receiver B, shown in Figure 7, does not result in a pure delay (see [7]). This explains why receiver B does not converge to an acceptable MSE performance. The channel-equalizer responses of receivers A and C do converge to near pure delays, with receiver C showing better MSE performance. However, for channel responses that require a longer equalizer span, a complex equalizer may be computationally prohibitive, and a real valued equalizer could be preferred.

Figure 4: Impulse response and magnitude of frequency response of VSB pulse shaping filter.

5.  CONCLUSION 

We have described a modification of the CM criterion for real valued sources applicable to receivers that employ complex valued signal processing. Our modification extracts the real part of the equalizer output and is thus called the Single-Axis CM (SA-CM) criterion. We have shown that a finite length, baud-spaced equalizer minimizing the SA-CM criterion exploits the phase diversity in a complex valued channel and admits only desirable global minima under perfect equalizability conditions. Finally, we have provided simulations to demonstrate its feasibility for staggered modulation formats such as VSB signaling.

6.  FUTURE  WORK 

We have provided some motivation for the use of single-axis equalization for real valued and staggered modulation sources. Further application and study is warranted for more sophisticated equalizer architectures, such as IIR and decision feedback (DFE) equalizers. Carrier frequency offset is a practical concern in many equalizer designs. Thus, performance studies of single-axis equalization and SA-CMA under non-ideal carrier recovery are needed.
ABSTRACT

We present a new digital modulation technique that introduces covertness in digital communications. The basic principle is to transmit realizations of a stochastic process in such a manner that the transmitted waveform appears noiselike. In this paper, we have chosen to express the transmitted waveform in a subspace formalism. This allows for an elegant geometrical interpretation of the waveform, and it naturally suggests a simple and accurate matched subspace detector for the receiver. The technique is demonstrated by numerical simulations, and a comparison with an optimal Neyman-Pearson detector shows that our simple subspace detector yields a high-quality and reliable receiver for the modulated signal.

1.  INTRODUCTION 

An obvious way of introducing covertness in digital communications is to ensure that the transmitted waveform appears noiselike. Spread-spectrum techniques, e.g. [1], apply a known quasi-stochastic spreading sequence to obtain some degree of privacy. To decode the signal, the receiver must have complete knowledge about the spreading sequence, and it must be strictly synchronous with the transmitter. In addition, a simple spectrogram analysis may detect the pulses and disclose the existence of the transmission.

Salberg and Hanssen [2] proposed the following low-probability-of-intercept method for encoding digital information. Transmit a realization of a stochastic process $X_0(t)$, $0 < t < T$, to represent bit zero, and a realization of another stochastic process $X_1(t)$, $0 < t < T$, to represent bit one. Here $T$ is the symbol duration. Thus, rather than altering aspects of a deterministic carrier signal, realizations of two different stochastic processes are transmitted. This has the effect that two subsequent equal source bits have different transmitted waveforms. In addition, two different source bits have similar waveforms, due to the fact that they are close in a statistical sense. The transmitted waveform representing a bit string will thus appear noiselike, and it contains no repetitions or periodicities. Moreover, the waveform contains no discontinuities, so the pulse length is also hidden. Since the transmitted baseband waveform is noiselike, a transmission would not attract the attention of unfriendly receivers. It is obvious that this signaling method adds an extra (physical) layer of security in digital communication, thus reducing the risk of eavesdropping.

In this paper we will generalize the technique suggested in [2]. We have chosen to express the waveform generator by means of an orthonormal basis. The background stochastic sequences are generated by a redundant linear transformation of a stochastic coefficient vector, and a transmitted “pulse” is simply the stochastic sequence expressed in the chosen basis. The benefit of such an encoding is that very simple and efficient decoders can be constructed by means of subspace projections onto the two different subspaces spanned by the basis waveforms.

2.  STOCHASTIC  PROCESS  SHIFT  KEYING 

Let $\{a_n\}_{n=-\infty}^{\infty}$ be the source bit sequence. The transmitted waveform for an infinite duration Stochastic Process Shift Keying signal suggested in [2] can then be written as

$$X(t) = \sum_{n=-\infty}^{\infty} [a_n X_0(t) + (1 - a_n) X_1(t)]\, u(t - nT - \Delta) \qquad (1)$$

where $u(t)$ is a unit amplitude rectangular pulse of duration $T$. Here we model the source as a wide-sense stationary stochastic sequence with 0 and 1 as possible outcomes, mean value $E[a_n] = \mu_a$, correlation sequence $E[a_n a_{n+k}] = R_a(k)$, and $\Delta$ is a uniformly distributed random variable $\Delta \sim U[0, T]$ independent of $a_n$.

A  more  general  strategy  is  to  write  the  transmitted  pulse 
waveform  as  a  linear  combination  of  some  basis  waveforms 
fk(t)  G  X,  k  =  1,2,...,  where  X  is  a  function  space 
that  satisfies  some  desired  properties.  Assume  that  to  trans¬ 
mit  bit  zero  we  use  waveforms  from  a  subspace  Go  C  X, 
and  to  transmit  bit  one  we  use  waveforms  from  a  subspace 
Gi  C  X.  The  subspaces  Go  and  Gi  have  rank  M0  and  Mi, 
respectively,  and  in  general  Go  f)  Gi  #  0  which  means  that 
bit  zero  and  bit  one  may  have  common  basis  waveforms. 
Let  the  number  of  basis  waveforms  in  the  set  Go  U  Gi  be 
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2 M.  Thus  the  transmitted  pulse  is 

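$$s_i(t) = \mathbf{f}^T(t)\,\mathbf{G}_i\,\mathbf{s}_i, \quad i = 0, 1 \qquad (2)$$

where $\mathbf{f}(t) = [f_1(t), f_2(t), \ldots, f_{2M}(t)]^T$ is a vector consisting of the basis waveforms, $\mathbf{s}_i = [s_{i,1}, s_{i,2}, \ldots, s_{i,M_i}]^T$ is a random vector drawn from a multivariate probability density $p_{\theta_i}(\mathbf{s})$, where $\theta_i$ is a relevant parameter vector, and $\mathbf{G}_i = [\mathbf{g}_1^{(i)}, \mathbf{g}_2^{(i)}, \ldots, \mathbf{g}_{M_i}^{(i)}]$ is a $2M \times M_i$ matrix of rank $M_i$. The transmitted waveform $X(t)$ can then be written as

$$X(t) = \sum_{n=-\infty}^{\infty} \mathbf{f}^T(t - nT - \Delta)\left[a_n \mathbf{G}_0 \mathbf{s}_{0,n} + (1 - a_n)\mathbf{G}_1 \mathbf{s}_{1,n}\right], \qquad (3)$$

where $\mathbf{s}_{i,n}$ is the stochastic parameter vector at time $n$.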


3.  BASIS  FUNCTIONS 


In our attempt to choose a basis function space $\mathcal{F}$ there are several aspects that we must consider. For instance, we often want the basis functions $f_k(t)$ to be compactly supported such that a transmitted pulse is time limited. The pulse length of the basis waveforms does not have to be equal to the symbol duration $T$, which means that a transmitted pulse can overlap with neighboring pulses. Furthermore, the decoding can be made simple if the basis waveforms in $\mathcal{F}$ are orthogonal, i.e.

$$\int_{T_s} f_k(t) f_j(t)\,dt = \alpha\,\delta_{k,j} \qquad (4)$$

where $\alpha$ is a constant, $\delta_{k,j}$ is Kronecker's delta, and $T_s$ is the pulse length. As an aspect of low-probability-of-intercept we require that the waveforms are chosen such that the transmitted baseband signal is noiselike, and if the waveforms do not contain any discontinuities that can compromise the pulse length we have an additional security at the waveform level.

Yi  and  Powers  in  [3]  proposed  a  wavelet-based  orthog¬ 
onal  modulation  code  set  where  the  code  set  consists  of  var¬ 
ious  orthogonal  scaling  functions  and  mother  wavelets. 


3.1.  Orthogonal  Modulation  Code 


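Yi and Powers [3] used the Hadamard matrices to design orthogonal code sets. The Hadamard matrices are defined as

$$\mathbf{H}_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \mathbf{H}_{2n} = \begin{bmatrix} \mathbf{H}_n & \mathbf{H}_n \\ \mathbf{H}_n & -\mathbf{H}_n \end{bmatrix} \qquad (5)$$

where $n$ is an integer and the dimensions of $\mathbf{H}_{2n}$ are $2n \times 2n$.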
Since  the  row  vectors  of  the  Hadamard  matrix  are  orthog¬ 
onal,  the  Hadamard  matrix  yields  an  efficient  tool  to  con¬ 
struct  orthogonal  basis  waveforms. 
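
The recursion in Eq. (5) can be sketched in a few lines of Python. This is only an illustration and is not taken from [3]; the function names `hadamard` and `gram` are hypothetical. The Gram matrix of the rows confirms that distinct rows are orthogonal.

```python
def hadamard(order):
    """Build H_order by the doubling rule of Eq. (5); order must be a power of 2."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of 2")
    h = [[1]]
    while len(h) < order:
        # [[H, H], [H, -H]]: top half repeats each row, bottom half negates the copy
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def gram(h):
    """Matrix of inner products between all pairs of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in h] for r1 in h]


H4 = hadamard(4)
for row in H4:
    print(row)
print(gram(H4))  # off-diagonal entries are zero when the rows are orthogonal
```

Each row of the resulting matrix can serve as the weight sequence for one orthogonal waveform.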


Figure 1: Example of mutually orthogonal basis waveforms $C_{2,2}(t)$ and $C_{4,3}(t)$ using Daubechies 4 wavelets.


In the discrete wavelet transform the scaled and translated orthogonal dyadic wavelet $\psi_{j,k}(t)$ is defined as

$$\psi_{j,k}(t) = 2^{-j/2}\,\psi(2^{-j}t - k) \qquad (6)$$

where $\psi(t)$ is the mother wavelet, $j$ is a scale index, and $k$ is a translation index. For integers $j$, $k$, $m$ and $n$ we have that the inner product obeys

$$\langle \psi_{j,k}(t), \psi_{m,n}(t) \rangle = \delta_{j,m}\,\delta_{k,n}. \qquad (7)$$

The scaling functions $\phi_{j,k}(t)$ are orthogonal only across translation but not across scale,

$$\langle \phi_{j,k}(t), \phi_{j,n}(t) \rangle = \delta_{k,n}. \qquad (8)$$

At specific scales and several translations the wavelets are orthogonal to scaled and translated scaling functions. For any $j \leq m$, we have

$$\langle \psi_{j,k}(t), \phi_{m,n}(t) \rangle = 0. \qquad (9)$$

From the properties discussed above, Yi and Powers [3] proposed the following wavelet-based orthogonal modulation code

$$\begin{aligned} C_{1,1}(t) &= A\,\phi_{j,0}(t) \\ C_{1,2}(t) &= A\,\psi_{j,0}(t) \\ C_{m,n}(t) &= A \sum_{k=0}^{m-1} H_m(n, k+1)\,\psi\!\left(2^{(p-j)}t - kl\right) \end{aligned} \qquad (10)$$

where $A$ is a constant, $m$ is a constant which must be a power of 2, $p = \log_2 m$, $n$ satisfies $1 \leq n \leq m$, and $l$ is the time domain support length of the wavelet (which equals the pulse length $T_s$). Thus, $2M$ orthogonal wavelet basis waveforms $C_{1,1}(t), C_{1,2}(t), \ldots$ span $\mathcal{G}_0 \cup \mathcal{G}_1$, and we select at least $M_i$ of these orthogonal basis waveforms to span $\mathcal{G}_i$. The selection of basis waveforms is performed by the matrices $\mathbf{G}_0$ and $\mathbf{G}_1$.
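
The construction in Eq. (10) can be checked numerically with a much simpler wavelet. The sketch below swaps the Daubechies 4 wavelet for the Haar wavelet (support length $l = 1$), takes $j = 0$ and $A = 1$, and assumes that each $C_{m,n}(t)$ is a Hadamard-weighted sum of $m$ compressed wavelets placed end to end. All function names are hypothetical.

```python
def hadamard(order):
    """Hadamard matrix by repeated doubling; order must be a power of 2."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def haar_scaling(t):
    """Haar scaling function, supported on [0, 1)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def haar_wavelet(t):
    """Haar mother wavelet, supported on [0, 1)."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def code_set(M):
    """First 2M waveforms: scaling function, wavelet, then C_{m,n} for m = 2, 4, ..."""
    waveforms = [haar_scaling, haar_wavelet]
    m = 2
    while len(waveforms) < 2 * M:
        for row in hadamard(m):
            # m copies of the wavelet, compressed by m and shifted by k (l = 1)
            waveforms.append(
                lambda t, row=row, m=m: sum(
                    row[k] * haar_wavelet(m * t - k) for k in range(m)
                )
            )
        m *= 2
    return waveforms[: 2 * M]


def inner(f, g, cells=64):
    """Midpoint-rule inner product on [0, 1); exact for these piecewise-constant waveforms."""
    dt = 1.0 / cells
    return sum(f((i + 0.5) * dt) * g((i + 0.5) * dt) for i in range(cells)) * dt


basis = code_set(4)  # 8 waveforms, matching 2M with M = 4
print([[round(inner(f, g), 3) for g in basis] for f in basis])
```

With the Haar wavelet the pulses do not overlap in time, unlike the Daubechies 4 case discussed in the text, so this only illustrates the orthogonality structure.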

Fig. 1 shows examples of orthogonal basis waveforms based on Daubechies 4 wavelets and scaling function [4]. From the example we see that the waveforms become “sharper” as $M$ increases, and that they thus contain higher frequency components. For Daubechies 4 wavelets, we have that $T_s = lT = 7T$. Thus, the information carrying pulses will overlap substantially in time.


Figure 2: Trajectories of the subspace signals $\mathbf{x}_0$ (dotted) and $\mathbf{x}_1$ (dash-dotted).


4.  DECODING 

The vector representation of the transmitted pulse is

$$\mathbf{x}_i = \sum_{k=1}^{M_i} s_{i,k}\,\mathbf{g}_k^{(i)} = \mathbf{G}_i \mathbf{s}_i, \quad i = 0, 1. \qquad (11)$$

In this case, the signal $\mathbf{x}_i$ is known to lie in the $M_i$-dimensional linear subspace $\langle \mathbf{G}_i \rangle$ spanned by the columns of $\mathbf{G}_i$. This is illustrated in Fig. 2, where the dotted line is the trajectory of the subspace signal $\mathbf{x}_0$, the dash-dotted line is the trajectory of the subspace signal $\mathbf{x}_1$, $M_0 = M_1 = 2$ and $2M = 3$. From the figure we see the randomness of the signals $\mathbf{x}_0$ and $\mathbf{x}_1$, and that $\mathbf{x}_i$ is in the subspace spanned by the columns of $\mathbf{G}_i = [\mathbf{g}_1^{(i)}, \mathbf{g}_2^{(i)}]$. The matrix $\mathbf{G}_i$ can be chosen to introduce redundancy in the transmitted symbol $\mathbf{x}_i$, and the elements in $\mathbf{x}_i$ are then linear combinations of the elements in $\mathbf{s}_i$. We see that the $\mathbf{g}_k^{(i)}$ direction is weighted by $s_{i,k}$, and we now have a correspondence between the physical time-domain and a $2M$-dimensional signal space.

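We define the projection operator as [5]

$$\mathbf{P}_{G_i} = \mathbf{G}_i(\mathbf{G}_i^T\mathbf{G}_i)^{-1}\mathbf{G}_i^T, \quad i = 0, 1 \qquad (12)$$

so that $\mathbf{P}_{G_i}\mathbf{r}$ is a projection of the vector $\mathbf{r}$ onto the subspace $\langle \mathbf{G}_i \rangle$. If the subspaces $\langle \mathbf{G}_0 \rangle$ and $\langle \mathbf{G}_1 \rangle$ are disjoint, the columns of $\mathbf{G}_0$ and $\mathbf{G}_1$ are linearly independent. A stronger condition is orthogonality, which means that $\mathbf{G}_0^T\mathbf{G}_1 = \mathbf{0}$. For orthogonal subspaces we have

$$\mathbf{P}_{G_0}\mathbf{G}_0 = \mathbf{G}_0 \quad \text{and} \quad \mathbf{P}_{G_0}\mathbf{G}_1 = \mathbf{0} \qquad (13)$$

$$\mathbf{P}_{G_1}\mathbf{G}_1 = \mathbf{G}_1 \quad \text{and} \quad \mathbf{P}_{G_1}\mathbf{G}_0 = \mathbf{0} \qquad (14)$$

and we see that an orthogonal projection has a null space that is orthogonal to its range.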
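
The properties in Eqs. (13) and (14) can be verified on a small example. The sketch below does not form the inverse in Eq. (12) explicitly. Instead it orthonormalizes the columns of $\mathbf{G}_i$ by Gram-Schmidt and projects onto that basis, which gives the same orthogonal projection. The matrices and the helper names `orthonormal_basis` and `project` are illustrative assumptions.

```python
def orthonormal_basis(columns, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal vectors spanning the same subspace as columns."""
    basis = []
    for v in columns:
        w = [float(a) for a in v]
        for q in basis:
            c = sum(a * b for a, b in zip(q, w))
            w = [a - c * b for a, b in zip(w, q)]
        norm = sum(a * a for a in w) ** 0.5
        if norm > tol:
            basis.append([a / norm for a in w])
    return basis


def project(columns, r):
    """Orthogonal projection of r onto the span of the given column vectors."""
    out = [0.0] * len(r)
    for q in orthonormal_basis(columns):
        c = sum(a * b for a, b in zip(q, r))
        out = [o + c * b for o, b in zip(out, q)]
    return out


# Columns of G0 span the first two coordinates (and are not mutually orthogonal);
# columns of G1 span the last two, so the two subspaces are orthogonal.
G0 = [[1, 1, 0, 0], [1, 0, 0, 0]]
G1 = [[0, 0, 1, 1], [0, 0, 1, -1]]

print(project(G0, G0[1]))  # a column of G0 is left unchanged by P_G0
print(project(G0, G1[0]))  # a column of G1 is mapped to the zero vector by P_G0
```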


4.1.  Detection 

Given a transmitted time-domain waveform $s_i(t)$, we assume that this signal is contaminated by an additive disturbance $n(t)$, so that the waveform at the receiver input is $r(t) = s_i(t) + n(t)$, where $n(t)$ is a zero-mean, Gaussian white noise process with power spectral density $S_n(\omega) = N_0/2$, $\forall \omega$. The vector representation of the signal plus noise is $\mathbf{r} = \mathbf{x}_i + \mathbf{n}$, where the elements of the received vector $\mathbf{r}$ are

$$r_j = \int_{T_s} r(t) f_j(t)\,dt = x_{i,j} + n_j, \quad j = 1, 2, \ldots, 2M. \qquad (15)$$

Since the noise is a zero-mean Gaussian process, $n_j$ is also Gaussian with $E\{n_j\} = 0$ and $\mathrm{Var}\{n_j\} = N_0/2$.

Assume that the stochastic coefficient vector $\mathbf{s}_i$ is multivariate Gaussian $N[\mathbf{m}_{s,i}, \mathbf{R}_{s,i}]$ and that the noise vector $\mathbf{n}$ is distributed as $N[\mathbf{0}, (N_0/2)\mathbf{I}]$. Define $E[\mathbf{x}_i] = \mathbf{m}_i = \mathbf{G}_i\mathbf{m}_{s,i}$ and $E[(\mathbf{x}_i - \mathbf{m}_i)(\mathbf{x}_i - \mathbf{m}_i)^T] = \mathbf{R}_{x,i} = \mathbf{G}_i\mathbf{R}_{s,i}\mathbf{G}_i^T$; then if $s_i(t)$ is sent, the received vector $\mathbf{r}$ is distributed as $N[\mathbf{m}_i, \mathbf{R}_{x,i} + (N_0/2)\mathbf{I}]$. Note that the matrix $\mathbf{R}_{x,i}$ may be singular, depending on $\mathbf{G}_i$. In that case a bias term $\lambda\mathbf{I}$, where $\lambda \ll 1$, must be added to regularize $\mathbf{R}_{x,i}$.

In  general,  a  Neyman-Pearson  hypothesis  test  (e.g.,  [5]) 
may  be  applied  in  the  decoding,  since  we  assume  that  all 
relevant  probability  densities  are  known  to  the  intended  re¬ 
ceiver. 

Another, but suboptimum, detector that may be used is the so-called matched subspace detector (MSD) [5]. Scharf proposed the MSD to detect an unknown deterministic subspace signal in a known subspace. We have extended the MSD to classify stochastic subspace signals in known subspaces. The basic idea is to regard a stochastic subspace signal as an unknown deterministic subspace signal. In the case of two different classes, the extended MSD will project the received vector $\mathbf{r}$ onto the subspaces $\langle \mathbf{G}_0 \rangle$ and $\langle \mathbf{G}_1 \rangle$. The statistic $\mathbf{r}^T\mathbf{P}_{G_i}\mathbf{r}$ is clearly a maximal invariant statistic [5], and the decision criterion is that we choose class $\theta_0$ if

$$\mathbf{r}^T\mathbf{P}_{G_0}\mathbf{r} > \mathbf{r}^T\mathbf{P}_{G_1}\mathbf{r}, \qquad (16)$$

and otherwise choose class $\theta_1$. The detector measures the amount of the received energy that resides in subspace $\langle \mathbf{G}_i \rangle$, and then chooses the class corresponding to the subspace containing the largest amount of energy. The benefit of such a detector compared with the Neyman-Pearson detector is that the decision criterion is independent of the additive noise variance $N_0/2$. Obviously, the particular choice of subspace matrices $\mathbf{G}_0$ and $\mathbf{G}_1$ will influence the performance of the system.
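
The decision rule in Eq. (16) can be sketched as follows. The example is only a toy setup, not the simulation reported in this paper: the subspace matrices are scaled Hadamard rows, the coefficient vector is drawn as i.i.d. unit-variance Gaussian rather than from an AR(2) process, and the names `subspace_energy`, `msd_decide` and `simulate` are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def subspace_energy(columns, r):
    """Energy of r inside span(columns), i.e. r^T P_G r, via a Gram-Schmidt basis."""
    basis, energy = [], 0.0
    for v in columns:
        w = [float(a) for a in v]
        for q in basis:
            c = dot(q, w)
            w = [a - c * b for a, b in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > 1e-12:
            q = [a / norm for a in w]
            basis.append(q)
            energy += dot(q, r) ** 2
    return energy


def msd_decide(G0, G1, r):
    """Return 0 if more received energy lies in <G0> than in <G1>, else 1."""
    return 0 if subspace_energy(G0, r) > subspace_energy(G1, r) else 1


def simulate(G0, G1, sigma, trials, seed=0):
    """Monte Carlo error rate; sigma is the per-component noise standard deviation."""
    rng = random.Random(seed)
    dim, errors = len(G0[0]), 0
    for _ in range(trials):
        label = rng.randrange(2)
        cols = G0 if label == 0 else G1
        s = [rng.gauss(0.0, 1.0) for _ in cols]
        x = [sum(s[k] * cols[k][j] for k in range(len(cols))) for j in range(dim)]
        r = [xj + rng.gauss(0.0, sigma) for xj in x]
        errors += msd_decide(G0, G1, r) != label
    return errors / trials


# Each inner list is one column vector of the subspace matrix.
G0 = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
G1 = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, -0.5, 0.5]]
print(simulate(G0, G1, sigma=0.1, trials=2000))
```

Note that `msd_decide` never uses `sigma`, which mirrors the remark that the criterion does not depend on the noise variance.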

An analytic expression for the bit-error probability (BEP) of the Neyman-Pearson detector for Gaussian signal and noise has been given in [6]. The expression is in integral form, and may thus be evaluated numerically.
Figure 3: Message ’1111100101’ encoded by means of orthonormal basis waveforms constructed in Eq. (10) with $M = 4$.


5.  SIMULATIONS 

To  demonstrate  the  proposed  digital  modulation  technique, 
we  now  present  some  numerical  simulations. 

In our numerical simulations we use orthonormal wavelet basis functions given by Eq. (10). The basis functions are constructed from Daubechies 4 wavelets, which yields a time domain support length $l = 7$, and the scale index was $j = 0$. The subspace matrices $\mathbf{G}_0$ and $\mathbf{G}_1$ are orthogonal to each other and have orthonormal columns. Furthermore, $p_{\theta_0}(\mathbf{s}) = p_{\theta_1}(\mathbf{s})$ is multivariate Gaussian $N[\mathbf{0}, \mathbf{R}_s]$, which yields $E\{\mathbf{s}_0^T\mathbf{s}_0\} = E\{\mathbf{s}_1^T\mathbf{s}_1\}$. The random vectors $\mathbf{s}_0$ and $\mathbf{s}_1$ are generated as realizations of an AR(2)-process with parameters $\theta_0 = \theta_1 = [a_1, a_2, \sigma^2]^T = [0.1, 0.35, 1]^T$. The matrices $\mathbf{G}_0$ and $\mathbf{G}_1$ are constructed from the orthonormal eigenvectors of the $2M \times 2M$ covariance matrix of an AR(2) process with $a_1 = 0.81$, $a_2 = 0.35$ and $\sigma^2 = 1$. This is a simple way of constructing the subspace matrices, but obviously not the only possibility.

Fig. 3 shows an example of the transmitted waveform for the message ’1111100101’. The encoder applies orthonormal basis waveforms constructed from Eq. (10) with $M = 4$, which yields 8 different basis waveforms. Observe that two subsequent equal source bits have different waveforms, since the basis waveforms are weighted by a stochastic vector $\mathbf{x}_i$. Note also that the pulse length is hidden, and that there are no periodicities in the information carrying signal.

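The average pulse energy is

$$E_b = E\{\langle s_i(t), s_i(t) \rangle\} = E\left\{\int_{T_s} |s_i(t)|^2\,dt\right\}. \qquad (17)$$

With orthonormal basis waveforms and orthonormal columns of $\mathbf{G}_i$, the vector representation gives

$$E_b = E\{\mathbf{x}_i^T\mathbf{x}_i\} = E\{\mathbf{s}_i^T\mathbf{G}_i^T\mathbf{G}_i\mathbf{s}_i\} = E\{\mathbf{s}_i^T\mathbf{s}_i\}. \qquad (20)$$

Thus, as in conventional communications we may define the signal-to-noise ratio (SNR) as $\mathrm{SNR} = E_b/N_0$, which in our case can be written as $\mathrm{SNR} = \mathrm{tr}\{\mathbf{R}_s\}/N_0$, where $\mathrm{tr}\{\cdot\}$ denotes the trace.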

Fig. 4 shows the exact BEP of the Neyman-Pearson detector with all parameters known (full lines), and Monte Carlo simulated BEP (20000 repetitions) of the extended MSD (crosses). In curve (i) 8 orthonormal basis waveforms are used, and $\mathbf{G}_i$ has dimension $8 \times 4$. Curve (ii) shows the BEP with 40 orthonormal basis waveforms, and $\mathbf{G}_i$ of dimension $40 \times 20$. From Fig. 4 we see, as expected, that the BEP decreases as a function of increasing SNR. Furthermore, notice that the suboptimal extended MSD has a performance close to that of the optimal Neyman-Pearson detector. This is a remarkable result since the Neyman-Pearson detector is assumed to have knowledge about the measurement noise $N_0/2$ and the covariance matrix $\mathbf{R}_s$. The reason for this close-to-optimal performance of the extended MSD is the orthogonality of the subspaces $\langle \mathbf{G}_0 \rangle$ and $\langle \mathbf{G}_1 \rangle$. Since the subspace matrices encode the random vector $\mathbf{s}_i$, the extended MSD has all the relevant information available. The lack of knowledge of the noise variance $N_0/2$ does not influence the performance of the extended MSD, since the decision is based on the energy of the received vector $\mathbf{r}$ in each subspace. If we however choose $\theta_0 \neq \theta_1$, then the Neyman-Pearson detector will outperform the extended MSD for any choice of subspaces $\langle \mathbf{G}_0 \rangle$ and $\langle \mathbf{G}_1 \rangle$.

For low-probability-of-intercept communications, the dimension of $\mathbf{G}_i$ must be chosen such that the signal power per unit bandwidth is below the noise spectral density. In the simulations above, $M = 20$ implies the use of waveforms with higher frequency components than for $M = 4$. Thus, larger values of $M$ spread the transmitted signal over a wider frequency band. Since the energy of the transmitted baseband waveforms is equal for $M = 4$ and $M = 20$, the signal power per unit bandwidth is lower for the case of $M = 20$.
Figure 4: Bit-error probability as a function of SNR, (i) $M = 4$ and (ii) $M = 20$. Crosses are Monte Carlo simulations of the extended MSD, and full curves are exact theoretical results of the Neyman-Pearson detector.


Fig. 5 shows a spectrogram (in dB) of the transmitted baseband signal in Fig. 3. The horizontal axis is the normalized time axis, and the vertical axis is the frequency axis, normalized with respect to the maximum frequency $f_{max}$ of $X(t)$. From the spectrogram we clearly see that there are no structures that can disclose the transmitted bit sequence, nor disclose that an information carrying signal is actually being transmitted.

6.  CONCLUSIONS 

We  have  presented  a  new  digital  modulation  technique  that 
offers  some  degree  of  security  at  the  waveform  level.  The 
transmitted  waveform  is  noiselike,  and  would  therefore  not 
attract  the  attention  of  unfriendly  receivers.  It  is  demon¬ 
strated  that  simple  and  efficient  detectors  can  be  constructed 
by  means  of  a  subspace  formalism,  and  that  in  some  cases 
the  performance  of  these  suboptimal  detectors  equals  that 
of  the  optimum  Neyman-Pearson  detector.  The  extension 
of  the  proposed  technique  to  multi-user  communication  is 
straightforward. 
ABSTRACT

In this communication, Time Varying Higher Order Spectra, and specifically multitime-frequency representations, are used for detection and classification purposes. A new detector is presented for frequency modulations disrupted by multiplicative and additive noise. A statistical study is performed and corresponding simulations are presented. An extension to multiple hypothesis testing is also presented to classify neighboring frequency modulations in a context of multiplicative and additive noise. Some simulations illustrate the performance of our approach compared with the similar second order approach.

1.  INTRODUCTION 

Time-Frequency representations and Higher Order Spectra (HOS) have been intensively studied during the last few years. The former deal with time varying signals and depict the evolution of the power spectral density over time. However, second order statistics do not take non-linear phenomena into account and fully describe only linear systems and Gaussian processes. For the analysis of non-Gaussian signals and non-linear systems, many techniques based on HOS have been reported. Usually, these methods require a stationarity assumption. Recently, Time Varying Higher Order Spectra (TVHOS) have been defined [1],[2]; they make it possible to analyze non-linear time varying signals. In this paper, we present two new detection / classification algorithms based on TVHOS and applied to frequency modulations disrupted by a real multiplicative noise in an additive complex noise.

2.  TIME  VARYING  HIGHER  ORDER 
SPECTRA 

Many definitions of TVHOS can be found in the literature. They differ in particular in the lag separation between the time or frequency terms used in the product. They can also differ in the number of conjugated terms and in the space of representation used: time-multifrequency space or multitime-frequency space. The user's aim determines the type of representation, depending on whether the goal is to exhibit the non-linear phenomena or to preserve the time-frequency accuracy, for example in modulation cases. To reduce the computational cost, it is customary to consider only a slice of the TVHOS. Sliced TVHOS (STVHOS) were first introduced by Fonollosa and Nikias in [3] and were defined as particular slices of the Wigner-Multispectrum. In practice, the principal slice of the Wigner-Trispectrum is the one most used for signal analysis because it is a real representation that contains all the autoterms of the signal.

A  computationally  efficient  implementation  is  given  in  the 
frequency  domain  by : 

SWD4m(n,r)  =  \x\y+'^).X*'1  (y-j)e‘2™Vy,  (l) 

Simultaneously,  Stankovic  [4]  proposed  a  multitime-frequency 
definition  of  Wigner  Higher  Order  Distributions  as  for  the  fourth 
order : 

MTWD4x(n)  (n,y)  =  £x*  (wl  +  n2  +  rii  +  k)jc(n3  -  k) 

k 

jc(n2  -  k)jc  *  (-rtl  +  k)e~Jir/k 

For computational purposes, the MTWD4 can be evaluated by considering only the principal temporal slice given by $n = -n_1 = n_2 = n_3$. Hence, we obtain the L-Wigner Ville distribution :

L w 4*(„) (n,y)  =  Y,x2(n  +  k).x*2  {n-k)e'I%,r,k 

k 

which  is  a  dual  formulation  of  (1).  Due  to  its  good  localization 
properties,  even  for  non-linear  frequency  modulation,  LW4  has 
been  extensively  studied  by  Boashash  in  [5]  for  deterministic 
time  varying  signal  processing.  When  dealing  with  random  non 
stationary  signals,  Boashash  defined  the  Moment  and  the 
Cumulant  Wigner-Trispectrum  ( MWD4  and  CWD4 )  as  : 

MWD\n)(n,y )  = 

JL  o  ,  (3a  &  b) 

CWD\n)(n,y)  = 

k 

with $m_4(n,k) = E\{x^2(n+k)\,x^{*2}(n-k)\}$ ($E$ denotes the statistical expectation) and $c_4(n,k) = Cum\{x^2(n+k)\,x^{*2}(n-k)\}$ ($Cum$ represents the cumulant operator). [5] shows that these two formulations are helpful for instantaneous frequency law estimation of signals in multiplicative noise.

Signal analysis with TVHOS4
Let us consider the following signal model :

$$x(n) = b_m(n)\,e^{j(\Phi(n) + \theta)} \qquad (4)$$

where $b_m(n)$ is a zero mean white Gaussian band limited process with variance $\sigma_{b_m}^2$, $\theta$ is a random phase uniformly distributed in $[0, 2\pi]$, and $\Phi(n)$ is a polynomial function of the time index.

This signal model has received increasing interest in the last few years, in particular in applications such as radar and sonar where, in addition to the Doppler effect, the returned signal is subjected to amplitude modulation caused by the changing orientation of a non-point target [6]. Moreover, a polynomial phase signal disrupted by multiplicative noise provides an efficient model for speech analysis due to the time varying amplitude produced by speech resonance [7]. Finally, acoustical analysis of transient signals produced by mechanical systems has been shown to follow such a model [8].

If we calculate the Wigner distribution of the signal (4), we obtain $WS_x(n,\nu) = \sigma_{b_m}^2$. So, the Wigner Spectrum is unable to characterize such a signal. To perform the analysis of this signal model, and more generally of non-linear, non-stationary signals, higher order approaches become necessary. The results obtained using the signal (4) are for a real white Gaussian noise process $b_m$.

Multitime-frequency representations, using the spectral dependencies between negative and positive frequencies, allow, for real random signals, the construction of a non-oscillatory interference localized on the Instantaneous Frequency Law (IFL) $\Phi'(n)$, as shown in Figure 1.


Figure 1 : interference geometry in real multiplicative noise

Due to this property, we can conclude that the MWD4 gives a better estimate of the frequency law $\Phi'(n)$ of the signal (4). A similar conclusion can be made for the CWD4.

We  illustrate  this  interferences  property  of  TVHOS  for  the 
following  signal : 

xi(n)  =  bjn)s(n)  (5) 


with  s^n)  =  en^^hn)*) For  thjs  signa[  and  for  a 

band  limited  real  gaussian  multiplicative  noise  with  PSD 
BM(y ),  theoretical  MWD4  is  given  by  : 

whereas  theoretical  IFSis  m^y^WD^y^BMiy) 

Figures 2, 3 and 4 show the high resolution of the fourth order time-frequency representation due to the previously mentioned property.


Figure  2  :  MWD4  for  signal  x2 


Figure  3  :  CWD4  for  signal  x2 




Considering these results, it seems that the good localization properties of multitime-frequency representations can be exploited for classification of unknown instantaneous frequency laws disrupted by a multiplicative noise. An improvement of the local SNR can thus be hoped for in a decision context.

3.  DETECTION  WITH  TVHOS4 

First, we consider the basic problem of detecting the presence or absence of a signal $\{x(n);\ n = 0, 1, \ldots, N-1\}$ following the relation (4) in a set of measurements $\{r(n);\ n = 0, 1, \ldots, N-1\}$ corrupted by independent, additive, white Gaussian noise with zero mean :

$$\begin{cases} H_0 : r(n) = b_a(n) \\ H_1 : r(n) = x(n) + b_a(n) \end{cases}$$

The detection method must make it possible to decide between $H_0$ and $H_1$ by analyzing the received signal $r(n)$. By analogy with the classical correlator detector, we can construct a time-frequency correlator and also a multitime-frequency correlator based on TVHOS. For the fourth order, the detection statistic is obtained as the inner product of the TVHOS4$_r$ of the received signal $r(n)$ and a reference TVHOS4$_{ref}$, obtained either by averaging over independent realizations of $x(n)$ or by taking the LW4 of the polynomial phase signal in equation (4), as depicted in Figure 5.

Figure 5 : TVHOS4 based detector (block labels: $LW4_{ref}(n,\nu)$, $TVHOS4_r(n,\nu)$)


Time  equivalence  and  statistical  behavior 


The theoretical deflection is derived for the MWD4 detector. Using the Moyal relation, the MWD4 detector can be written as :

A  =  J  X  MWDire/  (»,  y).MWDAr  (»,  y)dy 

r  n 

(«)•'*»  j 

For both $b_m(n)$ and $b_a(n)$ white, zero mean and Gaussian, we can express the deflection at the detector output by :


Zre/2 


2\e\x\H\) 

-e\x 

H’ 

f{a//o}+  v{/ 

where $E(\lambda/H_i)$ and $V(\lambda/H_i)$ are respectively the mathematical expectation and the variance of the detector's output under hypothesis $H_i$. So, we obtain the following results for the MWD4 detector ($D_4$) and for the WVD ($D_2$) (which is only an energy detector) :


2A'(crJ+4gJcrJ+4gJ) 

A"  V+2o->J+2crJ 

2(gJ(  N*  +4N2  +4N  )+16Afrh,V  +W(  N+2  pjcrj) 

4  laj(  8/V2 +44^+90  )+aJaJ,(  8.V+48  )+cJeJ(  8/V2+80.V+192 )' 
4^+34^+124  )+crto8(  8.V+64 ) 

(6) 

If we plot the two expressions of equation (6) versus the multiplicative and additive noise standard deviations (Figures 6 and 7), we can conclude that the WVD performs a better detection than the MWD4 for the signal (4). We illustrate these theoretical results through the detection of a signal following the relation (4) with $\Phi'(n) = 8.92 \times 10^{-3}\,n + 0.314$. $b_m(n)$ is a real Gaussian noise process with a bandwidth equal to 0.1 in normalized frequency and $b_a(n)$ is a white complex circular Gaussian noise. The SNR is defined by $\sigma_{b_m}^2 / \sigma_{b_a}^2$ and was set to -10 dB. The results are averaged over 20 independent realizations. The ROC curves (Figure 8) clearly indicate the better results obtained by the WVD detector, and they also confirm the statistical results mentioned above.


Figure 6 : Theoretical deflection for WVD


Figure 7 : Theoretical deflection for MWD4
Figure 8 : ROC curves for MWD4 and WVD detector

4.  CLASSIFICATION  WITH  TVHOS4 

If we extend the decision rule to generalized detection with multiple hypothesis tests, we obtain in the TVHOS4 space :

$$\begin{cases} H_1 : TVHOS4_r(n,\nu) = TVHOS4_{x_1 + b_a}(n,\nu) \\ \quad\vdots \\ H_L : TVHOS4_r(n,\nu) = TVHOS4_{x_L + b_a}(n,\nu) \end{cases}$$

where TVHOS4 can be the MWD4 or CWD4 representation. So, the classification scheme is a bank of $L$ “TVHOS4-energy” compensated detectors (to ensure normalization), as presented in Figure 9.


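The output of each detector can be viewed as a special case of “minimum distance”:

$$d_i^2 = \iint_{n,\nu} \left(TVHOS4_r(n,\nu) - TVHOS4_{ref_i}(n,\nu)\right)^2 dn\,d\nu$$

After expanding the square, we obtain :

$$d_i^2 = -2 \iint_{n,\nu} TVHOS4_r(n,\nu)\,TVHOS4_{ref_i}(n,\nu)\,dn\,d\nu + E4_r + E4_{ref_i}$$

with $E4_r$ and $E4_{ref_i}$ the energies (integrated squares) of the two representations. Since $E4_r$ is common to all hypotheses, the decision rule is :

$$\eta = \arg\max_{ref} \left[ D(r(n), ref(n)) - \tfrac{1}{2} E4(ref(n)) \right]$$

where $D(r(n), ref(n)) = \langle TVHOS4_r, TVHOS4_{ref} \rangle$.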
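
The equivalence between the minimum-distance rule and the energy-compensated correlator can be illustrated on discretized representations. In the sketch below each representation is flattened into a plain list of samples. The toy reference values and the function names are hypothetical and stand in for sampled TVHOS4 surfaces.

```python
def inner(a, b):
    """Discrete inner product of two flattened representations."""
    return sum(x * y for x, y in zip(a, b))


def classify_min_distance(r, refs):
    """Index of the reference with the smallest squared distance to r."""
    return min(
        range(len(refs)),
        key=lambda i: sum((x - y) ** 2 for x, y in zip(r, refs[i])),
    )


def classify_correlator(r, refs):
    """Index maximizing the correlation minus half the reference energy."""
    return max(
        range(len(refs)),
        key=lambda i: inner(r, refs[i]) - 0.5 * inner(refs[i], refs[i]),
    )


# Toy references with different energies, and a noisy observation near the second one.
refs = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 2.0, 0.0], [3.0, 3.0, 3.0, 3.0]]
r = [0.2, 1.7, 2.1, -0.1]
print(classify_min_distance(r, refs), classify_correlator(r, refs))
```

Without the energy term, a plain correlator would favor the high-energy third reference in this example, which is why the compensation is needed when the references are not normalized.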

5.  CLASSIFICATION  OF  NEIGHBORING 
FREQUENCY  MODULATIONS. 

Performance is illustrated for the classification of two neighboring instantaneous frequency laws (IFL) in a context of multiplicative and additive noise. The aim of this simulation is to separate two signals following the relation (5) with $a = 10/\pi$ and $b = 0.125$ or $0.130$. For different SNR values and for three multiplicative noise bandwidths (indicated on the figures in normalized frequency), 1000 Monte-Carlo runs were performed. All the representations were estimated by averaging over ten independent realizations. In each case, the percentage of incorrect signal classifications (error probability) is plotted in the following figures.

Simulation results:

First, Figures 10, 11 and 12 show that when the bandwidth of $b_m(n)$ increases, the two fourth-order classifiers give much better results down to -3 and -6 dB. However, we can note that the two fourth-order classifiers perform better than the WVD classifier at high SNR. This remark is valid for any multiplicative noise bandwidth, but the results indicate that this tendency is emphasized for large multiplicative noise bandwidths. Figure 13 represents the behavior of the WVS based classifier and the CWD4 based classifier versus multiplicative noise bandwidth and SNR. We can see that the TVHOS4 classifiers are less sensitive than the WVS classifier to the multiplicative noise bandwidth. The CWD4 classifier yields very good performance and a quasi-total immunity to the multiplicative noise bandwidth. Other simulations on linear frequency modulations can be found in [8]. These results lead to the same conclusion as those presented in this part.
Figure 10 : Multiplicative noise bandwidth 0.02Hz
Figure 11 : Multiplicative noise bandwidth 0.07Hz
Figure 12 : Multiplicative noise bandwidth 0.12Hz
Figure 13 : WVS (broad lines), CWD4 (fine lines)

6.  CONCLUSION 

In this paper, two new classifiers based on TVHOS were presented. Their performance was evaluated through simulations, which clearly show that this approach improves the classification of neighboring IFL modulation laws. Moreover, the superiority of the TVHOS4 classifiers is significant when the multiplicative noise bandwidth increases and the signal to noise ratio is sufficiently high. Performance comparison with the optimal (ML) detector is currently under consideration but is not straightforward because of the non-Gaussianity of the signal (4). Simulation methods (MCMC) can provide an alternative in order to approach the theoretical optimal solution and will be presented in future work.
ABSTRACT

A  non-decision-aided  (NDA)  PLL  method,  which  recovers  the 
carrier  phase  of  QPSK/TDMA  bursts  without  preamble  by 
iterative  processing,  is  presented.  The  characteristics  of  the  phase 
detector  in  the  loop  are  examined  and  the  results  show  that  the 
NDA  PLL  exhibits  similar  performance  to  a  4th  power  PLL.  The 
phase  error  performance  was  simulated  and  the  results  indicate 
that  the  proposed  NDA  PLL  is  applicable  to  recovering  carrier 
phase  of  QPSK/TDMA  bursts  of  an  order  of  100  symbols  or 
more  in  length. 


1.  INTRODUCTION 

In  satellite  communications,  the  QPSK  modulation  technique  has 
been  widely  used  in  conjunction  with  time  division  multiple 
access  (TDMA)  mode  of  operation.  Carrier  phase  recovery  of 
bursts  is  a  major  design  issue  in  TDMA  systems.  While  a  phase- 
locked  loop  (PLL)  circuit  can  generally  provide  good  carrier 
phase  tracking  performance,  it  typically  takes  a  long  time  to 
acquire  the  carrier  phase.  Therefore,  a  long  preamble  is  necessary 
for  carrier  phase  recovery.  This  paper  presents  a  non-decision- 
aided  (NDA)  PLL  method,  which  recovers  the  carrier  phase  of  a 
QPSK/TDMA  burst  without  a  preamble  by  iterative  processing, 
thereby  reducing  the  burst  overhead. 

Section  2  describes  the  NDA  PLL.  A  set  of  equations  to  update 
the  loop  are  presented.  Phase  detector  characteristics  of  the  NDA 
PLL  are  examined  in  section  3.  Section  4  presents  and  discusses 
performance  simulation  results  for  the  NDA  PLL. 

2.  NDA  PLL  DESIGN 

The block diagram of the receiver is shown in Figure 1. Here, the local oscillator does not track the phase of the received carrier. Instead, the carrier phase is recovered by subsequent processing which uses in-phase and quadrature sampled values with reference to the local oscillator phase. The carrier phase recovery process includes calculating the phase of each sampled value $(X_n, Y_n)$, digital PLL processing for estimating the carrier phase, and rotating the phase of sampled values.


Figure  1.  Receiver  Block  Diagram 

A block diagram of the proposed NDA PLL is depicted in Figure
2. It is derived by adding a saw-tooth nonlinear function to the
second-order PLL presented in [1]. The overall design concept is
similar to that of the timing recovery circuits in [2], in the sense
that a saw-tooth nonlinear function is used to eliminate the
QPSK modulation and a feedback structure is used to unwrap
and filter the phase estimate. Note that the filtering method in
[2] resembles a first-order PLL, which can be obtained by setting
$K_2 = 0$ in Figure 2. The parameters of the NDA PLL are given by

$K_1 = 2\zeta \left( \frac{2\pi f_n}{R_s} \right)$ and $K_2 = \frac{1}{K_1} \left( \frac{2\pi f_n}{R_s} \right)^2$,

where $f_n$ and $\zeta$ respectively denote the loop natural frequency
and damping factor of the PLL, and $R_s$ denotes the symbol rate.
The loop is updated using the following equations (1)-(4):


$\hat{\Delta}_n = \Delta_n - \frac{\pi}{2} \, \mathrm{Int}\left( \frac{2\Delta_n}{\pi} + 0.5 \right)$  (1)

$S_n = \hat{\varphi}_n + K_1 \hat{\Delta}_n$  (2)

$\hat{\varphi}_{n+1} = \hat{\varphi}_n + K_1 K_2 \hat{\Delta}_n$  (3)

$\hat{\theta}_{n+1} = \hat{\theta}_n + S_n$  (4)

where Int(·) rounds a number down to the nearest integer, $\Delta_n$ is
the phase of the n-th sample relative to the current estimate
$\hat{\theta}_n$, and $\hat{\Delta}_n$ is the saw-tooth output. From
Figure 2, one can note that the PLL provides the carrier phase
estimate $\hat{\theta}_n$ as well as the frequency offset estimate $\hat{\varphi}_n$.
Therefore, using those estimates, the iterative PLL processing is
done as follows. By retaining $\hat{\theta}_n$ and substituting $\hat{\varphi}_n$ with $-\hat{\varphi}_n$ at
the end of a burst, the PLL process continues in the opposite
direction. Once the PLL acquires the received signal, the
loop parameters can be changed to get better tracking
performance. For example, the PLL loop bandwidth can be reduced to
achieve a smaller phase error variance. In designing the PLL in
Figure  2,  the  impact  of  the  nonlinear  phase  detection,  which  is 
discussed  in  the  next  section,  should  be  taken  into  consideration 
in  addition  to  typical  PLL  design  principles. 
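The loop update in equations (1)-(4) can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that the saw-tooth function folds the phase difference into [-π/4, π/4), that the same saw-tooth output drives both the proportional and the integral path, and that the QPSK constellation points lie at multiples of π/2. The function names and the noiseless test burst are hypothetical.

```python
import math
import random


def sawtooth(delta):
    """Fold a phase difference (radians) into [-pi/4, pi/4), as assumed for (1)."""
    return delta - (math.pi / 2) * math.floor(2 * delta / math.pi + 0.5)


def nda_pll(sample_phases, fn_over_rs, zeta=0.707, theta_hat=0.0, phi_hat=0.0):
    """Run the assumed loop update over the measured phase of each sample.

    fn_over_rs is the ratio f_n / R_s. Returns the final carrier phase
    estimate and the final per-symbol frequency offset estimate.
    """
    k1 = 2 * zeta * (2 * math.pi * fn_over_rs)
    k2 = (2 * math.pi * fn_over_rs) ** 2 / k1
    for phase in sample_phases:
        err = sawtooth(phase - theta_hat)   # (1) QPSK modulation removed
        step = phi_hat + k1 * err           # (2)
        phi_hat = phi_hat + k1 * k2 * err   # (3)
        theta_hat = theta_hat + step        # (4)
    return theta_hat, phi_hat


# Hypothetical noiseless QPSK burst with a constant 0.2 rad carrier phase offset.
rng = random.Random(0)
burst = [0.2 + rng.randrange(4) * math.pi / 2 for _ in range(2000)]
theta_est, phi_est = nda_pll(burst, fn_over_rs=1 / 100)
```

Backward processing as described above would, under the same assumptions, call `nda_pll` on the reversed burst with `theta_hat` carried over and `phi_hat` negated.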


Figure  2.  Block  Diagram  of  the  NDA  PLL 


3.  PHASE  DETECTOR 
CHARACTERISTICS  OF  THE  NDA  PLL 

The phase detection process of the NDA PLL includes the $\tan^{-1}(\cdot)$
and saw-tooth nonlinear functions. Let us define the phase
detector non-linearity as $g[\varphi(t) + \theta_n(t)]$, where $\varphi(t) = \theta_i(t) - \theta_o(t)$
is the phase error and $\theta_n(t)$ denotes input phase noise. Assuming that the phase of
the input carrier varies much more slowly than the input phase
noise and that the bandwidth of the PLL is much smaller than the
symbol rate $R_s$, the phase error $\varphi(t)$ varies much more
slowly than the input phase noise $\theta_n(t)$. Therefore, the low-
frequency component of the detector output represents the
phase detector characteristics; thus the characteristics can be
evaluated from the expectation


$g'(\varphi) = E\{\, g[\varphi + \theta_n] \mid \varphi \,\}$  (5)

$g'(\varphi)$ of the NDA PLL is obtained by simulation and shown in
Figure 3. As shown in the analysis of phase detectors in [4], the
effective phase detector output is suppressed as $E_b/N_0$ becomes
low. The shape of the phase detector characteristics is similar to that of a
4th power or four-phase remodulator PLL, which is illustrated in
Chapter 11 of [5].


Let us consider the signal-to-noise ratio (SNR) at the phase
detector output. For a linear phase detector, the SNR at the phase
detector output is given by

$\mathrm{SNR}_{PD,L} = \frac{1}{\sigma_{PD,L}^2} = 4 \left( \frac{E_b}{N_0} \right)$  (6)

where $\sigma_{PD,L}^2$ denotes the phase variance at the phase detector output.
If we define the SNR of the NDA PLL as $\mathrm{SNR}_{PD,NDA}$, the ratio of this
SNR with reference to $\mathrm{SNR}_{PD,L}$ is given by

$R_{0,NDA} = \frac{\mathrm{SNR}_{PD,NDA}}{\mathrm{SNR}_{PD,L}}$  (7)

$R_{0,NDA}$ is obtained by simulation and shown in Figure 4, along
with the SNR ratio of a 4th power PLL, $R_{0,4th}$, which is given in
[6].

Figure 3. Phase Detector Characteristics of the NDA PLL (curves: no noise, Eb/No=10dB, Eb/No=5dB, Eb/No=0dB)

Figure 4. SNR Ratios of Phase Detectors (horizontal axis: Eb/No (dB), -3 to 10)

4.  SIMULATION  RESULTS  AND
DISCUSSION 


Based on the analysis in [5], the phase variance $\sigma_{\theta,L}^2$ of a second-
order PLL with a linear phase detector is given by


-S .  } 

*(l+OJ 


(8) 


The phase variance of a 4th power PLL and of the NDA PLL can be
estimated using the SNR ratios obtained in the previous section.
Figures 5 and 6 illustrate the simulation results on the steady-state
phase tracking error performance of the NDA PLL, together with the
calculated tracking error performances for the linear PLL, 4th power
PLL, and NDA PLL, given by $\sigma_{\theta,L}$, $\sigma_{\theta,L}/\sqrt{R_{0,4th}}$,
and $\sigma_{\theta,L}/\sqrt{R_{0,NDA}}$, respectively. At high SNR the simulated
NDA PLL performance closely matches the calculated
performance based on the SNR ratio. At low SNR the NDA PLL
performance becomes worse than that calculated from the
SNR ratio. This is due to the loop threshold [7, Chapter 6], that is, the
occurrence of cycle slips. The phase detector characteristics of
the NDA PLL are similar to those of the 4th power PLL. The Nth power
PLL raises the loss-of-lock threshold by approximately $20 \log_{10} N$ dB
[4, Chapter 11]. Assuming that the loop threshold point of a second-order PLL
is 8 dB as illustrated in Chapter 6 of [7], the loop threshold of the
NDA PLL becomes 20 dB, which corresponds to a phase error standard
deviation of 5.7 degrees. From Figures 5 and 6, comparing the simulated
results of the NDA PLL with the calculated ones, it is noted that
the threshold approximately agrees with the 5.7-degree point.
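The threshold arithmetic in this paragraph can be checked with a short calculation. The sketch below assumes that the phase error variance at threshold equals the reciprocal of the linear loop SNR, which is the relation that reproduces the 5.7-degree figure; the function names are hypothetical.

```python
import math


def nth_power_threshold_db(base_threshold_db, n):
    """Loop threshold after an assumed 20*log10(N) dB penalty for an Nth power loop."""
    return base_threshold_db + 20 * math.log10(n)


def phase_std_degrees(loop_snr_db):
    """Phase error standard deviation, assuming variance = 1 / (linear loop SNR)."""
    variance = 1 / (10 ** (loop_snr_db / 10))
    return math.degrees(math.sqrt(variance))


threshold_db = nth_power_threshold_db(8.0, 4)   # about 20 dB
sigma_deg = phase_std_degrees(threshold_db)     # about 5.7 degrees
```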


Figure 5. Steady-state Tracking Performance ($R_s/f_n$=400,
$\zeta$=0.707, no frequency offset). Curves: the NDA PLL (simulated results), the NDA PLL (calculated based on SNR ratio), and a 4th power PLL; horizontal axis: Eb/No (dB).

Figure 6. Steady-state Tracking Performance ($R_s/f_n$=100,
$\zeta$=0.707, no frequency offset). Horizontal axis: Eb/No (dB), 0 to 10.

Now, consider initial acquisition performance. During initial
acquisition, a second-order PLL will exhibit a phase transient
depending on the initial phase error and frequency offset. When
the phase transient reaches the dynamic range of the NDA PLL, which
is approximately 1/4 of that of a linear PLL, or $\pi/8$ at low SNR, the
acquisition process is disturbed and the loop might go to a
random state, so the initial acquisition transient might occur
again. Therefore, in the NDA PLL design, care should be taken
to keep the initial phase error and frequency offset
sufficiently small for the loop to acquire a signal within a
proper time, for example, a burst period. The following
simulation results illustrate that it is feasible to design the NDA
PLL in such a way that the loop properly acquires the carrier phase
of bursts of 160 symbols or more in length.

The phase error performance of iterative burst processing using
the NDA PLL is simulated and the results are illustrated in
Figures 7 and 8. The block phase estimation method in [3] is
used to get the initial value. In Figure 7, the PLL parameters
$R_s/f_n = 200$ and $\zeta = 0.707$ are used for the initial PLL processing in
the forward direction, and $R_s/f_n = 400$ and $\zeta = 0.707$ are used for the
subsequent PLL processing in the backward direction. In Figure
8, the PLL parameters $R_s/f_n = 100$ and $\zeta = 0.707$ are used both
for the initial PLL processing in the forward direction and for the
subsequent PLL processing in the backward direction. The
simulation results show that the same performance as the steady-state
tracking phase error can be obtained when the carrier frequency
offset is small. This implies that, for the given parameters, the loop
acquires the phase during the initial PLL processing in the forward
direction.
Figure 7. Simulated Phase Error Performance of Iterative Burst
Processing Using the NDA PLL (dF: carrier frequency offset,
burst length=640 symbols). Curves: dF=0.004*Rs, dF=0.001*Rs, and dF=0; horizontal axis: Eb/No (dB).

Figure 8. Simulated Phase Error Performance of Iterative Burst
Processing Using the NDA PLL (dF: carrier frequency offset,
burst length=160 symbols)

As the burst length becomes shorter, it becomes harder to adjust
the PLL parameter $R_s/f_n$ such that the loop can properly acquire
the phase within a burst period. In that respect, further simulation
results, although not shown here, indicate that the NDA PLL is
applicable to burst lengths on the order of 100 symbols or more.

5.  CONCLUSION 

An  NDA  PLL  method,  which  recovers  the  carrier  phase  of 
QPSK/TDMA  bursts  without  preamble  by  iterative  processing,  is 
presented.  The  characteristics  of  the  phase  detector  in  the  loop 
are  examined  and  the  results  show  that  the  NDA  PLL  exhibits 
similar  performance  to  a  4th  power  PLL.  The  phase  error 
performance  was  simulated  and  the  results  indicate  that  the 
proposed NDA PLL is applicable to recovering the carrier phase of
QPSK/TDMA bursts on the order of 100 symbols or more in
length.
ABSTRACT

Communication systems are subject to stringent image
rejection requirements. Thus, accurate and regular field
calibration is important. Regression techniques are effective in
the calibration of quadrature receiver systems. These
techniques require the transmission or injection of a calibration
signal to estimate potentially frequency-dependent errors.
Existing regression-based methods use nonlinear models
with signals that calibrate only one frequency at a time.
This paper recasts the problem in terms of linear regression
and develops an optimized multi-tone calibration signal for
quadrature receiver communication systems. Linear regression
ensures closed-form solutions that can be computed in
real-time by using adaptive filtering techniques. Simulations
demonstrate the advantages of the multi-tone signal:
simultaneous multi-frequency calibration and minimal
interference with information bearing communication channels.
At the same time, the benefits of regression-based
calibration are also realized: modest model assumptions,
effective performance assessment, and accommodation of
non-uniformly sampled or missing calibration data.

Keywords:  System  identification  and  calibration,  Signal 
processing  for  communications 

1.  INTRODUCTION 

Green et al. developed a nonlinear regression (NLR)-based
method to calibrate gain and phase mismatch between the
in-phase (I) and quadrature (Q) branches of a quadrature
receiver [4, 5]. The method is effective and allows reliable
error assessment. Due to the nonlinear models, however,
real-time implementation can be difficult. With trigonometric
manipulation the technique can be recast in terms
of linear regression (LR). The technique requires either the
transmission or injection of a calibration signal to estimate
potentially frequency-dependent errors. The use of a transmitted
calibration signal is particularly advantageous since
this permits regular field calibration of the receiver without
adding complex hardware.

Figure 1 illustrates a standard quadrature receiver. For
simplicity, the antenna is assumed to be omni-directional.
The gain and phase errors are modeled by the impulse response
functions $h_I$ and $h_Q$, for the I and Q branches respectively.
This approach allows errors to be frequency dependent
as shown by the frequency-domain representations of
the response functions: $H_I(\omega) = G_I(\omega) \exp(j\psi_I(\omega))$ and
$H_Q(\omega) = G_Q(\omega) \exp(j\psi_Q(\omega))$. Although the errors are
modeled as frequency dependent, it is reasonable to assume
that the gain and phase errors are approximately constant
over narrow frequency bands.


Figure  1:  I/Q  Receiver  with  Gain  and  Phase  Imbalances 

Although other calibration procedures exist (see [1],
[10], [8], and [7]), regression-based calibration techniques
provide several advantages over these methods including
modest model assumptions, allowance of non-uniformly
sampled or missing calibration data, effective performance
assessment, and model flexibility. Furthermore, regression-
based techniques can calibrate receivers in normal operating
environments. That is, test signals are transmitted to
receivers according to field design; while direct test signal
injection is supported, it is not necessary.

In the context of communication systems, several LR
properties are particularly important. First, real-time
calibration must be accommodated. Closed-form solutions
are straightforward with LR, and sequential parameter updates
are possible using adaptive filters. Second, the capability
to calibrate using a transmitted signal is important
to avoid the restrictive addition of hardware. Third, the
ability to accurately monitor estimator performance is beneficial.
Modern mobile communications, for example, desire
around 70 dB of image rejection; for quadrature receivers,
this corresponds to roughly 1/20 of a degree of allowable phase
deviation between the I and Q branches. System performance
is monitored using parameter inferences and confidence
intervals.
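The link between image rejection and phase mismatch can be illustrated numerically. The sketch below is not taken from the paper: it assumes the commonly used image rejection expression for a quadrature receiver with relative gain g and phase error φ, namely (1 + 2g cos φ + g²) / (1 - 2g cos φ + g²), and the function names are hypothetical.

```python
import math


def image_rejection_db(gain_ratio, phase_error_deg):
    """Image rejection ratio (dB) for a relative gain g and phase error phi."""
    phi = math.radians(phase_error_deg)
    g = gain_ratio
    wanted = 1 + 2 * g * math.cos(phi) + g * g
    image = 1 - 2 * g * math.cos(phi) + g * g
    return 10 * math.log10(wanted / image)


def phase_error_for_rejection_deg(rejection_db):
    """Phase error (degrees) giving the requested rejection when gains match."""
    return math.degrees(2 * math.atan(10 ** (-rejection_db / 20)))


irr_at_twentieth_degree = image_rejection_db(1.0, 1 / 20)
phase_for_70_db = phase_error_for_rejection_deg(70.0)
```

Under this assumed model, a phase error of 1/20 of a degree with matched gains yields a rejection in the high 60s of dB, which is consistent with the "roughly" in the text.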

In [4] and [5], two calibration signals are considered:
a pure sinusoid and a double sideband suppressed carrier
signal. Unfortunately, both signals suffer from the significant
disadvantage that only one frequency is calibrated at
a time. This impedes the task of calibrating a receiver over
the frequency range of operation. Practical implementation
requires a signal that admits simultaneous calibration over
multiple frequencies. Through proper selection of frequencies
and optimization of relative phase, a multi-tone signal
can be constructed that is suitable for practical regression-
based calibration.

2.  LR-BASED  CALIBRATION 

Observations that follow an LR model are expressed according
to $Y = X\beta + \varepsilon$, where Y is the N-by-1 vector of
observed values; X is the known N-by-P predictor matrix;
$\beta$ is the P-by-1 unknown parameter vector; and $\varepsilon$ is the
N-by-1 vector of additive, zero-mean, uncorrelated noise. The
LR model does not require a particular sampling scheme.
Thus, non-uniformly sampled or missing data do not affect
the method.

Given a sufficient number of observations, unique
closed-form solutions for $\beta$ generally exist, although problems
such as multicollinearity occasionally arise. Using the
solution for deterministic least squares, the unknown parameter
vector is estimated according to

$\hat{\beta} = (X'X)^{(-1)} X'Y$,  (1)

where $X'$ is the matrix transpose of X and $(\cdot)^{(-1)}$ designates
the matrix inverse operation.
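Equation (1) can be illustrated with a small, dependency-free sketch. The code below solves the normal equations by Gauss-Jordan elimination and uses deliberately non-uniform sample times to show that no particular sampling scheme is required. The function name and the toy data are hypothetical, and a production implementation would more likely rely on a numerical library.

```python
def least_squares(X, Y):
    """Solve (X'X) beta = X'Y by Gauss-Jordan elimination with partial pivoting."""
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'Y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * Y[r] for r in range(n))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (multicollinearity)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Non-uniformly spaced sample times; observations follow Y = 2 + 3 t exactly.
times = [0.0, 0.1, 0.35, 0.4, 0.9, 1.7]
X = [[1.0, t] for t in times]
Y = [2.0 + 3.0 * t for t in times]
beta_hat = least_squares(X, Y)
```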

Equation (1) is well suited for post-processing or off-line
calibration. In cases where real-time calibration is desired,
adaptive filters can be utilized. Preliminary results using
the Least Mean Square (LMS) algorithm and the Recursive
Least Squares (RLS) algorithm are promising [6].
Currently, gain and phase errors are modeled as constants with
respect to time. Although additional research is needed to
address non-stationarities such as component drift, methods
such as the Exponential Forgetting Window (EFW)-
RLS algorithm are designed for just such conditions.
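Sequential updating can be sketched with a textbook recursive least squares step. This is a generic RLS recursion with an optional exponential forgetting factor, not the specific algorithm evaluated in [6]; the function name, the initial covariance, and the two-parameter test signal are assumptions made for illustration.

```python
import math


def rls_update(beta, cov, x, y, forgetting=1.0):
    """One recursive least squares step for predictors x and observation y."""
    p = len(beta)
    cov_x = [sum(cov[i][j] * x[j] for j in range(p)) for i in range(p)]
    denom = forgetting + sum(x[i] * cov_x[i] for i in range(p))
    gain = [v / denom for v in cov_x]
    residual = y - sum(b * xi for b, xi in zip(beta, x))
    beta = [b + g * residual for b, g in zip(beta, gain)]
    # cov stays symmetric, so the row vector x'cov equals cov_x transposed.
    cov = [[(cov[i][j] - gain[i] * cov_x[j]) / forgetting for j in range(p)]
           for i in range(p)]
    return beta, cov


beta = [0.0, 0.0]
cov = [[1e6, 0.0], [0.0, 1e6]]   # large initial covariance: weak prior on beta
for n in range(200):
    t = 0.37 * n
    x = [math.cos(t), math.sin(t)]
    y = 0.8 * x[0] + 0.3 * x[1]   # noiseless observation
    beta, cov = rls_update(beta, cov, x, y)
```

Setting `forgetting` slightly below one discounts older samples, which is the usual way such a recursion is adapted to slowly drifting parameters.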

There exist several effective methods to ascertain LR estimator
performance. Provided independent, normal errors
and a reasonable sample size, $\alpha$-level confidence intervals
are

$[\hat{\beta}_p - t_{N-P}\, s\{\hat{\beta}_p\},\ \hat{\beta}_p + t_{N-P}\, s\{\hat{\beta}_p\}]$,  (2)

where $t_{N-P}$ stands for $t_{N-P}(1 - \alpha/2)$, the $(1 - \alpha/2)100$ percentile of the
Student-t distribution with $(N - P)$ degrees of freedom. The
sample variance-covariance matrix of $\hat{\beta}$ is given by
$s^2\{\hat{\beta}\} = (Y' - \hat{\beta}'X')\, Y\, (X'X)^{(-1)} / (N - P)$.

In communications systems, assumptions about $\varepsilon$ are
likely violated. For example, the error term includes signal
content from the communication channels themselves;
and these signals rarely exhibit time independence and normality.
Most violations, however, have minimal impact on
estimation since data sets are generally large by conventional
standards. Data correlations will not bias estimates,
but may erroneously shrink confidence intervals. In these
cases, conservative specifications should be used. Details
are in [4].

Resampling techniques, such as the jackknife or the
bootstrap, have no underlying assumptions regarding sample
size or normality of errors, and they provide effective
alternatives to (2). This is particularly true in cases where
the desired parameters are functions of $\beta$ and not the individual
terms $\beta_p$ themselves. Details for jackknife and bootstrap
inferences are given in [4].


3.  OPTIMIZED  MULTI-TONE  SIGNAL 


A multi-tone calibration signal m(t) can be constructed using
a superposition of real sinusoids

$m(t) = \sum_{k=1}^{K} M_k \cos(\omega_k t + \theta_k)$,  (3)

where $M_k$ and $\theta_k$ establish the relative magnitude and phase
of each sinusoid component [3]. By restricting our basis set
to sinusoids, frequency content is easily controlled. To simplify
calibration, it is most sensible to constrain all gains
to be equal, $M_k = M$ for all k. The final value M is chosen
to reflect the desired signal power. Since the power of
m(t) is distributed over frequency, individual components
possess relatively low power, which helps minimize channel
interference by the calibration signal. Construction of the
calibration signal, then, requires determination of K, $\theta_k$,
and $\omega_k$ for all k.

The number of calibration frequencies K should be sufficient
to cover the frequency range of interest. This parameter
will vary from application to application. In some
cases, K may be quite large. This, in turn, can cause serious
problems in dynamic range [3]. From (3) it is clear that
the worst case occurs when all phases are zero, $\theta_k = 0$ for all
k. The resulting signal has nearly all of its power concentrated
in very narrow time slots. Even distribution of signal
power over time improves dynamic range. To this end, the
phase terms $\theta_k$ are chosen to minimize the peak amplitude of the
calibration signal,

$\min_{\theta_k} \left( \max(|m(t)|) \right)$.  (4)
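The effect of the phases on the peak amplitude in (4) can be illustrated numerically. The sketch below compares the zero-phase worst case with a closed-form quadratic (Schroeder-style) phase assignment. The quadratic phases are a stand-in chosen for illustration and are not the optimized phases produced by the procedure in this paper; the tone spacing of 250 Hz, K = 16, unit amplitudes, and the function name are likewise assumptions.

```python
import math


def multitone_peak(phases, base_hz=250.0, amplitude=1.0, points=4096):
    """Peak |m(t)| over one period for tones at k * base_hz, k = 1..K."""
    period = 1.0 / base_hz
    peak = 0.0
    for i in range(points):
        t = period * i / points
        m = sum(amplitude * math.cos(2 * math.pi * base_hz * (k + 1) * t + th)
                for k, th in enumerate(phases))
        peak = max(peak, abs(m))
    return peak


K = 16
zero_phases = [0.0] * K
# Schroeder-style quadratic phases: an assumed stand-in for optimized phases.
quadratic_phases = [-math.pi * k * (k - 1) / K for k in range(1, K + 1)]

peak_zero = multitone_peak(zero_phases)
peak_quadratic = multitone_peak(quadratic_phases)
```

With all phases zero the tones add coherently at t = 0, so the peak equals K times the common amplitude; spreading the phases distributes the same power more evenly over the period.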

The phase optimization in (4) is a nonlinear problem.
To compute $\theta_k$, define $m_{max} = \max(|m(t)|)$ and let $t_{max}$
be the time when this maximum occurs. That is, $m_{max} = |m(t_{max})|$.

From (3), the first and second partial derivatives of
$m(t_{max})$ with respect to $\theta_k$ are given by

$\frac{\partial m}{\partial \theta_k} = -M_k \sin(\omega_k t_{max} + \theta_k)$  (5)

and

$\frac{\partial^2 m}{\partial \theta_k^2} = -M_k \cos(\omega_k t_{max} + \theta_k)$.  (6)

When set equal to zero, (5) identifies critical points. Equation
(6) determines whether these points are maxima, minima,
or points of inflection.

To simplify computation of the first two partial derivatives
of $m_{max}$, (5) and (6) are combined using Euler’s identity

$\frac{\partial^2 m_{max}}{\partial \theta_k^2} + j \frac{\partial m_{max}}{\partial \theta_k} = -\mathrm{sgn}(m(t_{max}))\, M_k\, e^{j(\omega_k t_{max} + \theta_k)}$.  (7)

Suitable $\theta_k$ are computed using gradient-descent type algorithms,
which are made efficient by using (7).

It is critical that the frequencies $\omega_k$ are chosen so that
the resulting calibration signal is periodic. That is, $\omega_k$ must
allow m(t) = m(t + T) for all t and at least one finite value
of T. Without periodicity, phase optimization of the calibration
signal is meaningless, and the resulting calibration
signal will necessarily display “bad” behavior. One simple
way to ensure that the calibration signal is periodic is to
choose $n_i \omega_i = n_j \omega_j$ for i = 1, 2, ..., K and j = 1, 2, ..., K,
where $n_i$ and $n_j$ are integers. It is also convenient to
restrict $\omega_k > 0$ for all k. In some cases, I/Q mismatch varies
slowly with frequency. In these cases, it is sensible to place
$\omega_k$ between information-bearing frequency bands. This will
help minimize interference between the calibration signal
and the communication channels.
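The periodicity condition can be checked mechanically when the tone frequencies are rational numbers: the tones then share a fundamental frequency equal to their greatest common divisor, and the signal repeats with the reciprocal of that fundamental. The helper below is a hypothetical sketch of this check and expects frequencies in Hz given as integers or exact fractions.

```python
from fractions import Fraction
from math import gcd


def common_period(freqs_hz):
    """Common period T (seconds) of tones at the given rational frequencies."""
    num, den = 0, 1
    for f in (Fraction(f) for f in freqs_hz):
        # gcd of fractions: gcd of numerators over lcm of denominators.
        num = gcd(num, f.numerator)
        den = den * f.denominator // gcd(den, f.denominator)
    return 1 / Fraction(num, den)


period_16_tones = common_period([250 * k for k in range(1, 17)])
period_two_tones = common_period([300, 450])
```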

Quadrature receivers operate by mixing the received signal
with a local oscillator of frequency $\omega_{LO}$, as shown in
Figure 1. This operation frequency shifts the received signal.
Thus, it is necessary to translate the frequencies $\omega_k$
of the optimized signal m(t) by $|\omega_{LO}|$ prior to transmission.
That is, the transmitted signal is obtained from (3) by replacing
$\omega_k$ with $\tilde{\omega}_k = \omega_k + |\omega_{LO}|$ for all k. This ensures that
the calibration signal, once demodulated, maintains the optimized
signal with content at the desired frequencies $\omega_k$.
Optimization of m(t) should not be done using $\tilde{\omega}_k$, since
frequency translation during demodulation would otherwise disrupt
the phase optimization.


4.  SYSTEM  MODEL 

The optimized multi-tone test signal m(t) is well suited for
LR-based quadrature receiver calibration. Although m(t) is
known for a given application, the received signal $\tilde{m}(t)$ has
unknown gain G and unknown phase-shift $\psi$, which correspond
to channel attenuation and independent operation of
the receiver’s local oscillator, respectively.

Following analog-to-digital conversion, the received signal
is given by

$\tilde{m}(t) = \sum_{k=1}^{K} \left[ I_I X_1(t, \omega_k) \beta_1(\omega_k) + I_I X_2(t, \omega_k) \beta_2(\omega_k) + I_Q X_1(t, \omega_k) \beta_3(\omega_k) + I_Q X_2(t, \omega_k) \beta_4(\omega_k) \right] + \varepsilon(t)$,  (8)

where X are the predictor variables, $\beta$ are the unknown
parameters, system noise is designated by $\varepsilon(t)$, and each
I serves as an indicator function. The indicator functions
identify observations from different branches and improve
estimator performance [4, 5]. In this case, $I_I = 1$ and $I_Q = 0$
when representing data from the in-phase branch; $I_I = 0$
and $I_Q = 1$ when representing data from the quadrature
branch.

The unknown parameters are estimated using (8) and
the LR procedure described in Section 2. Recommendations
in [4] regarding system sampling should be followed
to ensure good parameter estimates. Essentially, the $\omega_k t$
samples modulo $2\pi l$, where l is any integer, should not be
tightly clustered.


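In (8), the predictor variables are given by

$X_1(t, \omega_k) = \cos(\omega_k t + \theta_k)$
$X_2(t, \omega_k) = \sin(\omega_k t + \theta_k)$,  (9)

and the unknown parameters are given by

$\beta_1(\omega_k) = G_I(\omega_k) \cos(\alpha_I(\omega_k))$
$\beta_2(\omega_k) = G_I(\omega_k) \sin(\alpha_I(\omega_k))$
$\beta_3(\omega_k) = G_Q(\omega_k) \sin(\alpha_Q(\omega_k))$
$\beta_4(\omega_k) = -G_Q(\omega_k) \cos(\alpha_Q(\omega_k))$.  (10)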

Here, the grouped gain parameters $G_I(\omega_k)$ and $G_Q(\omega_k)$ include
contributions from the unknown test signal gain as
well as the unknown frequency-dependent gains of each individual
branch. Similarly, the grouped parameters $\alpha_I(\omega_k)$
and $\alpha_Q(\omega_k)$ represent phase contributions from the test signal
as well as the receiver. Individual gain and phase terms
are not important; rather, it is only the relative mismatch
between branches as a function of frequency that is important.

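Using the I branch as reference, the relative gain mismatch
as a function of frequency, $G_{rel}(\omega_k)$, is given by

$G_{rel}(\omega_k) = \sqrt{ \frac{\beta_3^2(\omega_k) + \beta_4^2(\omega_k)}{\beta_1^2(\omega_k) + \beta_2^2(\omega_k)} }$,  (11)

and the relative phase mismatch as a function of frequency,
$\alpha_{rel}(\omega_k)$, is given by

$\alpha_{rel}(\omega_k) = \arctan\left( \frac{-\beta_3(\omega_k)}{\beta_4(\omega_k)} \right) - \arctan\left( \frac{\beta_2(\omega_k)}{\beta_1(\omega_k)} \right)$.  (12)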

As alluded to earlier, although confidence intervals for $\beta$
are straightforward, confidence intervals for $G_{rel}(\omega_k)$ and
$\alpha_{rel}(\omega_k)$ are complicated. Bootstrap and jackknife methods
are effective, but they are computationally inefficient.
Thus, practical real-time computation of confidence intervals
or inferences needs further investigation.


5.  SIMULATIONS  AND  RESULTS 

Computer simulations help demonstrate the effectiveness of
LR-based calibration using an optimized multi-tone signal.
First, an optimized calibration signal is constructed with
$\omega_k = 250(2\pi k)$ for k = {1, 2, 3, ..., 16}. This signal allows
calibration up to a frequency of around 4 kHz. Figure 2
plots two periods of this signal. Although the optimization
procedure does not produce a unique solution, the desired
temporal distribution of signal energy is achieved.

For the first calibration scenario, consider the simple
case when the only errors present are a deviation from the
unit-gain quadrature state. Using the in-phase branch as
reference, this is equivalent to the injection of gain and
phase errors $G_{LO}$ and $\psi_{LO}$ into the quadrature branch. The
unknown parameters $\beta$ are therefore not functions of frequency,
and the number of unknowns is reduced from 4K
to four. Simple modification of (8) accommodates this fact.

Prior to transmission, each component in m(t) is frequency
shifted by 40 kHz, to match the carrier and local
oscillator frequencies. A Double Side-Band Suppressed Carrier
(DSB-SC) voice signal, band-limited to 10 kHz, is also
included. According to (8), the voice signal is viewed as
noise $\varepsilon(t)$, and the Signal-to-Noise Ratio (SNR) is unity.

Figure 2: Optimized Calibration Signal


The  received  signal  is  split  into  I  and  Q  paths,  mixed, 
and  then  lowpass  filtered  using  a  12(,l-order  equiripple  FIR 
filter  with  10-kHz  cut-off  frequency  and  70-dB  of  stopband 
attenuation. The Q branch is given a gain error of $G_{LO} = 0.95$
and a phase deviation of $\psi_{LO} = \pi/3$ radians. Receiver
sampling is set to 22.05 kHz.

Using a total of 39,588 samples per branch, the parameters
$\beta$ are estimated using (1). Following transformation
by (11) and (12), the gain and phase error estimates are
$\hat{G}_{LO} = 0.9498$ and $\hat{\psi}_{LO} = 1.0296$. This gives relative errors
of -0.023 percent and -1.68 percent, respectively.

Although the correlated nature of voice violates the i.i.d.
assumption typical for error terms in LR models, the calibration
procedure produces good-quality estimates. This is
particularly encouraging given the low SNR and relatively
short data record of under two seconds.

The second calibration scenario is similar to the first. In
this case, however, randomly chosen frequency-dependent
errors $\beta(\omega_k)$ are present, the LR model error $\varepsilon(t)$ is i.i.d.
Gaussian noise, and the SNR is 10 dB. Figure 3 summarizes
the relative errors of the estimates. I/Q mismatch is well
estimated across the frequencies $\omega_k$, with a mean relative error
of around one percent.

6.  CONCLUSIONS 

An optimized multi-tone signal was developed for LR-based
calibration of gain and phase mismatch in I/Q receivers.
This signal permits simultaneous multi-frequency calibration
and causes minimal interference with information bearing
communication channels. LR estimates have closed-
form solutions and permit sequential updating using adaptive
filters. At the same time, the benefits of the regression-
based calibration are also realized: modest model assumptions,
effective parameter inferences, and accommodation
of non-uniformly sampled or missing calibration data.
Simulations document the effectiveness of the procedure.


Figure 3: Relative Gain and Phase Errors
ABSTRACT

The estimation of the delay of a known training signal received
by an antenna array in a multipath channel is addressed.
The effect of the co-channel interference is taken
into account by including a term with unknown spatial correlation.
The channel is modeled as an unstructured FIR
filter. The exact maximum likelihood (ML) solution for
this problem is derived, but it does not have a simple dependence
on the delay. An approximate estimator that is
asymptotically equivalent to the exact one is presented. Using
an appropriate reparameterization, it is shown that the
delay estimate is obtained by rooting a low-order polynomial,
which may be of interest in applications where fast
feedforward synchronization is needed.

1.  INTRODUCTION 

Time-delay estimation or timing synchronization is a key
task in diverse areas, such as radar, sonar and communications.
Accurate chip/symbol synchronization is especially
important in systems employing time-division multiple
access (TDMA) or asynchronous burst transmissions.
Also, most multiuser detectors for code-division multiple
access (CDMA) require reliable estimates of the users’ code
timings in order to operate acceptably in near-far environments
[1]. In addition, Global Navigation Satellite Systems
(GNSS) arouse great interest at present. In these systems,
accurate time-delay estimation is fundamental, since it is
the key to obtaining sub-meter accuracies in location estimates.

There  is  a  vast  literature  on  single  antenna  synchro¬ 
nization  methods  for  both  additive  white  Gaussian  noise 
(AWGN)  channels  and  multipath  channels  [2],  However, 
the  performance  of  these  methods  is  limited  when  strong 
co-channel  interference  (CCI)  is  present.  For  this  reason, 
considerable effort is being devoted to deriving time-delay
estimators that make efficient use of antenna arrays in
interference-limited scenarios. Following an approach that has
already been applied successfully to this and other problems,
all the components contributing to the noise and CCI
(i.e., multi-access interference (MAI), external interference,
etc.) are modeled as a Gaussian term with unknown and
arbitrary spatial correlation matrix [3, 4, 5]. This model
allows  us  to  develop  a  metric  that  takes  the  spatial  char¬ 
acteristics  of  the  CCI  into  account,  and  we  believe  that 
this  offers  an  excellent  trade-off  between  model  realism  and 
computational  complexity.  A  more  detailed  description  of 
the  CCI,  e.g.  using  the  finite  alphabet  property  of  the  MAI, 
may result in improved performance, but at the expense
of losing a simple expression for the estimator. On the
other  hand,  several  computationally  attractive  algorithms 
have  been  derived  under  the  usual  assumption  that  the  CCI 
is  spatially  white  [6].  However,  the  resulting  algorithms  are 
not  suited  for  situations  involving  strong  CCI.  The  estima¬ 
tion  of  the  spatial  covariance  of  the  CCI  is  only  possible 
if  a  training  sequence  is  received.  Therefore,  the  estimator 
presented  herein  and  those  that  assume  the  same  model 
for the CCI can only operate in data-aided or decision-directed
mode. The assumption that the signal shape is known should not
be too stringent, since most communications or satellite
navigation systems transmit certain training sequences and,
subsequently, the estimator can be switched to a
decision-directed mode. In addition, in radar
and  sonar  systems  the  shape  of  the  received  signal  coincides 
with  that  of  the  transmitted  one. 

Some  methods  have  focused  on  determining  the  time- 
delays  of  the  multipath  components  together  with  some 
other  parameters,  for  instance  the  directions  of  arrival  (DOA), 
describing  the  channel  [7,  8,  9].  Those  methods  exploit  the 
full  space-time  structure  of  the  multipath.  Except  for  some 
cases  that  resort  to  a  particular  configuration  of  the  an¬ 
tenna  array,  the  primary  drawback  of  these  approaches  is 
that  complicated  search  procedures  are  required  to  estimate 
the desired parameters. Moreover, when DOAs are to be
estimated, it is necessary to have a calibrated antenna
array, which is a restrictive assumption. For these
reasons,  we  will  use  an  unstructured  model  for  the  chan¬ 
nel.  Although  this  leads  to  an  increase  in  the  number  of 
unknowns  with  respect  to  a  parameterized  model,  the  de¬ 
pendence  on  the  channel  is  linear  and  it  can  be  estimated 
in  closed  form,  as  in  [3]. 

A  number  of  techniques  that  estimate  the  delays  of  each 
of  the  received  replicas  and  assume  that  the  value  of  each 
delay  is  arbitrary  have  been  developed.  However,  receivers 
usually  combine  the  different  rays  of  the  received  signal  us¬ 
ing  a  RAKE  structure.  This  structure  can  be  viewed  as  a 
bank  of  filters,  with  a  fixed  delay  (typically  the  inverse  of 
the  bandwidth  or  a  fraction  thereof)  between  each  pair  of 
filters.  Hence,  it  seems  logical  to  extend  this  structure  also 
to  the  timing  synchronization.  The  method  proposed  in  this 
paper is based on the estimator in [4, 5]. The difference is
that a fixed separation between the received replicas is now
assumed, so that only the absolute delay of the whole set
of replicas has to be estimated. The interesting consequence
of this model is that the delay can be obtained by finding
the roots of a low-order polynomial, for which computationally
efficient algorithms exist. Consequently, the method is
especially well suited to applications where fast (feedforward)
synchronization of the received signal is needed [2].

2.  DATA  MODEL 

The signal received by an arbitrary $m$-element array is modeled as

$$\mathbf{y}[n] = \sum_{k=1}^{d} \mathbf{h}_k \, s\left(nT_s - \tau - (k-1)T_0\right) + \mathbf{e}[n] \qquad (1)$$

where $T_s$ is the sampling period, $\{\mathbf{h}_k\}$ are the FIR
channel coefficient vectors and $d$ is the temporal length of the
channel. $T_0$ is the temporal spacing of the FIR channel and
can be freely chosen, together with $d$, when setting up the
model (1). The transmitted signal $s(t)$ is assumed to be
known to within the scalar time delay parameter $\tau$. If $N$
samples are collected, they all may be grouped together as
follows:

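$$\mathbf{Y} = [\, \mathbf{y}[1] \;\; \mathbf{y}[2] \;\; \cdots \;\; \mathbf{y}[N] \,] = \mathbf{H}\,\mathbf{S}(\tau) + \mathbf{E} \qquad (2)$$

where $\mathbf{E}$ is formed identically to $\mathbf{Y}$ and

$$\mathbf{H} = [\, \mathbf{h}_1 \;\; \mathbf{h}_2 \;\; \cdots \;\; \mathbf{h}_d \,] \qquad (3)$$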

The $m \times d$ matrix $\mathbf{H}$ represents the
single-input-multiple-output (SIMO) channel for the signal of
interest. The $(p,q)$-th element of the matrix $\mathbf{S}(\tau)$
is $s(qT_s - \tau - (p-1)T_0)$.
The term $\mathbf{e}[n]$, which gathers the noise and all other CCI,
is modeled as a complex, circularly-symmetric, zero-mean
Gaussian process. It is assumed to be temporally white
and spatially colored with an arbitrary unknown correlation
matrix:

$$E\{\mathbf{e}[n]\,\mathbf{e}^*[m]\} = \mathbf{Q}\,\delta_{n,m} \qquad (4)$$

where  (•)*  denotes  the  complex  conjugate  transpose  oper¬ 
ation.  While  such  a  model  for  e  [n]  is  clearly  only  approx¬ 
imate,  it  captures  the  most  significant  effects  of  the  noise 
and  interference,  and  leads  to  tractable  algorithms.  For 
the  asymptotic  results  in  the  next  section  to  be  valid,  the 
following  additional  assumption  is  needed:  s(t)  is  a  band- 
limited  finite-average-power  signal,  and  the  sampling  period 
Ts  satisfies  the  Nyquist  criterion. 
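
As an illustration of how the signal matrix in (2) is populated, the following minimal sketch builds the $d \times N$ matrix whose $(p,q)$-th element is $s(qT_s - \tau - (p-1)T_0)$. The sinc pulse, the parameter values and the function names are assumptions chosen only for the example; they are not taken from the paper.

```python
import math


def sinc(u):
    """Normalized sinc, sin(pi*u)/(pi*u), with the value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def signal_matrix(s, tau, N, d, Ts, T0):
    """Return the d x N matrix with (p, q) entry s(q*Ts - tau - (p-1)*T0)."""
    return [[s(q * Ts - tau - (p - 1) * T0) for q in range(1, N + 1)]
            for p in range(1, d + 1)]


# Hypothetical setup: chip period Tc = 1, Ts = T0 = Tc/2, sinc pulse as s(t).
Tc = 1.0
Ts = T0 = Tc / 2
S = signal_matrix(lambda t: sinc(t / Tc), tau=Ts, N=8, d=2, Ts=Ts, T0=T0)

# With T0 equal to Ts, each row is the previous row shifted by one sample.
shifted = all(S[1][q] == S[0][q - 1] for q in range(1, 8))
```

Only the delay $\tau$ is unknown in this matrix; the pulse shape and the spacing $T_0$ are fixed when the model is set up.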

Note that though the pulse shaping filter could be factored
into the channel matrix, as in [8], we assume herein
that the elements of $\mathbf{S}(\tau)$ are samples of the continuous
modulated waveform $s(t)$. As such, the matrix $\mathbf{H}$ only
describes the propagation effects of the channel, and $\tau$ is
a continuous-valued parameter. This modeling premise
differs from those usually adopted in other work addressing
the equalization of FIR channels rather than synchronization.

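The model in (1) is closely related to that employed
in other methods that attempt to estimate the delays of
the different arrivals, such as [7, 6, 4, 5]. In those cases,
the received vector model consists of the contribution of $L$
arrivals as follows

$$\mathbf{Y} = \mathbf{A}\,\mathbf{S}(\boldsymbol{\tau}) + \mathbf{E} \qquad (5)$$

where

$$\boldsymbol{\tau} = [\, \tau_1 \;\cdots\; \tau_L \,]^T, \qquad \mathbf{A} = [\, \mathbf{a}_1 \;\cdots\; \mathbf{a}_L \,] \qquad (6)$$

$$\mathbf{s}(\tau_i) = [\, s(T_s - \tau_i) \;\cdots\; s(NT_s - \tau_i) \,] \qquad (7)$$

$$\mathbf{S}(\boldsymbol{\tau}) = [\, \mathbf{s}^T(\tau_1) \;\cdots\; \mathbf{s}^T(\tau_L) \,]^T . \qquad (8)$$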

The columns of the matrix $\mathbf{A}$ are the spatial signatures of
the different arrivals. Assuming that the signal $s(t)$ is
band-limited and $T_0$ satisfies the Nyquist criterion, each row of
the matrix $\mathbf{S}(\boldsymbol{\tau})$ of (5) can be expressed as a
linear combination of the rows of $\mathbf{S}(\tau)$ [10]. Therefore,
there exists an $L \times d$ interpolating matrix $\mathbf{T}$ that satisfies

$$\mathbf{S}(\boldsymbol{\tau}) = \mathbf{T}\,\mathbf{S}(\tau) . \qquad (9)$$

For the equality (9) to be exact in a general case, the column
dimension of $\mathbf{T}$, that is $d$, should be infinite. However,
very good approximations can be obtained for finite $d$ [10].
Finally, identifying the channel matrix as $\mathbf{H} = \mathbf{A}\mathbf{T}$, the
relationship between the models in (1) and (5) becomes
apparent.
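
To make the role of the interpolating matrix in (9) more concrete, here is a minimal sketch that fills $\mathbf{T}$ with ideal sinc interpolation weights. The sinc kernel is an assumption made for this example (the text only refers to [10] for the construction), and the function names and delay values are hypothetical.

```python
import math


def sinc(u):
    """Normalized sinc, sin(pi*u)/(pi*u), with the value 1 at u = 0."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def interpolation_matrix(taus, tau, d, T0):
    """L x d matrix with entries sinc((taus[i] - tau - k*T0) / T0), k = 0..d-1.

    Assumption: ideal band-limited (Shannon) interpolation on a grid of
    spacing T0 that starts at the common delay tau.
    """
    return [[sinc((ti - tau - k * T0) / T0) for k in range(d)] for ti in taus]


T0 = 0.5
# Two arrivals on the grid (0 and 2*T0 after tau) and one off the grid.
T = interpolation_matrix(taus=[0.0, 1.0, 0.7], tau=0.0, d=4, T0=T0)
```

In this sketch an arrival whose delay falls on the grid yields a row of $\mathbf{T}$ that is a unit vector, so a finite $d$ represents it exactly. An off-grid arrival spreads over all taps, and truncating to $d$ columns then makes (9) only approximate.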

3.  MAXIMUM  LIKELIHOOD  ESTIMATOR 
AND  ASYMPTOTICALLY  EQUIVALENT 
APPROXIMATION 

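Under the model described above, the negative log-likelihood
function of the data $\mathbf{Y}$ in (2) is given by (to within irrelevant
constants)$^1$

$$\Lambda(\tau, \mathbf{H}, \mathbf{Q}) = \log|\mathbf{Q}| + \mathrm{Tr}\{\mathbf{C}(\tau,\mathbf{H})\,\mathbf{Q}^{-1}\} , \qquad (10)$$

where

$$\mathbf{C}(\tau,\mathbf{H}) = \mathbf{R}_{yy} - \mathbf{H}\mathbf{R}_{ys}^*(\tau) - \mathbf{R}_{ys}(\tau)\mathbf{H}^* + \mathbf{H}\mathbf{R}_{ss}(\tau)\mathbf{H}^*$$

$$\mathbf{R}_{yy} \triangleq \frac{1}{N}\mathbf{Y}\mathbf{Y}^*, \qquad \mathbf{R}_{ys}(\tau) = \frac{1}{N}\mathbf{Y}\mathbf{S}^*(\tau) \qquad (11)$$

$$\mathbf{R}_{ss}(\tau) = \frac{1}{N}\mathbf{S}(\tau)\mathbf{S}^*(\tau) . \qquad (12)$$

Since $\mathbf{H}$ and $\mathbf{Q}$ are taken as unstructured deterministic
matrices, the minimization of (10) may be performed explicitly
with respect to them. Their ML estimates may be expressed
as

$$\hat{\mathbf{H}}_{ML}(\tau) = \mathbf{R}_{ys}(\tau)\,\mathbf{R}_{ss}^{-1}(\tau) \qquad (13)$$

$$\hat{\mathbf{Q}}_{ML}(\tau) = \mathbf{R}_{yy} - \mathbf{R}_{ys}(\tau)\,\mathbf{R}_{ss}^{-1}(\tau)\,\mathbf{R}_{ys}^*(\tau) . \qquad (14)$$

1 $|\cdot|$ and $\mathrm{Tr}\{\cdot\}$ denote the determinant and the trace of a
matrix, respectively.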
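Ignoring parameter-independent constants, the resulting
ML estimate of $\tau$ is the minimizing argument of

$$f(\tau) = \log\left|\mathbf{I} - \mathbf{B}(\tau)\right| \qquad (15)$$

where

$$\mathbf{B}(\tau) = \frac{1}{N}\,\mathbf{R}_{yy}^{-1/2}\,\mathbf{Y}\,\mathbf{P}_{S^*}(\tau)\,\mathbf{Y}^*\,\mathbf{R}_{yy}^{-1/2} \qquad (16)$$

$$\mathbf{P}_{S^*}(\tau) = \mathbf{S}^*(\tau)\left(\mathbf{S}(\tau)\mathbf{S}^*(\tau)\right)^{-1}\mathbf{S}(\tau) . \qquad (17)$$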

If the noise had been assumed spatially white (i.e., $\mathbf{Q} = \sigma^2\mathbf{I}$),
the ML cost function, in place of (15), would have
been

$$f_w(\tau) = -\mathrm{Tr}\{\mathbf{Y}\,\mathbf{P}_{S^*}(\tau)\,\mathbf{Y}^*\} . \qquad (18)$$

Since the dependence of this cost function on the projection
matrix $\mathbf{P}_{S^*}$ is linear, the algorithm presented in the next
section could be applied in order to find the minimum of
(18) by rooting a polynomial. This interesting algorithm
is not directly applicable to (15) because of the determinant
operator. However, based on the results of [4, 5], it is
straightforward to build a cost function that is linear in the
projector $\mathbf{P}_{S^*}$ and yields asymptotically (large $N$, throughout
the paper) equivalent estimates to those provided by
$f(\tau)$. This alternative cost function to be minimized is

$$g(\tau, \mathbf{W}) = -\mathrm{Tr}\{\mathbf{W}\,\mathbf{B}(\tau)\} \qquad (19)$$

The weighting matrix $\mathbf{W}$ is computed as

$$\mathbf{W} = \left(\mathbf{I} - \mathbf{B}(\hat{\tau})\right)^{-1} \qquad (20)$$

where $\hat{\tau}$ is a consistent estimate of the true delay. When
no previous estimate is available to compute the weighting
matrix, an initial consistent estimate is simply
obtained as the minimizing argument of $g(\tau, \mathbf{I})$. Following
the development in [4, 5], it can be shown that both (15)
and (19) provide, under mild conditions and in the absence of
modeling errors, consistent and asymptotically efficient
estimates. It can be argued that, since $N$ is the length
of the training sequence, we will never reach asymptotics in
$N$. However, the discussion above remains meaningful
because the numerical results show that the asymptotic
behaviour is reached for rather modest sample sizes. It is
not difficult to show that the CRB for the problem at hand
is

$$\mathrm{CRB}^{-1}(\tau) = 2\,\mathrm{Tr}\left\{ \left(\mathbf{D}(\boldsymbol{\tau})\,\mathbf{P}^{\perp}_{S^*}(\boldsymbol{\tau})\,\mathbf{D}^*(\boldsymbol{\tau})\right)\left(\mathbf{H}^*\,\mathbf{Q}^{-1}\,\mathbf{H}\right) \right\} ,$$

where the matrices $\mathbf{S}$ and $\mathbf{D}$ (which is the derivative of the
former) are evaluated at

$$\boldsymbol{\tau} = [\, \tau \;\; \tau + T_0 \;\; \cdots \;\; \tau + (d-1)T_0 \,]^T .$$

4.  POLYNOMIAL  ROOTING  APPROACH 

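At this point we are concerned with the minimization of the
following general expression

$$g(\tau, \mathbf{W}) = -\frac{1}{N}\,\mathrm{Tr}\left\{ \mathbf{W}^{1/2}\,\mathbf{R}_{yy}^{-1/2}\,\mathbf{Y}\,\mathbf{P}_{S^*}(\tau)\,\mathbf{Y}^*\,\mathbf{R}_{yy}^{-1/2}\,\mathbf{W}^{1/2} \right\} . \qquad (21)$$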


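For appropriate choices of $\mathbf{W}$, this expression represents
the asymptotically efficient and the consistent estimators
for correlated noise ($g(\tau, \mathbf{W})$ and $g(\tau, \mathbf{I})$), and the
white-noise ML estimator ($f_w(\tau)$). Now, the $N$ temporal samples
are transformed into the frequency domain using the DFT,
so that the signal approximately satisfies the following
relationship$^2$

$$\mathbf{S}^*(\tau) = \mathbf{S}_\omega^*\,\mathbf{V}(\tau) \qquad (22)$$

where $\mathbf{S}_\omega$ is a diagonal matrix whose entries are the DFT
of the samples $[s(T_s), \cdots, s(NT_s)]$, and

$$\mathbf{V}(\tau) = [\, \mathbf{v}(\tau) \;\; \mathbf{v}(\tau + T_0) \;\; \cdots \;\; \mathbf{v}(\tau + (d-1)T_0) \,]$$

$$\mathbf{v}(\tau) = [\, \exp(j\omega_1\tau) \;\cdots\; \exp(j\omega_N\tau) \,]^T \qquad (23)$$

$$\omega_i = \frac{2\pi}{NT_s}\left(i - 1 - \mathrm{floor}\left(\frac{N}{2}\right)\right) . \qquad (24)$$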

The criterion in (21) may be expressed as a function of
$x = \exp(j2\pi\tau/NT_s)$, resulting in a polynomial in $x$ of
order $2N - 2$, since $\mathbf{V}^*(\tau)\,\mathbf{S}_\omega\mathbf{S}_\omega^*\,\mathbf{V}(\tau)$ does not depend on
$x$. This approach is of little interest because $N$ is generally
large. Below we describe a method that leads to the rooting
of polynomials of order $2d$, where naturally $d \ll N$.
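
The reason the criterion becomes a polynomial in $x$ can be checked numerically: every entry of $\mathbf{v}(\tau)$ is an integer power of $x$. The sketch below assumes the centred frequency grid $\omega_i = 2\pi(i - 1 - \lfloor N/2 \rfloor)/(NT_s)$; the numerical values and names are illustrative only.

```python
import cmath
import math


def omega(i, N, Ts):
    """Angular frequency of bin i (1-based) on the assumed centred grid."""
    return 2 * math.pi / (N * Ts) * (i - 1 - N // 2)


N, Ts, tau = 8, 0.5, 0.37          # hypothetical values
x = cmath.exp(2j * math.pi * tau / (N * Ts))

# Entries of v(tau) computed directly and as integer powers of x.
v = [cmath.exp(1j * omega(i, N, Ts) * tau) for i in range(1, N + 1)]
powers = [x ** (i - 1 - N // 2) for i in range(1, N + 1)]

max_diff = max(abs(a - b) for a, b in zip(v, powers))
```

The exponents span $N - 1$ consecutive integers, so a quadratic form in $\mathbf{v}(\tau)$ and its conjugate involves powers of $x$ over a range of $2N - 2$, which matches the polynomial order quoted above.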


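Let the elements of the vector $\mathbf{p} = [\, p_0 \;\cdots\; p_d \,]^T$ be taken
from the coefficients of the polynomial

$$p(z) = p_0 z^d + p_1 z^{d-1} + \cdots + p_d \qquad (25)$$

whose roots are

$$\{ r^0 x,\; r^1 x,\; \cdots,\; r^{(d-1)} x \} , \qquad (26)$$

where $r = \exp(j2\pi T_0/NT_s)$. If we define

$$\mathbf{F} = \mathbf{S}_\omega^{-*}\,\mathbf{Y}^*\,\mathbf{R}_{yy}^{-1/2}\,\mathbf{W}^{1/2}/\sqrt{N} \qquad (27)$$

$$\boldsymbol{\Psi} = \left(\mathbf{P}^*\,\mathbf{S}_\omega^{-*}\,\mathbf{S}_\omega^{-1}\,\mathbf{P}\right)^{-1} \qquad (28)$$

and build the $N \times (N-d)$ matrix

$$\mathbf{P} = \begin{bmatrix} p_d & p_{d-1} & \cdots & p_0 & 0 & \cdots & 0 \\ 0 & p_d & p_{d-1} & \cdots & p_0 & \cdots & 0 \\ \vdots & & \ddots & & & \ddots & \vdots \\ 0 & \cdots & 0 & p_d & p_{d-1} & \cdots & p_0 \end{bmatrix}^* , \qquad (29)$$

then minimizing (21) is equivalent to minimizing [6]

$$g(\mathbf{p}, \mathbf{W}) = \mathrm{Tr}\{\mathbf{F}^*\,\mathbf{P}\,\boldsymbol{\Psi}\,\mathbf{P}^*\,\mathbf{F}\} . \qquad (30)$$

It can be readily shown that the vector $\mathbf{p}$ satisfies

$$\mathbf{p} = \mathbf{K}\,\mathbf{t}(x) \qquad (31)$$

where $\mathbf{K}$ is a diagonal matrix whose elements are the
coefficients of the polynomial $p(z)$ for the case $x = 1$, and

$$\mathbf{t}(x) = [\, 1 \;\; x \;\; \cdots \;\; x^d \,]^T . \qquad (32)$$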
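
The relation between the coefficient vector and the powers of $x$ in (31) and (32) can be verified with a short sketch: the coefficients are expanded from the $d$ roots $r^k x$ and compared with the $x = 1$ coefficients scaled by $x^k$. The helper names and the numerical values are hypothetical.

```python
import cmath
import math


def poly_from_roots(roots):
    """Coefficients [p0, ..., pd] of prod(z - root), highest power first."""
    coeffs = [1 + 0j]
    for root in roots:
        # Multiply the current polynomial by (z - root).
        coeffs = [a - root * b for a, b in zip(coeffs + [0j], [0j] + coeffs)]
    return coeffs


def coeffs_for(x, r, d):
    """Coefficients of the polynomial with roots r^0 x, ..., r^(d-1) x."""
    return poly_from_roots([r ** k * x for k in range(d)])


N, Ts, T0, d, tau = 64, 0.5, 0.5, 3, 0.8      # hypothetical values
r = cmath.exp(2j * math.pi * T0 / (N * Ts))
x = cmath.exp(2j * math.pi * tau / (N * Ts))

kappa = coeffs_for(1.0, r, d)   # diagonal of K: coefficients for x = 1
p = coeffs_for(x, r, d)         # coefficients for the actual x

max_diff = max(abs(p[k] - kappa[k] * x ** k) for k in range(d + 1))
```

Since $\mathbf{K}$ depends only on $r$, that is on $T_0$, $N$ and $T_s$, it can be computed once; only $\mathbf{t}(x)$ changes with the delay.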

2 The same notation is used for both the time and frequency
domains because the DFT is a unitary transformation and,
therefore, the estimators presented in section 3 are identically
applicable in the frequency domain.
Therefore, if the term $\boldsymbol{\Psi}$ is held fixed, the cost function in
(30) can be written as a polynomial in $x$ of order $2d$, as
follows:

$$g(x, \mathbf{W}) = \mathbf{t}^T(1/x)\,\mathbf{K}^*\,\mathbf{C}\,\mathbf{K}\,\mathbf{t}(x) \qquad (33)$$

for some matrix $\mathbf{C}$ obtained from $\mathbf{F}$ and $\boldsymbol{\Psi}$. For the sake of
brevity the explicit form of $\mathbf{C}$ is omitted. The minimum
of $g(x, \mathbf{W})$ on the unit circle is computed by first finding
the roots of its derivative. Next, (33) is evaluated at the
set of roots that lie on the unit circle, and the one giving
the minimum is selected. Using this root and the definition
of $x$, the delay estimate is easily obtained. This procedure
is repeated until convergence or failure conditions are
satisfied (e.g., in the simulations these conditions are: change
in $x$ smaller than $10^{-4}$, number of iterations larger than 50).
At each iteration, the matrix $\boldsymbol{\Psi}$ is recomputed using the
previous estimate of $x$, and in the first iteration $\boldsymbol{\Psi}$ is taken
equal to the identity.
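
The last two steps of each iteration, selecting a root on the unit circle and mapping it back to a delay, might be sketched as follows. The tolerance `tol`, the function names and the toy cost function are assumptions for illustration; in the actual procedure the cost would be the polynomial (33) and the candidates would be the roots of its derivative.

```python
import cmath
import math


def delay_from_root(x, N, Ts):
    """Invert x = exp(j*2*pi*tau/(N*Ts)); result lies in (-N*Ts/2, N*Ts/2]."""
    return cmath.phase(x) * N * Ts / (2 * math.pi)


def pick_root(roots, cost, tol=1e-6):
    """Among roots with ||x| - 1| <= tol, return the one of smallest cost."""
    on_circle = [x for x in roots if abs(abs(x) - 1.0) <= tol]
    if not on_circle:
        return None
    return min(on_circle, key=cost)


# Toy example: three candidate roots and a stand-in cost with a known minimum.
N, Ts, tau_true = 64, 0.5, 0.8
x_true = cmath.exp(2j * math.pi * tau_true / (N * Ts))
candidates = [1.3 * x_true, x_true, cmath.exp(2.0j)]


def toy_cost(x):
    return abs(x - x_true)


x_hat = pick_root(candidates, toy_cost)
tau_hat = delay_from_root(x_hat, N, Ts)
```

A tolerance is used because numerically computed roots never lie exactly on the unit circle. Note also that the mapping from $x$ to $\tau$ is unambiguous only within one period $NT_s$.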

An essential feature of this algorithm is that the matrix
inversion needed in the computation of $\boldsymbol{\Psi}$ has to
be carried out only once, and it can be done off-line since
the matrix to be inverted depends exclusively on some design
parameters. This follows from the fact that $\boldsymbol{\Psi}$ can be
decomposed as

$$\boldsymbol{\Psi} = \mathbf{U}\,\boldsymbol{\Psi}_1\,\mathbf{U}^{-1} \qquad (34)$$

where $\boldsymbol{\Psi}_1$ is the value of $\boldsymbol{\Psi}$ for $x = 1$, and

$$\mathbf{U} = \mathrm{diag}\{ 1, x, \ldots, x^{N-d-1} \} . \qquad (35)$$

Therefore, the update of $\boldsymbol{\Psi}$ at every iteration only involves
the left and right-hand product of a fixed matrix by a
diagonal one that solely depends on $x$.

5.  NUMERICAL  RESULTS 

We  analyze  the  performance  of  the  estimators  proposed  in 
this  paper,  and  compare  it  with  the  Cramer-Rao  Bound 
(CRB).  Specifically,  we  consider  the  exact  ML  estimator 
for  the  colored-noise  case  given  by  (15),  its  approximation 
in  (19)  and  the  ML  estimator  for  the  white-noise  case  in 
(18). The cost function of the first one is minimized by
means of a search, whereas the polynomial rooting algorithm
in section 4 is applied to the latter two. In the case
of the approximate ML estimator, we have chosen to update
the matrix $\mathbf{W}$, together with $\boldsymbol{\Psi}$, at each iteration of
the algorithm. The RMSEs (root mean square errors) are
computed from 500 Monte Carlo realizations.

We concentrate on a scenario where $L = 2$ delayed arrivals
of a known signal are received by a uniform linear
array with 6 antennas spaced $0.5\lambda$ apart. This known signal
is a concatenation of $K$ truncated and sampled Nyquist
square root raised cosine pulses. Each pulse has a bandwidth
equal to $(1+\alpha)/2T_c$, is truncated to the interval
$[-5T_c, 5T_c]$, and the sampling period is $T_s = T_c/2$, so there
are 21 samples in each pulse. The roll-off factor is set equal
to $\alpha = 0.2$. The use of this type of signal is of interest
because each pulse may represent the output of the despreader
at every symbol period in a direct-sequence CDMA system.
For simplicity, the spatial signatures of the two arrivals are
the array steering vectors for DOAs equal to 0° and 10° relative
to the broadside. The noise plus interference field in
which the array operates consists of: i) spatially and temporally
white Gaussian noise, and ii) a temporally white
Gaussian interference at DOA −30°. The remaining scenario
parameters, except when one of them is varied, are
as follows: $K = 4$ pulses; delays of the two arrivals equal
to $\tau_1 = 0$ and $\tau_2 = 0.5T_c$; signal-to-noise ratio (SNR) of
the first arrival: 14 dB; signal-to-interference ratio (SIR)
of the first arrival: −7 dB; the second signal is attenuated
3 dB with respect to the first, and they are in phase at the
first sensor. The temporal spacing of the FIR channel is
assumed to be $T_0 = 0.5T_c$.
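
For readers who wish to set up a comparable scenario, the sketch below computes the steering vector of the interferer and the resulting spatial covariance of noise plus interference. The phase sign convention of the steering vector, the unit reference power of the first arrival and all names are assumptions of this example rather than details given in the text.

```python
import cmath
import math


def steering_vector(theta_deg, m=6, spacing=0.5):
    """ULA steering vector; spacing in wavelengths, angle from broadside."""
    phase = 2 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * k) for k in range(m)]


def db_to_linear(db):
    return 10 ** (db / 10)


signal_power = 1.0                               # first arrival (reference)
noise_power = signal_power / db_to_linear(14)    # SNR = 14 dB
interf_power = signal_power / db_to_linear(-7)   # SIR = -7 dB

a_int = steering_vector(-30)
m = len(a_int)

# Q = noise_power * I + interf_power * a_int * a_int^*
Q = [[interf_power * a_int[i] * a_int[j].conjugate()
      + (noise_power if i == j else 0.0) for j in range(m)]
     for i in range(m)]

# Pulse truncated to [-5Tc, 5Tc] and sampled at Ts = Tc/2.
samples_per_pulse = int(round(10 / 0.5)) + 1
```

With these values the interferer is about five times stronger than the first arrival at each sensor, which is why a spatially white model for the disturbance is a poor fit here.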

In figure 1, the finite-sample and asymptotic performance
of the different estimators in the absence of model errors
(i.e., $T_0 = \tau_2 - \tau_1$ and the length of the FIR filter $d$ is
equal to the number of arrivals) is illustrated. We consider
that the number of taps of the channel is $d = 2$. The RMSEs
of the exact ML estimator and the proposed approximation
reach the CRB for small sample sizes. This shows
that neither the approximation leading to (19) nor
the subsequent minimization using the polynomial rooting
algorithm entails a significant degradation with respect to
the exact search-based estimator. Figure 2 bears out that
the methods that take into account the spatial correlation
of the interference are practically insensitive to the CCI
level, whenever enough degrees of freedom are available.
On the other hand, under the rather usual assumption
of white noise, the resulting estimator completely fails for
SIR < −10 dB. In figure 3, we investigate the performance
of the estimators for $d = 4$ when the delay difference
between the signal arrivals, $\tau_2 - \tau_1$, does not necessarily
coincide with the spacing of the FIR channel, $T_0$. As
expected, the RMSE presents minima when the former is a
multiple of the latter. In the other cases, the model in (1) is
only approximate, which results in a higher RMSE. Finally,
increasing the length of the FIR filter beyond the necessary
minimum ($d = 2$ in this case) impairs the performance, as
shown in figure 4.

6.  CONCLUSIONS 

The  problem  of  time  delay  estimation  in  a  multipath  chan¬ 
nel  has  been  considered.  The  channel  is  modeled  as  an  un¬ 
known  FIR  filter,  and  the  CCI  is  assumed  to  have  unknown 
spatial  correlation.  Starting  from  the  exact  ML  solution,  we 
have  derived  an  approximate  estimator,  which  has  allowed 
us to use a polynomial rooting approach to obtain the
estimates. The proposed method attains the CRB in the absence
of modeling errors and is robust against arbitrarily high
interference levels. Finally, the effects of varying the number
of  taps  of  the  channel  and  varying  the  delay  between  the 
arrivals  of  the  signal  have  been  investigated. 
Figure  1:  RMSE  versus  the  number  of  training  pulses. 
ABSTRACT 

An  iterative  algorithm  for  blind  channel  identification 
(no  training  symbols  necessary)  based  on  the  Super- 
Exponential-Algorithm  is  shown.  On  the  assumption 
of  independent,  identically  distributed  (i.i.d.)  data  the 
algorithm  has  fast  convergence  properties.  It  is  ro¬ 
bust  with  respect  to  system  overfit  (supernumerarily 
assumed  channel  coefficients  converge  to  zero)  and  in¬ 
fluence  of  modest  additive  white  Gaussian  noise  even  in 
mixed-phase moving average channels. Despite the
use of fourth order cumulants, the complexity of the
algorithm is rather low compared with alternative blind
methods. An implementation on a signal processor
(TMS320C40) was therefore possible under GSM-like
conditions.

1.  INTRODUCTION 

In recent years several approaches to blind equalization
have been suggested, i.e., approaches in which training
sequences are not necessary for adapting the equalizer in
the receiver. Only information on the modulation scheme
and the statistics of the transmitted symbols is necessary.
To obtain fast convergence despite short block lengths, a
closed form solution based on fourth order cumulants is used.
The Eigenvector Algorithm (EVA) [3] as well as the
Super-Exponential Algorithm (Supex) [5] belong to this
category. Applying a FIR structure, these algorithms
approximate the MMSE (Minimum Mean Square Error)
solution. Severe problems occur if zeros of the channel
impulse response are on (or close to) the unit circle, as
is likely in mobile communication environments.

To avoid these problems a Decision Feedback Equalizer
[4], or a system of a channel estimator and Viterbi
equalizer can be used. Combining the impulse response
of the equalizer with the channel impulse response leads
to an algorithm in closed form for channel identification
based on the results of blind equalization. Thus,
for example, the Eigenvector-Identification Algorithm
(EVI) [1] is derived from EVA.

Figure 1: Time discrete system model of Supex Algorithm.

In the next section we describe the Supex algorithm
for a linear equalizer. Based on that, we derive in
section 3 an algorithm for channel identification that is
comparable in quality to the EVI Algorithm but has much lower
complexity. Simulation results are given in section 4.
A conclusion is given in the last section.


2.  SUPER-EXPONENTIAL  ALGORITHM 
(SUPEX)  FOR  A  LINEAR  EQUALIZER 

In the time discrete system model shown in Figure
1, the channel is represented by the column vector $\mathbf{h}$,
whereas the column vector $\mathbf{c}$ is used for the impulse
response of the equalizer. The over-all system impulse
response is represented by the column vector $\mathbf{s}$. The
basic idea of the Supex Algorithm is to exponentiate
iteratively each vector element of the combined system
$\mathbf{s}$, while holding the power of $\mathbf{s}$ at a constant level.
After a few iterations only the dominant amplitude
element remains, while all other elements converge to
zero. A normalization operation after each iteration
step leads to the following scheme:

$$\tilde{s}_n = \left(s_n^{(i)}\right)^p \left(s_n^{(i)*}\right)^q , \qquad \mathbf{s}^{(i+1)} = \frac{\tilde{\mathbf{s}}}{\|\tilde{\mathbf{s}}\|} \qquad (1)$$

with the column vector $\tilde{\mathbf{s}} = [\ldots, \tilde{s}_n, \ldots]'$.
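
The effect of the exponentiate-and-normalize scheme (1) can be seen in a few lines of code. The sketch below applies it directly to a combined response with one dominant tap; the starting vector is a hypothetical example, and operating on the combined response itself is only possible in simulation, since the receiver does not know it.

```python
import math


def supex_step(s, p=2, q=1):
    """One step of (1): form s_n^p * conj(s_n)^q, then scale to unit norm."""
    t = [sn ** p * sn.conjugate() ** q for sn in s]
    norm = math.sqrt(sum(abs(tn) ** 2 for tn in t))
    return [tn / norm for tn in t]


# Hypothetical combined channel/equalizer response with one leading tap.
s0 = [0.3 + 0.1j, 0.8 - 0.2j, -0.4j, 0.2 + 0j]
s = list(s0)
for _ in range(5):
    s = supex_step(s)

leading = s[1]
residual = max(abs(v) for k, v in enumerate(s) if k != 1)
```

With $p = 2$ and $q = 1$ each tap is multiplied by its own squared magnitude, so the phase of the leading tap is preserved while the magnitude ratio between any other tap and the leading one is cubed at every step.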


The superscript $(i)$ denotes the current iteration step,
$*$ refers to the conjugate-transpose operation, and
$\|\ldots\|$ stands for the Euclidean norm. The index $n$
represents the $n$-th element of the vector $\mathbf{s}$. The
auxiliary variable $\tilde{s}_n$ is not strictly necessary but permits a
clearer notation. As shown in [5], this algorithm
converges to the desired Dirac impulse (perfect
equalization) if there is exactly one leading tap and $p + q > 2$.
Unfortunately we do not know $\mathbf{s}$, since the channel $\mathbf{h}$ is
unknown to the receiver. Therefore the algorithm (1)
has to be expressed in terms of the equalizer coefficients
$\mathbf{c}$. On the condition of transmitting i.i.d. samples $d$
and by choosing $p = 2$, $q = 1$, the equalizer coefficients can be
expressed by using fourth order cumulants [5] and the
autocorrelation matrix $\mathbf{R}_{yy}$ of the received samples $y$:


c  = 


'0+1)  = 


(2) 


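The vector $\mathbf{v}_{zy}$ is calculated and updated after each
iteration by the quotient

$$\mathbf{v}_{zy} = \frac{\mathrm{cum}_4\{z_k, z_k, z_k^*, \mathbf{y}^*\}}{\mathrm{cum}_4\{d_k, d_k, d_k^*, d_k^*\}} \qquad (3)$$

of fourth order cumulants, where

$$\begin{aligned} \mathrm{cum}_4\{x_1, x_2, x_3, x_4\} = {} & E\{x_1 x_2 x_3 x_4\} \\ & - E\{x_1 x_2\}E\{x_3 x_4\} \\ & - E\{x_1 x_3\}E\{x_2 x_4\} \\ & - E\{x_1 x_4\}E\{x_2 x_3\} . \end{aligned} \qquad (4)$$

The denominator in Eqn. (3) is constant because it
depends only on the modulation scheme and the statistics
of the transmitted data $d_k$.

The autocorrelation matrix $\mathbf{R}_{yy}$ is given by

$$\mathbf{R}_{yy} = E\{\mathbf{y}\mathbf{y}^*\} , \qquad (5)$$

$$\mathbf{y} = [y_k, y_{k-1}, \cdots, y_{k-L}]' \qquad (6)$$

being the vector of the most recent received samples and $L$
the order (memory) of the equalizer.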
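
As a small check of the statement that the denominator of (3) depends only on the modulation, the following sketch evaluates the fourth order cumulant (4) for a unit-power QPSK alphabet. Averaging over the four equiprobable symbols stands in for the expectation; the function name and the unit-power normalization are choices made for this example.

```python
def cum4(x1, x2, x3, x4):
    """Fourth order cumulant (4) of zero-mean sequences via sample means."""
    n = len(x1)
    e4 = sum(a * b * c * d for a, b, c, d in zip(x1, x2, x3, x4)) / n

    def e2(u, v):
        return sum(a * b for a, b in zip(u, v)) / n

    return (e4 - e2(x1, x2) * e2(x3, x4)
            - e2(x1, x3) * e2(x2, x4)
            - e2(x1, x4) * e2(x2, x3))


# Unit-power QPSK alphabet; each symbol is assumed equally likely.
scale = 2 ** -0.5
qpsk = [complex(a, b) * scale for a in (1, -1) for b in (1, -1)]
conj = [d.conjugate() for d in qpsk]

denominator = cum4(qpsk, qpsk, conj, conj)
```

For this alphabet the result is $-1$, since $E\{|d|^4\} = 1$, $E\{d^2\} = 0$ and $E\{|d|^2\} = 1$. The negative sign reflects the sub-Gaussian character of QPSK symbols.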

The  expression  '  denotes  the  transpose  operation. 

$\sigma_d^2$ denotes the power of the transmitted data, and
the output $z_k$ of the equalizer is given at time $t = k \cdot T$
by the convolution product $z_k = y_k * c_k$, $T$ being the
symbol time.

Consecutive iteration steps are connected by
$z_k^{(i)} = y_k * c_k^{(i)}$, because $\mathbf{v}_{zy}$ depends on $z_k$ (compare
equation (3)).

To start the iterations it is necessary to initialize
the equalizer with

$$\mathbf{c}^{(0)} = [0, \cdots, 0, 1, 0, \cdots, 0]' . \qquad (7)$$

If there is some knowledge about the channel $\mathbf{h}$ (e.g.
minimum, non-minimum phase) it is possible to set
$\mathbf{c}^{(0)}$ accordingly.

3.  THE  ALGORITHM  FOR  CHANNEL 
IDENTIFICATION:  SUPEST 

We already mentioned that the Supex Algorithm
approximates the optimal MMSE solution. Hence, the
result of the Supex Algorithm can be written as

$$\mathbf{c} \approx \mathbf{c}_{MMSE} = \mathbf{R}_{yy}^{-1} \cdot \mathbf{r}_{dy} . \qquad (8)$$

The vector $\mathbf{r}_{dy}$ is the crosscorrelation vector
between the transmitted data and received samples. Of
course this vector is unknown, but assuming a linear
modulation $\mathbf{r}_{dy}$ can be expressed by:

$$\mathbf{r}_{dy} = \sigma_d^2 \cdot \mathbf{h}_r . \qquad (9)$$

The vector $\mathbf{h}_r$ contains the conjugate complex
coefficients of the channel impulse response $\mathbf{h}$ in reverse
order (therefore $r$). Applying Equations (8) and (9) we
obtain:

$$\mathbf{c} \approx \mathbf{R}_{yy}^{-1} \cdot \sigma_d^2 \cdot \mathbf{h}_r . \qquad (10)$$

Multiplying Equation (10) with $\mathbf{R}_{yy}$ from the
left-hand side and dividing by the power of the transmitted
data leads to the estimated channel impulse response
$\hat{\mathbf{h}}_r$:

$$\hat{\mathbf{h}}_r = \frac{1}{\sigma_d^2} \cdot \mathbf{R}_{yy} \cdot \mathbf{c} . \qquad (11)$$
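
The chain from (8) to (11) can be traced numerically with a small real-valued sketch. It assumes white unit-power data, white noise, a known three-tap channel and an exact MMSE equalizer, so the recovery is exact by construction; in the blind setting the equalizer comes from the Supex iterations and (8) holds only approximately. The linear solver, the decision delay `k0` and all values are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


h = [1.0, 0.5, -0.3]            # hypothetical real channel taps h_0..h_2
L, k0 = 4, 3                    # equalizer order and assumed decision delay
sigma_d2, sigma_n2 = 1.0, 0.01  # data power and noise power


def h_at(i):
    return h[i] if 0 <= i < len(h) else 0.0


# Autocorrelation matrix of the received samples (real Toeplitz case).
R = [[sigma_d2 * sum(h_at(m) * h_at(m + abs(i - j)) for m in range(len(h)))
      + (sigma_n2 if i == j else 0.0) for j in range(L + 1)]
     for i in range(L + 1)]

h_r = [h_at(k0 - i) for i in range(L + 1)]   # channel in reverse order
r_dy = [sigma_d2 * v for v in h_r]           # equation (9)
c_mmse = solve(R, r_dy)                      # equation (8)
h_hat = [sum(R[i][j] * c_mmse[j] for j in range(L + 1)) / sigma_d2
         for i in range(L + 1)]              # equation (11)
```

The estimate contains the channel taps in reverse order, padded with zeros where the assumed equalizer length exceeds the channel length.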

Using  Equation  (2)  and  reusing  the  scalar 


K  = 


Qd 

\/c  R^yC 


(12) 




which  is  already  calculated  during  the  last  iteration  of 
(2)  we  obtain  the  Equation  of  the  Super-Exponential- 
Estimator  (SupEst): 


$$\hat{\mathbf{h}}_r = \kappa \cdot \mathbf{v}_{zy} \qquad (13)$$


To achieve a proper adjustment of the vector $\mathbf{v}_{zy}$,
several iteration steps of the Supex Algorithm are
necessary before Equation (13) can be evaluated.

It should be noted that, assuming a channel with order
$L_h + 1$, $2L_h + 1$ coefficients have to be estimated to
make sure that all coefficients of a mixed phase channel
are really contained in the result vector $\hat{\mathbf{h}}_r$.


4.  RESULTS 

We  demonstrate  the  capabilities  of  the  SupEst  Algo¬ 
rithm  with  a  channel  of  five  taps  as  shown  in  Figure 
2.  There  are  two  channel  zeros  close  to  the  unit  circle. 
Furthermore  two  coefficients  of  the  impulse  response 
are  equal  in  amplitude  violating  the  demand  of  one 
leading  tap. 


Figure  2:  Absolute  value  of  the  amplitude  impulse  re¬ 
sponse  and  zero  plot  of  MA  channel. 

Afterwards,  we  present  the  results  of  a  measure¬ 
ment  of  a  time  variant  mobile  radio  fading  channel. 
The  number  of  samples  (130)  was  chosen  even  less  than 
the  block  length  of  a  GSM  frame  (148). 

4.1.  Simulation  with  Matlab 


Using QPSK modulation and coherent demodulation
the simulation leads to the results depicted in Figure
3 (noise free case). Although there are only five taps
according to the channel shown in Figure 2, we assumed
a channel with nine coefficients. In other words,
$2 \cdot 8 + 1$ coefficients have to be estimated. As the
block length is raised, the supernumerary coefficients
converge to zero. The other coefficients converge to the original
taps marked by arrows on the right margins. The
dotted lines mark the standard deviation.

Figure 3: Results of SupEst Algorithm with increasing
block length without noise.

In  the  next  simulation,  the  influence  of  additive 
white  Gaussian  noise  was  investigated.  Using  the  same 
conditions  as  above  but  with  a  fixed  block  length  of 
3000  samples,  the  results  are  shown  in  Figure  4.  Up  to 
an  SNR  of  6  dB  the  algorithm  delivers  good  results. 

These  results  remain  valid  for  other  channels  and 
are  even  better  in  particular  if  there  is  only  one  leading 
tap. 


Figure 4: Behaviour of SupEst Algorithm with additive
white Gaussian noise and a block length of 3000
samples (horizontal axis: SNR in dB).
4.2.  Measurement 

Figure 5: Absolute value of the estimated channel impulse
response using the SupEst Algorithm for identification
of a mobile radio fading channel with a block
length of 130 samples.


5.  CONCLUSION 

We  have  shown  that  the  new  algorithm  works  with  crit¬ 
ical  channels  even  if  there  are  two  taps  with  equal  am¬ 
plitudes.  The  SupEst  is  robust  with  respect  to  system 
overfit  and  influence  of  modest  white  Gaussian  noise. 

Although the algorithm uses fourth order cumulants,
the computational effort is relatively low compared
with other methods such as the EVI Algorithm.
Therefore, we were able to implement the
SupEst on a DSP in a mobile radio fading channel
environment. As presented in Figure 5, the algorithm
performs well under conditions known from the GSM
standard.
ABSTRACT 

The  presence  of  the  desired  signal  during  the  estimation  of 
the  minimum-variance-distortionless-response  (MVDR)  or 
auxiliary-vector  (AV)  filter  under  limited  data  records  leads 
to  significant  signal-to-interference-plus-noise  ratio  (SINR) 
performance  degradation.  We  quantify  this  observation  in 
the  context  of  DS/CDMA  communications  by  deriving  two 
new close approximations for the probability density functions
(under both desired-signal-“present” and “absent” conditions)
of the output SINR and bit-error-rate (BER) of the
sample-matrix-inversion  (SMI)  MVDR  receiver.  To  avoid 
such  performance  degradation  we  propose  a  DS/CDMA  re¬ 
ceiver  that  utilizes  a  simple  pilot-assisted  algorithm  that 
estimates  and  then  subtracts  the  desired  signal  component 
from  the  received  signal  prior  to  filter  estimation.  Then,  to 
accomodate  decision  directed  operation  we  develop  two  re¬ 
cursive  algorithms  for  the  on-line  estimation  of  the  MVDR 
and  AV  filter  and  we  study  their  convergence  properties. 
Finally,  simulation  studies  illustrate  the  BER  performance 
of  the  overall  receiver  structure. 

1.  INTRODUCTION 

The ideal minimum-variance-distortionless-response (MVDR) [1]
filter evaluated using the perfectly known covariance matrix
of the desired-signal-free input vector can be shown to be
equivalent to the MVDR filter evaluated using the perfectly
known signal-present covariance matrix. However, their
estimated filter counterparts are not equivalent in terms of
statistical performance measures of interest. In this paper we
derive a close approximation of the probability density
function (pdf) of the output signal-to-interference-plus-noise
ratio (SINR) and the bit-error-rate (BER) of DS/CDMA receivers
that utilize the following sample-matrix-inversion (SMI) MVDR
filter estimates: the first estimate is calculated using
desired-signal-free received vectors while the second estimate
is calculated using desired-signal-present received vectors.
The newly developed SINR and BER approximate pdf expressions
prove and quantify the need for filter estimation (training)
under desired-signal-free conditions (this need was also
observed long ago by array radar practitioners). In
particular, comparing the two pdfs we will reason that the use
of the desired-signal-present covariance matrix estimate can
lead to significant SINR and BER performance degradation when
the estimate is based on a limited record of input
observations. To avoid the requirement for silent periods of
the user of interest, which demand significant coordination
among the DS/CDMA users, we propose to proceed by estimating
and then subtracting the desired transmission from the
received vectors prior to the sample-average estimation of the
covariance matrix. This way we obtain an estimate of the
interference-plus-noise covariance matrix, which is then used
to evaluate the MVDR or the auxiliary-vector (AV) [2], [3]
filter estimates of interest. To this end we propose to use
initially a simple pilot-assisted (supervised) algorithm [4]
and then switch to decision-directed mode. The latter
operational mode, however, requires the use of on-line
recursive algorithms for the estimation of the AV and MVDR
filter. Recursive AV filter estimators have not been reported
in the literature so far, while the LMS and RLS recursions
qualify as candidates for the recursive on-line estimation of
the MVDR filter. In this paper we develop a new recursive
algorithm for the on-line estimation of the AV filter and a
modified LMS-type algorithm for the estimation of the MVDR
filter. These algorithms represent low-complexity alternatives
to their batch counterparts. While their development was
motivated by decision-directed operation needs, they can be
viewed as useful stand-alone tools as well. Theoretical
results included in this work establish formally the
convergence of the proposed recursive algorithms. Finally,
simulation studies illustrate the performance levels achieved
by the overall proposed receiver structure that operates under
limited pilot signaling followed by decision-directed mode.


2.  SYSTEM  MODEL  AND  BACKGROUND 

We consider K DS/CDMA users that transmit over a multipath
Rayleigh fading additive white Gaussian noise (AWGN) channel.
The multipath channel is modeled as a tapped-delay line (TDL).
The k-th user baseband transmitted signal is given by

$$u_k(t) = \sum_{i} b_k(i)\sqrt{E_k}\, s_k(t - iT), \quad k = 0, \ldots, K-1, \qquad (1)$$


0-7803-5988-7/00/$  10.00  ©  2000  IEEE 




where $b_k(i) \in \{-1, +1\}$ is the i-th data (information)
bit, T is the information bit period and $E_k$ denotes the
transmitted energy. The normalized signature waveform $s_k(t)$
is given by $s_k(t) = \sum_{l=0}^{L-1} d_k(l)\,\psi(t - lT_c)$
where $d_k(l)$ is the l-th bit of the spreading sequence of the
k-th user, $\psi(t)$ is the chip waveform, $T_c$ is the chip
period, and $L = T/T_c$ is the system spreading gain.

The received signal is collected by a uniform linear antenna
array consisting of M elements, spaced half a wavelength
apart. The baseband received signal at the m-th antenna
element (m = 0, ..., M - 1) is given by

$$r_m(t) = \sum_{k=0}^{K-1}\sum_{n=0}^{N_p-1} c_{k,n}\, u_k\!\left(t - \frac{n}{B} - \tau_k\right) e^{-j\pi m \sin\theta_{k,n}} + n_m(t) \qquad (2)$$

where $N_p$ is the number of resolvable paths, assumed to be
the same for all users. In (2), $c_{k,n}$ is the effective
complex Gaussian channel coefficient, which is assumed to be
identical across all antenna elements (no antenna diversity is
considered), and $\theta_{k,n}$ identifies the angle of
arrival, both with respect to the n-th path of the k-th user.
$\tau_k$ is the relative transmission delay of user k with
respect to user 0 ($\tau_0 = 0$) and, with $r_m(t)$
bandlimited to $B = 1/T_c$, the TDL has taps spaced at chip
interval $T_c$. In (2), $n_m(t)$ represents additive sensor
noise modeled as temporally and spatially white complex
Gaussian (WG) with variance $\sigma^2$. The received signals
$r_0(t), \ldots, r_{M-1}(t)$ can be grouped to form the vector

$$\mathbf{r}(t) = [r_0(t), r_1(t), \ldots, r_{M-1}(t)]^T = \sum_{k=0}^{K-1}\sum_{n=0}^{N_p-1} c_{k,n}\, u_k\!\left(t - \frac{n}{B} - \tau_k\right) \mathbf{a}(\theta_{k,n}) + \mathbf{n}(t) \qquad (3)$$

where $\mathbf{a}(\theta_{k,n}) = [1, e^{-j\pi\sin\theta_{k,n}}, \ldots, e^{-j(M-1)\pi\sin\theta_{k,n}}]^T$,
k = 0, ..., K - 1, is the steering vector associated with the
k-th user, and $\mathbf{n}(t) = [n_0(t), \ldots, n_{M-1}(t)]^T$.

Chip-matched filtering of $\mathbf{r}(t)$ and sampling at the
chip rate, $1/T_c$, over the symbol time interval
($L + N_p - 1$ chip periods) prepares the data for one-shot
detection of the i-th information bit of interest $b_0(i)$. By
stacking the vector samples
$\mathbf{r}(0), \ldots, \mathbf{r}((L + N_p - 1)T_c)$ one below
the other we obtain the space-time received data vector

$$\mathbf{r}_{M(L+N_p-1)\times 1} = [\mathbf{r}^T(0), \mathbf{r}^T(T_c), \ldots, \mathbf{r}^T((L+N_p-1)T_c)]^T. \qquad (4)$$

The cornerstone for any form of joint space-time filtering is
the space-time signature which, for user 0, is defined as
$\mathbf{V}_0 = \sum_{n=0}^{N_p-1} c_{0,n}\, \mathbf{S}_0^{(n)} \otimes \mathbf{a}(\theta_{0,n})$
where

$$\mathbf{S}_0^{(n)} = [\underbrace{0, \ldots, 0}_{n},\ d_0(0), \ldots, d_0(L-1),\ \underbrace{0, \ldots, 0}_{N_p-n-1}]^T \qquad (5)$$

and $\otimes$ denotes the Kronecker product. We assume
(without loss of generality) that $\|\mathbf{V}_0\| = 1$.

A linear joint S-T receiver with tap weight vector
$\mathbf{w} \in \mathbb{C}^{M(L+N_p-1)}$ detects the
transmitted bit of the user of interest as

$$\hat{b}_0 = \mathrm{sgn}(\mathrm{Re}\{\mathbf{w}^H \mathbf{r}\}) \qquad (6)$$

where sgn(·) identifies the sign operation, and Re{·} extracts
the real part of a complex number. In this work we consider
two types of linear receivers. The first type is the
minimum-variance-distortionless-response (MVDR) linear
receiver whose tap-weight vector is designed to minimize the
variance at the filter output $E\{|\mathbf{w}^H \mathbf{r}|^2\}$
while maintaining unity response in the vector direction
$\mathbf{V}_0$. The MVDR-receiver tap weight vector is given by

$$\mathbf{w}_{MVDR} = \frac{\mathbf{R}^{-1}\mathbf{V}_0}{\mathbf{V}_0^H \mathbf{R}^{-1} \mathbf{V}_0}. \qquad (7)$$

In (7), $\mathbf{R} = E\{\mathbf{r}\mathbf{r}^H\}$ is the
covariance matrix of the received vector $\mathbf{r}$. The
second type is the Auxiliary-Vector (AV) linear receiver [2],
[3] whose tap weight vector is a member of a sequence of
vectors that converges to the MVDR solution. The AV filter
sequence can be obtained as follows:

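$$\mathbf{w}_{AV(0)} = \mathbf{V}_0 \qquad (8)$$

for p = 1, 2, 3, ...

$$\mathbf{G}_p = \mathbf{R}\mathbf{w}_{AV(p-1)} - \left(\mathbf{V}_0^H \mathbf{R}\mathbf{w}_{AV(p-1)}\right)\mathbf{V}_0 \qquad (9)$$

$$\mu_p = \frac{\mathbf{G}_p^H \mathbf{R}\mathbf{w}_{AV(p-1)}}{\mathbf{G}_p^H \mathbf{R}\mathbf{G}_p} \qquad (10)$$

$$\mathbf{w}_{AV(p)} = \mathbf{w}_{AV(0)} - \sum_{i=1}^{p} \mu_i \mathbf{G}_i \qquad (11)$$

The auxiliary vector generation procedure may stop when
$\mathbf{G}_{p+1} = \mathbf{0}$. In that case
$\mathbf{w}_{AV(p)}$ is exactly equal to $\mathbf{w}_{MVDR}$.

Formal theoretical analysis of the sequence of
auxiliary-vector filters
$\mathbf{w}_{AV(0)}, \mathbf{w}_{AV(1)}, \ldots$ was pursued in
[3], where it was shown that

$$\lim_{p\to\infty} \mathbf{w}_{AV(p)} = \mathbf{w}_{MVDR}. \qquad (12)$$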
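The recursion (8)-(11) and its limit (12) can be checked
numerically. The following is a minimal NumPy sketch under
stated assumptions: the function names `mvdr_filter` and
`av_filter`, the dimension, and the synthetic covariance matrix
are illustrative choices and do not come from the paper.

```python
import numpy as np


def mvdr_filter(R, V):
    """MVDR weights R^{-1} V / (V^H R^{-1} V), as in (7)."""
    x = np.linalg.solve(R, V)
    return x / np.vdot(V, x)


def av_filter(R, V, num_aux, tol=1e-12):
    """Auxiliary-vector recursion (8)-(11); V must have unit norm."""
    w = V.astype(complex)                      # (8)
    for _ in range(num_aux):
        Rw = R @ w
        G = Rw - np.vdot(V, Rw) * V            # (9): part of R w orthogonal to V
        if np.linalg.norm(G) < tol:            # G = 0 means w already equals w_MVDR
            break
        mu = np.vdot(G, Rw) / np.vdot(G, R @ G)  # (10)
        w = w - mu * G                         # (11), accumulated step by step
    return w


# Synthetic, hypothetical test case: unit-norm signature V plus
# a random low-rank interference term and unit-variance white noise.
rng = np.random.default_rng(0)
dim = 6
V = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
V /= np.linalg.norm(V)
A = rng.standard_normal((dim, 3)) + 1j * rng.standard_normal((dim, 3))
R = 4.0 * np.outer(V, V.conj()) + A @ A.conj().T + np.eye(dim)

w_mvdr = mvdr_filter(R, V)
w_av = av_filter(R, V, num_aux=2000)
gap = np.linalg.norm(w_av - w_mvdr)
print("||w_AV - w_MVDR|| =", gap)
```

Each step is an exact line search along the component of
$\mathbf{R}\mathbf{w}$ orthogonal to $\mathbf{V}_0$, so every
iterate stays distortionless ($\mathbf{V}_0^H\mathbf{w} = 1$)
while the output variance decreases.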

The MVDR and AV-type algorithms outlined above require
knowledge of the covariance matrix $\mathbf{R}$, which is
unknown in practice and is usually estimated by sample
averaging over a finite set of joint S-T data $\mathbf{r}_j$,
j = 0, ..., N - 1. The resulting estimator $\hat{\mathbf{R}}$
is given by

$$\hat{\mathbf{R}} = \frac{1}{N}\sum_{j=0}^{N-1} \mathbf{r}_j \mathbf{r}_j^H. \qquad (13)$$

Using $\hat{\mathbf{R}}$ in (7) and (8)-(11) we obtain the MVDR
and AV filter estimates $\hat{\mathbf{w}}_{SMI}$ and
$\hat{\mathbf{w}}_{AV(p)}$, respectively, where the subscript
SMI stands for "sample-matrix-inversion." As illustrated in
[3], for a fixed finite data-record-size N, the sequence
$\{\hat{\mathbf{w}}_{AV(p)}\}_p$ provides filter estimators
with varying bias versus covariance characteristics that
converge to $\hat{\mathbf{w}}_{SMI}$. For short data records N,
the early, non-asymptotic elements of the generated sequence
of AV estimators offer a favorable bias/covariance balance and
are seen to significantly outperform the
$\hat{\mathbf{w}}_{SMI}$ estimator in mean-square estimation
error.


3.  SINR  AND  BER  PDFS  OF  THE  JOINT  S-T 
SMI  MVDR  RECEIVER 


Let $\mathbf{w}$ be a linear S-T receiver that is
distortionless in the $\mathbf{V}_0$ direction, i.e.,
$\mathbf{w}^H\mathbf{V}_0 = 1$. Then the SINR at the filter
output is given by

$$S(\mathbf{w}) = \frac{E_0}{\mathbf{w}^H \mathbf{R}_{I+n} \mathbf{w}}, \qquad (14)$$

where the index I+n is used to distinguish the
interference-plus-noise input covariance matrix
$\mathbf{R}_{I+n} = \mathbf{R} - E_0\mathbf{V}_0\mathbf{V}_0^H$
from the desired-signal-present input covariance matrix
$\mathbf{R}$.
We are interested in obtaining the pdf of the output SINR of
the MVDR filter estimators
$\hat{\mathbf{w}}_{SMI} = \frac{\hat{\mathbf{R}}^{-1}\mathbf{V}_0}{\mathbf{V}_0^H \hat{\mathbf{R}}^{-1}\mathbf{V}_0}$
and
$\hat{\mathbf{w}}_{SMI,I+n} = \frac{\hat{\mathbf{R}}_{I+n}^{-1}\mathbf{V}_0}{\mathbf{V}_0^H \hat{\mathbf{R}}_{I+n}^{-1}\mathbf{V}_0}$
using N-point sample-average estimates of the
desired-signal-present and desired-signal-free covariance
matrix, respectively. Evaluation of these pdfs requires
knowledge of the pdf of $\hat{\mathbf{R}}$ and
$\hat{\mathbf{R}}_{I+n}$. We make the following simplifying
assumption: the received vectors are identically distributed
according to a multivariate Gaussian distribution
$\mathcal{N}(\mathbf{0}, \mathbf{R})$. Then, the estimator
$\hat{\mathbf{R}}$ is distributed according to a Wishart
distribution with N degrees of freedom,
$\mathcal{W}_{M(L+N_p-1)}(N^{-1}\mathbf{R}; N)$ [5]. Similarly,
we assume that $\hat{\mathbf{R}}_{I+n}$ is distributed
according to a Wishart distribution with N degrees of freedom,
$\mathcal{W}_{M(L+N_p-1)}(N^{-1}\mathbf{R}_{I+n}; N)$.

The following theorem provides a close approximation of the
pdf of the output SINR of the estimated SMI MVDR filter for
the case in which filter estimation is performed in the
presence of the desired signal as well as the case in which
filter estimation is performed in the absence of the desired
signal. The proof is omitted due to lack of space.


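Proposition 1 (i) The pdf of $P_e(\hat{\mathbf{w}}_{SMI})$ can
be approximated by

$$f_{P_e}(x) = 2\sqrt{2\pi}\, Q^{-1}(x)\, f_S\!\left([Q^{-1}(x)]^2\right) e^{[Q^{-1}(x)]^2/2} \qquad (18)$$

(ii) The pdf of $P_e(\hat{\mathbf{w}}_{SMI,I+n})$ can be
approximated by

$$f_{P_e,I+n}(x) = 2\sqrt{2\pi}\, Q^{-1}(x)\, f_{S,I+n}\!\left([Q^{-1}(x)]^2\right) e^{[Q^{-1}(x)]^2/2} \qquad (19)$$

where $f_S(s)$ and $f_{S,I+n}(s)$ are given by (15) and (16),
respectively. □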

In Fig. 1 we plot the pdf of $P_e(\hat{\mathbf{w}}_{SMI,I+n})$
and $P_e(\hat{\mathbf{w}}_{SMI})$ for a DS/CDMA system for
which the ideal MVDR performance level is at $P_e = 10^{-2}$.
The data-record-size N is equal to 200. An antenna array
consisting of M = 5 elements is assumed. The system processing
gain is L = 15 and the number of paths is $N_p = 3$. Comparing
the two pdfs, we see that the BER performance of
$\hat{\mathbf{w}}_{SMI,I+n}$ is significantly more likely to
lie near the performance of the ideal filter
($P_e(\mathbf{w}_{MVDR,I+n}) = P_e(\mathbf{w}_{MVDR}) = 10^{-2}$).

4.  INTERFERENCE-PLUS-NOISE 
COVARIANCE  MATRIX  ESTIMATION 


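Theorem 1 (i) The pdf of $S(\hat{\mathbf{w}}_{SMI})$ can be
approximated by

$$f_S(s) = \frac{N(1+S_0)S_0\left[(N - M(L+N_p-1) + 2)S_0 + s\left(M(L+N_p-1)(S_0+2) - N - 2S_0 - 4\right)\right]}{2\sqrt{2\pi}\,[N s (S_0 - s)(1+S_0)]^{3/2}} \exp\!\left(-\frac{\left[(M(L+N_p-1) - 2)(1+s)S_0 - N(S_0 - s)\right]^2}{2 N s (S_0 - s)(1+S_0)}\right) \qquad (15)$$

(ii) The pdf of $S(\hat{\mathbf{w}}_{SMI,I+n})$ can be
approximated by

$$f_{S,I+n}(s) = \frac{N S_0\left[(2M(L+N_p-1) - N - 4)s + (N - M(L+N_p-1) + 2)S_0\right]}{2\sqrt{2\pi}\,(N s (S_0 - s))^{3/2}} \exp\!\left(-\frac{\left[(M(L+N_p-1) - 2)S_0 - N(S_0 - s)\right]^2}{2 N s (S_0 - s)}\right) \qquad (16)$$

where $S_0 = S(\mathbf{w}_{MVDR}) = S(\mathbf{w}_{MVDR,I+n})$
is the output SINR of the ideal filters $\mathbf{w}_{MVDR}$ and
$\mathbf{w}_{MVDR,I+n}$, N is the data record size, M is the
number of antenna elements, L is the system processing gain,
and $N_p$ is the number of resolvable paths. □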


In the following, we examine how the results of Theorem 1,
which are based on the filter output SINR, translate into BER
terms. Under the assumption that the received vector is
Gaussian distributed, the BER $P_e(\mathbf{w})$ at the output
of a sign detector that follows an arbitrary linear filter
$\mathbf{w}$ (distortionless in the $\mathbf{V}_0$ direction)
can be expressed as follows:

$$P_e(\mathbf{w}) = Q\!\left(\sqrt{S(\mathbf{w})}\right) \qquad (17)$$

where S(·) is defined as in (14). The following proposition
provides a close approximation of the pdf of the BER of the
estimated MVDR receiver for the desired-signal-present and
desired-signal-absent case. The proof is omitted due to lack
of space.
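The mapping (17) between output SINR and BER is easy to
evaluate numerically. The sketch below uses only the Python
standard library; the helper names `q_function`,
`ber_from_sinr` and `sinr_for_ber`, and the bisection search
bounds, are assumptions made for illustration.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ber_from_sinr(sinr):
    """BER of the sign detector according to (17); sinr is linear, not dB."""
    return q_function(math.sqrt(sinr))


def sinr_for_ber(target_ber, lo=0.0, hi=100.0, iters=200):
    """Invert (17) by bisection; ber_from_sinr is decreasing in sinr."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ber_from_sinr(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Ideal-filter operating point used for Fig. 1: P_e = 1e-2.
s0 = sinr_for_ber(1e-2)
print("S0 =", s0, "=", 10.0 * math.log10(s0), "dB")
```

Under (17), the ideal performance level $P_e = 10^{-2}$
corresponds to $S_0 = [Q^{-1}(10^{-2})]^2$, roughly 5.4 (about
7.3 dB), which is the value that enters (15) and (16) for that
operating point.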


The theoretical developments of the previous section reveal
the advantages of evaluating the MVDR filter using an estimate
of the interference-plus-noise covariance matrix
$\mathbf{R}_{I+n}$ in place of the estimated covariance matrix
$\hat{\mathbf{R}}$. Although the estimate
$\hat{\mathbf{R}}_{I+n}$ can be obtained easily during the
silent periods of the user of interest, such an approach
requires a significant amount of coordination among users.
Alternatively, we propose to estimate the
interference-plus-noise component by subtracting an estimate
of the desired transmission from the received samples.

The proposed algorithm is based on the observation that
$E\{b_0(j)\,\mathbf{r}_j\} = \sqrt{E_0}\,\mathbf{V}_0$ [4].
Thus, given a known transmitted bit sequence
$\{b_0(j)\}_{j=1}^{J}$ we may estimate the product
$\sqrt{E_0}\,\mathbf{V}_0$ by
$\frac{1}{J}\sum_{i=1}^{J} b_0(i)\,\mathbf{r}_i$ and the
interference-plus-noise component of the vectors
$\mathbf{r}_j$ by
$\tilde{\mathbf{r}}_j = \mathbf{r}_j - b_0(j)\frac{1}{J}\sum_{i=1}^{J} b_0(i)\,\mathbf{r}_i$.
The interference-plus-noise covariance matrix estimate is then
given by
$\hat{\mathbf{R}}_{I+n} = \frac{1}{J}\sum_{j=1}^{J} \tilde{\mathbf{r}}_j \tilde{\mathbf{r}}_j^H$.
A recursive implementation of the $\hat{\mathbf{R}}_{I+n}$
estimator is summarized below:


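$$\mathbf{v}_0 = \mathbf{0} \qquad (20)$$

$$\hat{\mathbf{R}}_{I+n}^{(0)} = \delta\mathbf{I}, \quad \delta > 0, \qquad (21)$$

for j = 1, 2, ...

$$\mathbf{v}_j = \frac{1}{j}\left((j-1)\mathbf{v}_{j-1} + b_0(j)\,\mathbf{r}_j\right), \qquad (22)$$

$$\tilde{\mathbf{r}}_j = \mathbf{r}_j - b_0(j)\,\mathbf{v}_j, \qquad (23)$$

$$\hat{\mathbf{R}}_{I+n}^{(j)} = \frac{1}{j}\left((j-1)\hat{\mathbf{R}}_{I+n}^{(j-1)} + \tilde{\mathbf{r}}_j \tilde{\mathbf{r}}_j^H\right). \qquad (24)$$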
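The recursion (20)-(24) translates almost line by line into
code. The following NumPy sketch is illustrative only: the
function name `recursive_in_covariance`, the synthetic signal
model (one desired user plus unit-variance white noise) and all
numerical values are assumptions, not the simulation setup of
the paper.

```python
import numpy as np


def recursive_in_covariance(r, b, delta=1e-3):
    """Run (20)-(24) over pilot vectors r[j-1] with known bits b[j-1]."""
    dim = r.shape[1]
    v = np.zeros(dim, dtype=complex)               # (20)
    R_hat = delta * np.eye(dim, dtype=complex)     # (21); weighted by (j-1)=0 at j=1
    for j in range(1, len(b) + 1):
        rj, bj = r[j - 1], b[j - 1]
        v = ((j - 1) * v + bj * rj) / j            # (22): running mean of b_0(i) r_i
        r_tilde = rj - bj * v                      # (23): desired signal removed
        R_hat = ((j - 1) * R_hat + np.outer(r_tilde, r_tilde.conj())) / j  # (24)
    return v, R_hat


# Hypothetical test data: r_j = sqrt(E0) b_0(j) V + n_j with white noise,
# so the true interference-plus-noise covariance is the identity.
rng = np.random.default_rng(1)
dim, num_pilots, E0 = 4, 4000, 4.0
V = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
V /= np.linalg.norm(V)
b = rng.choice([-1.0, 1.0], size=num_pilots)
noise = (rng.standard_normal((num_pilots, dim))
         + 1j * rng.standard_normal((num_pilots, dim))) / np.sqrt(2.0)
r = np.sqrt(E0) * b[:, None] * V[None, :] + noise

v, R_hat = recursive_in_covariance(r, b)
print("max |R_hat - I| =", np.max(np.abs(R_hat - np.eye(dim))))
```

In this synthetic case `v` approaches $\sqrt{E_0}\,\mathbf{V}_0$
and `R_hat` approaches the noise covariance, even though every
received vector contains the desired signal.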


The following theorem deals with the properties of the
estimator $\hat{\mathbf{R}}_{I+n}^{(j)}$ in (24). The proof is
omitted due to lack of space.

Theorem 2 The estimator $\hat{\mathbf{R}}_{I+n}^{(j)}$ in (24)
is an asymptotically unbiased estimator of
$\mathbf{R}_{I+n}$. Moreover, for fixed j,
$\hat{\mathbf{R}}_{I+n}^{(j)}$ is a biased estimator with

$$E\{\hat{\mathbf{R}}_{I+n}^{(j)}\} = \left[1 - \frac{1}{j}\left(\psi(j+1) + \gamma\right)\right]\mathbf{R}_{I+n} \qquad (25)$$

where $\psi(\cdot)$ is the Digamma function and $\gamma$ is the
Euler constant. □

At the end of the training period the receiver reverts to
decision-directed operation where the information bit $b_0(j)$
in (22) and (23) is substituted by the estimate

$$\hat{b}_0(j) = \mathrm{sgn}\!\left(\mathrm{Re}\left\{\hat{\mathbf{w}}^{(j-1)H}\,\mathbf{r}_j\right\}\right). \qquad (26)$$

In (26), $\hat{\mathbf{w}}^{(j-1)}$ is an MVDR or AV-type
filter estimate evaluated using the covariance matrix estimate
$\hat{\mathbf{R}}_{I+n}^{(j-1)}$ of the previous step j - 1.
Implementation of the decision-directed version of the
algorithm in (20)-(24) is straightforward but computationally
inefficient: at each step of the algorithm new filter
estimates have to be formed based on the updated estimate of
the covariance matrix $\hat{\mathbf{R}}_{I+n}^{(j)}$. In the
next section we derive filter update rules that are based on
the theory of stochastic approximation and provide simple,
computationally efficient on-line recursions for the
evaluation of the AV and MVDR filter.


5.  RECURSIVE  ON-LINE  ESTIMATION  OF 
THE  AV  AND  MVDR  FILTER 

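The single-AV case
$\mathbf{w}_{AV(1)} = \mathbf{V}_0 - \mu_1\mathbf{G}_1$ can be
treated as follows. We note that the auxiliary vector
$\mathbf{G}_1$ can be expressed in the form:

$$\mathbf{G}_1 = (\mathbf{I} - \mathbf{V}_0\mathbf{V}_0^H)\mathbf{R}_{I+n}\mathbf{V}_0. \qquad (27)$$

Equivalently, we may find $\mathbf{G}_1$ as the unique solution
of the equation $Z_1'(\Gamma) = 0$ where

$$Z_1'(\Gamma) = \Gamma - (\mathbf{I} - \mathbf{V}_0\mathbf{V}_0^H)\mathbf{R}_{I+n}\mathbf{V}_0. \qquad (28)$$

On the other hand, the steering scalar $\mu_1$ minimizes the
mean square error between $\mathbf{V}_0^H\tilde{\mathbf{r}}$
and $\mathbf{G}_1^H\tilde{\mathbf{r}}$ and is given by (cf.
(10))

$$\mu_1 = \frac{\mathbf{G}_1^H \mathbf{R}_{I+n}\mathbf{V}_0}{\mathbf{G}_1^H \mathbf{R}_{I+n}\mathbf{G}_1}. \qquad (29)$$

Thus, $\mu_1$ is the unique solution of the equation
$Z_1''(\nu) = 0$ with

$$Z_1''(\nu) = \nu\,(\mathbf{G}_1^H \mathbf{R}_{I+n}\mathbf{G}_1) - \mathbf{G}_1^H \mathbf{R}_{I+n}\mathbf{V}_0. \qquad (30)$$

If we define the functions $\zeta_1'(\Gamma; \tilde{\mathbf{r}}_j)$
and $\zeta_1''(\nu; \tilde{\mathbf{r}}_j)$ as

$$\zeta_1'(\Gamma; \tilde{\mathbf{r}}_j) = \Gamma - \frac{j}{j-1}(\mathbf{I} - \mathbf{V}_0\mathbf{V}_0^H)\,\tilde{\mathbf{r}}_j\tilde{\mathbf{r}}_j^H\,\mathbf{V}_0 \qquad (31)$$

$$\zeta_1''(\nu; \tilde{\mathbf{r}}_j) = \nu\,\frac{j}{j-1}\left(\mathbf{G}_1^H \tilde{\mathbf{r}}_j\tilde{\mathbf{r}}_j^H \mathbf{G}_1\right) - \frac{j}{j-1}\,\mathbf{G}_1^H \tilde{\mathbf{r}}_j\tilde{\mathbf{r}}_j^H \mathbf{V}_0 \qquad (32)$$

then $E\{\zeta_1'(\Gamma; \tilde{\mathbf{r}}_j)\} = Z_1'(\Gamma)$
while $E\{\zeta_1''(\nu; \tilde{\mathbf{r}}_j)\} = Z_1''(\nu)$.
The following theorem describes recursive procedures for the
evaluation of $\mathbf{G}_1$ and $\mu_1$ and establishes their
convergence w.p. 1 to the desired values. The proof is omitted
due to lack of space.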

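Theorem 3 Let $\{a_j\}_j$ be a sequence of positive numbers
such that $\sum_{j=1}^{\infty} a_j = \infty$ and
$\sum_{j=1}^{\infty} a_j^2 < \infty$. The recursions

$$\hat{\mathbf{G}}_1^{(j)} = \hat{\mathbf{G}}_1^{(j-1)} - a_j\,\zeta_1'(\hat{\mathbf{G}}_1^{(j-1)}; \tilde{\mathbf{r}}_j) \qquad (33)$$

and

$$\hat{\mu}_1^{(j)} = \hat{\mu}_1^{(j-1)} - a_j\,\zeta_1''(\hat{\mu}_1^{(j-1)}; \tilde{\mathbf{r}}_j) \qquad (34)$$

converge w.p. 1 to $\mathbf{G}_1$ and $\mu_1$, respectively,
where $\zeta_1'(\Gamma; \tilde{\mathbf{r}}_j)$ and
$\zeta_1''(\nu; \tilde{\mathbf{r}}_j)$ are defined by (31) and
(32). □

In practice we evaluate $\mathbf{G}_1$ and $\mu_1$ by coupling
the two recursions of Theorem 3 as follows. From (28) and (30)
we see that the pair $(\mathbf{G}_1, \mu_1)$ is the unique
solution of the equation $Z_1(\Gamma, \nu) = 0$ where the
vector function $Z_1(\Gamma, \nu)$ is defined as

$$Z_1(\Gamma, \nu) = \begin{bmatrix} \Gamma - (\mathbf{I} - \mathbf{V}_0\mathbf{V}_0^H)\mathbf{R}_{I+n}\mathbf{V}_0 \\ \nu\,(\Gamma^H \mathbf{R}_{I+n}\Gamma) - \Gamma^H \mathbf{R}_{I+n}\mathbf{V}_0 \end{bmatrix}. \qquad (35)$$

If we define the vector function
$\zeta_1(\Gamma, \nu; \tilde{\mathbf{r}}_j)$ as

$$\zeta_1(\Gamma, \nu; \tilde{\mathbf{r}}_j) = \begin{bmatrix} \Gamma - \frac{j}{j-1}(\mathbf{I} - \mathbf{V}_0\mathbf{V}_0^H)\,\tilde{\mathbf{r}}_j\tilde{\mathbf{r}}_j^H\,\mathbf{V}_0 \\ \frac{j}{j-1}\,\nu\,(\Gamma^H \tilde{\mathbf{r}}_j\tilde{\mathbf{r}}_j^H \Gamma) - \frac{j}{j-1}\,\Gamma^H \tilde{\mathbf{r}}_j\tilde{\mathbf{r}}_j^H \mathbf{V}_0 \end{bmatrix} \qquad (36)$$

then we have

$$E\{\zeta_1(\Gamma, \nu; \tilde{\mathbf{r}}_j)\} = Z_1(\Gamma, \nu). \qquad (37)$$

Thus we may use the recursion

$$\left[\hat{\mathbf{G}}_1^{(j)T}, \hat{\mu}_1^{(j)}\right]^T = \left[\hat{\mathbf{G}}_1^{(j-1)T}, \hat{\mu}_1^{(j-1)}\right]^T - a_j\,\zeta_1(\hat{\mathbf{G}}_1^{(j-1)}, \hat{\mu}_1^{(j-1)}; \tilde{\mathbf{r}}_j) \qquad (38)$$

to evaluate $\mathbf{G}_1$ and $\mu_1$.

The extension to the multiple-AV case is straightforward. The
m-th auxiliary vector $\mathbf{G}_m$ and the m-th steering
scalar $\mu_m$ are the unique solutions with respect to
$\Gamma_m$ and $\nu_m$, respectively, of the equations

$$\Gamma_m - (\mathbf{I} - \mathbf{V}_0\mathbf{V}_0^H)\mathbf{R}_{I+n}\Big(\mathbf{V}_0 - \sum_{i=1}^{m-1}\mu_i\mathbf{G}_i\Big) = 0 \qquad (39)$$

and

$$\nu_m\,(\mathbf{G}_m^H \mathbf{R}_{I+n}\mathbf{G}_m) - \mathbf{G}_m^H \mathbf{R}_{I+n}\Big(\mathbf{V}_0 - \sum_{i=1}^{m-1}\mu_i\mathbf{G}_i\Big) = 0. \qquad (40)$$

Thus, if we define the functions
$Z_m(\Gamma_m, \nu_m, \ldots, \Gamma_1, \nu_1)$ as

$$Z_m(\Gamma_m, \nu_m, \ldots, \Gamma_1, \nu_1) = \begin{bmatrix} \Gamma_m - (\mathbf{I} - \mathbf{V}_0\mathbf{V}_0^H)\mathbf{R}_{I+n}\big(\mathbf{V}_0 - \sum_{i=1}^{m-1}\nu_i\Gamma_i\big) \\ \nu_m\,(\Gamma_m^H \mathbf{R}_{I+n}\Gamma_m) - \Gamma_m^H \mathbf{R}_{I+n}\big(\mathbf{V}_0 - \sum_{i=1}^{m-1}\nu_i\Gamma_i\big) \end{bmatrix} \qquad (41)$$

then the M pairs
$(\mathbf{G}_1, \mu_1), \ldots, (\mathbf{G}_M, \mu_M)$ are the
unique solution of the system of equations

$$Z_1(\Gamma_1, \nu_1) = 0 \qquad (42)$$

$$\vdots$$

$$Z_M(\Gamma_M, \nu_M, \ldots, \Gamma_1, \nu_1) = 0. \qquad (43)$$

Let $\zeta_m(\Gamma_m, \nu_m, \ldots, \Gamma_1, \nu_1; \tilde{\mathbf{r}}_j)$
be defined as

$$\zeta_m(\Gamma_m, \nu_m, \ldots, \Gamma_1, \nu_1; \tilde{\mathbf{r}}_j) = \begin{bmatrix} \Gamma_m - \frac{j}{j-1}(\mathbf{I} - \mathbf{V}_0\mathbf{V}_0^H)\,\tilde{\mathbf{r}}_j\tilde{\mathbf{r}}_j^H\big(\mathbf{V}_0 - \sum_{i=1}^{m-1}\nu_i\Gamma_i\big) \\ \frac{j}{j-1}\,\nu_m\,(\Gamma_m^H \tilde{\mathbf{r}}_j\tilde{\mathbf{r}}_j^H \Gamma_m) - \frac{j}{j-1}\,\Gamma_m^H \tilde{\mathbf{r}}_j\tilde{\mathbf{r}}_j^H\big(\mathbf{V}_0 - \sum_{i=1}^{m-1}\nu_i\Gamma_i\big) \end{bmatrix}. \qquad (44)$$

Since
$E\{\zeta_m(\Gamma_m, \nu_m, \ldots, \Gamma_1, \nu_1; \tilde{\mathbf{r}}_j)\} = Z_m(\Gamma_m, \nu_m, \ldots, \Gamma_1, \nu_1)$
we use the following recursions

$$\left[\hat{\mathbf{G}}_1^{(j)T}, \hat{\mu}_1^{(j)}\right]^T = \left[\hat{\mathbf{G}}_1^{(j-1)T}, \hat{\mu}_1^{(j-1)}\right]^T - a_j\,\zeta_1(\hat{\mathbf{G}}_1^{(j-1)}, \hat{\mu}_1^{(j-1)}; \tilde{\mathbf{r}}_j) \qquad (45)$$

$$\vdots$$

$$\left[\hat{\mathbf{G}}_M^{(j)T}, \hat{\mu}_M^{(j)}\right]^T = \left[\hat{\mathbf{G}}_M^{(j-1)T}, \hat{\mu}_M^{(j-1)}\right]^T - a_j\,\zeta_M(\hat{\mathbf{G}}_M^{(j-1)}, \hat{\mu}_M^{(j-1)}, \ldots, \hat{\mathbf{G}}_1^{(j-1)}, \hat{\mu}_1^{(j-1)}; \tilde{\mathbf{r}}_j) \qquad (47)$$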
to evaluate the pairs
$(\mathbf{G}_1, \mu_1), \ldots, (\mathbf{G}_M, \mu_M)$. The
initial values
$(\hat{\mathbf{G}}_1^{(0)}, \hat{\mu}_1^{(0)}), \ldots, (\hat{\mathbf{G}}_M^{(0)}, \hat{\mu}_M^{(0)})$
are the corresponding values obtained at the end of the
training period.

In Fig. 2a and 2b we examine the convergence of the proposed
algorithm for the evaluation of the AV filter. A DS/CDMA
system with K = 4 users and processing gain L = 15 is assumed.
The number of resolvable paths is $N_p = 3$ while the
receiver's antenna array consists of M = 5 elements. The
users' SNRs are fixed at 10, 13, 14, and 15 dB. The case of
two (2) auxiliary vectors is considered. In Fig. 2a we plot the
normalized cross-correlation between
$\hat{\mathbf{G}}_1^{(j)}$, $\hat{\mathbf{G}}_2^{(j)}$ and
their ideal counterparts $\mathbf{G}_1$, $\mathbf{G}_2$,
respectively, as a function of the iteration j. In Fig. 2b we
plot the mean absolute error between $\hat{\mu}_1^{(j)}$,
$\hat{\mu}_2^{(j)}$ and their ideal counterparts $\mu_1$,
$\mu_2$, respectively, as a function of the iteration j. The
results presented are averages over 500 independent
experiments. The angles of arrival and delays of all the users
are chosen randomly and kept constant for 5 experiments.

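Following a similar reasoning we can derive a recursive
stochastic algorithm for the evaluation of the MVDR filter. We
recall that the MVDR filter evaluated in the absence of the
desired signal is given by
$\mathbf{w}_{MVDR,I+n} = \frac{\mathbf{R}_{I+n}^{-1}\mathbf{V}_0}{\mathbf{V}_0^H \mathbf{R}_{I+n}^{-1}\mathbf{V}_0}$
and is equivalent in the SINR and BER sense to the filter
$\mathbf{w}'_{MVDR,I+n} = \mathbf{R}_{I+n}^{-1}\mathbf{V}_0$.
The latter filter is the unique solution of the equation
$\mathbf{R}_{I+n}\mathbf{w}'_{MVDR,I+n} - \mathbf{V}_0 = 0$.
If we define the function $\zeta(\mathbf{w}; \tilde{\mathbf{r}}_j)$
as

$$\zeta(\mathbf{w}; \tilde{\mathbf{r}}_j) = \frac{j}{j-1}\,(\tilde{\mathbf{r}}_j^H \mathbf{w})\,\tilde{\mathbf{r}}_j - \mathbf{V}_0 \qquad (48)$$

then
$E\{\zeta(\mathbf{w}; \tilde{\mathbf{r}}_j)\} = \mathbf{R}_{I+n}\mathbf{w} - \mathbf{V}_0$.

Thus, to evaluate $\mathbf{w}'_{MVDR,I+n}$ it suffices to find
the filter that makes the expected value of
$\zeta(\mathbf{w}; \tilde{\mathbf{r}}_j)$ equal to zero. The
following proposition describes a recursive on-line stochastic
procedure for the evaluation of $\mathbf{w}'_{MVDR,I+n}$ and
establishes its convergence. The proof is omitted due to lack
of space.

Proposition 2 The recursion

$$\hat{\mathbf{w}}'^{(j)}_{MVDR,I+n} = \hat{\mathbf{w}}'^{(j-1)}_{MVDR,I+n} - a_j\,\zeta(\hat{\mathbf{w}}'^{(j-1)}_{MVDR,I+n}; \tilde{\mathbf{r}}_j) \qquad (49)$$

converges w.p. 1 to
$\mathbf{w}'_{MVDR,I+n} = \mathbf{R}_{I+n}^{-1}\mathbf{V}_0$,
where $\zeta(\mathbf{w}; \tilde{\mathbf{r}}_j)$ is defined by
(48) and $\{a_j\}_j$ is a sequence of positive numbers such
that $\sum_{j=1}^{\infty} a_j = \infty$ and
$\sum_{j=1}^{\infty} a_j^2 < \infty$. □

The initial values of $\mathbf{v}_j$ and
$\hat{\mathbf{w}}'^{(j)}_{MVDR,I+n}$ are the corresponding
values obtained at the end of the training period.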
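The recursion (49) can be illustrated with a small simulation.
The NumPy sketch below is a simplified, hypothetical setup: the
bits are assumed known throughout (pilot mode, so no decision
errors), the filter is initialized at $\mathbf{V}_0$ rather than
at a trained value, and the step sequence $a_j = 1/(j + 20)$,
the single Gaussian interferer and all numerical values are
illustrative assumptions.

```python
import numpy as np


def sa_mvdr(r, b, V0, step):
    """Stochastic-approximation estimate of R_{I+n}^{-1} V0 via (22), (23), (48), (49)."""
    v = np.zeros(len(V0), dtype=complex)
    w = V0.astype(complex)
    for j in range(1, len(b) + 1):
        rj, bj = r[j - 1], b[j - 1]
        v = ((j - 1) * v + bj * rj) / j              # (22)
        r_tilde = rj - bj * v                        # (23)
        if j == 1:
            continue                                 # r_tilde = 0 and j/(j-1) is undefined
        z = (j / (j - 1)) * np.vdot(r_tilde, w) * r_tilde - V0   # (48)
        w = w - step(j) * z                          # (49)
    return w


# Hypothetical data: desired user, one Gaussian interferer along u, white noise.
rng = np.random.default_rng(2)
dim, num_samples, E0 = 3, 20000, 4.0


def unit_vector():
    x = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
    return x / np.linalg.norm(x)


def cgauss(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)


V0, u = unit_vector(), unit_vector()
b = rng.choice([-1.0, 1.0], size=num_samples)
interference = np.sqrt(2.0) * cgauss((num_samples, 1)) * u[None, :]
r = np.sqrt(E0) * b[:, None] * V0[None, :] + interference + cgauss((num_samples, dim))

R_in = 2.0 * np.outer(u, u.conj()) + np.eye(dim)     # true R_{I+n} of this model
w_star = np.linalg.solve(R_in, V0)
w = sa_mvdr(r, b, V0, step=lambda j: 1.0 / (j + 20.0))
err = np.linalg.norm(w - w_star)
print("||w - R_in^{-1} V0|| =", err)
```

The step sequence satisfies both summability conditions of
Proposition 2; the offset of 20 only damps the first few
updates and is a tuning choice of this sketch.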

In Fig. 3 we compare the BER performance of the MVDR and AV
filter estimators evaluated using $\hat{\mathbf{R}}_{I+n}$ with
the performance of their $\hat{\mathbf{R}}$-based counterparts
(denoted as traditional). The system setup is the same as in
Fig. 2. The AV algorithm is switched to decision-directed mode
after 50 training samples and the MVDR algorithm after 300
training samples. The performance of the ideal MVDR receiver
is included as a reference point. It is evident that the use
of $\hat{\mathbf{R}}_{I+n}$ in evaluating the filter
estimators leads to significant BER performance improvements.
Fig. 1: Density functions of $P_e(\hat{\mathbf{w}}_{SMI,I+n})$
and $P_e(\hat{\mathbf{w}}_{SMI})$ for N = 200.

Fig. 2: Convergence of the recursive AV algorithm (horizontal
axis: iteration j).

Fig. 3: Bit-error-rate as a function of the data record size N
(horizontal axis: number of samples).
ABSTRACT

Direct-sequence spread-spectrum (DS/SS) techniques are widely
used in military and commercial communication systems as well
as in the global positioning system (GPS). In this paper, we
consider the Doppler effect of rotating blades in a helicopter
on DS/SS communications. We analyze the performance of DS/SS
communications in such a multipath environment, taking into
consideration both Doppler fading and time delays. The model
describing the effects of rotating blades on the desired
signal is established, and the system performance is analyzed.
It is shown that, for this specific application, the
communication channel is not Rayleigh and does not resemble
any of the commonly assumed fading models in wireless
communications.

1.  INTRODUCTION 

Direct-sequence  spread-spectrum  (DS/SS)  techniques  are 
widely  used  in  military  and  commercial  communication  sys¬ 
tems,  as  well  as  in  the  global  positioning  system  (GPS) 
[1,  2,  3].  In  this  paper,  we  focus  on  airborne  antennas  for 
satellite communications mounted on a helicopter. The
performance of these systems is affected by multipaths with
severe Doppler fading. Although the communication systems may
also suffer from the Doppler effects caused by the
motion  of  the  helicopter  itself,  significant  signal  distortion 
is  induced  due  to  the  rotating  blades. 

The  signal  received  at  the  airborne  antenna  is  usually  a 
combination  of  a  direct  path  and  local  scatterers  formed  as  a 
result  of  the  signal  reflections  from  the  rotating  blades.  The 
periodic  and  time-varying  rotational  motion  of  the  blades 
makes  the  spatial  signature  of  the  received  scatters  time- 
varying.  Rotating  blades  contribute  continuous  positive 
and  negative  frequency  shifts. 

The  effect  of  rotating  blades  on  the  echo  spectrum  for 
purpose  of  radar  target  detection  was  investigated  in  [4,  5]. 
In  [4],  it  is  pointed  out  that  the  signal  scattered  at  the  ro¬ 
tating  blades  yields  both  amplitude  modulation  (AM)  and 
frequency modulation (FM). In [5], the radar cross-section
spectra of multiple rotating blades are investigated. All
these contributions consider the problem from the point of
view of radar detection of the scattered echo from rotating
blades. However, the impairment of wireless communication
signal waveforms by rotor blade motion has not been addressed
or investigated.

In this paper, we analyze the performance of DS/SS
communications under a severe multipath environment, taking
into account the effect of the Doppler fading caused by
the  rotating  blades.  Some  typical  parameters  are  used  to 
illustrate  the  effect  of  time  delay  and  the  Doppler  frequency 
shift. 

2.  SIGNAL  MODEL 

For a single user case, the noise-free signal at the radio
frequency (RF) is expressed as

$$x(t) = y(t)\,e^{j\omega_c t}, \qquad (1)$$

where $\omega_c$ is the carrier radian frequency, and y(t) is
the baseband version of the transmitted signal, which is
modeled as

$$y(t) = p(t)\,d(t), \qquad (2)$$

where p(t) and d(t) are the spreading waveform and the
data-modulated signal waveform, respectively, expressed as

$$p(t) = \sum_{n=-\infty}^{\infty}\sum_{l=0}^{L_c-1} c(n;l)\, h\!\left(\frac{t}{T_c} - l - nL_c\right) \qquad (3)$$

and

$$d(t) = \sum_{n=-\infty}^{\infty} b(n)\, h\!\left(\frac{t}{T} - n\right). \qquad (4)$$

In the above equations, b(n) is the information symbol,
$c(n;l) \in \{+1, -1\}$ is the aperiodic spreading code at the
n-th symbol and the l-th chip, $L_c$ is the number of chips per
symbol, T and $T_c$ are the symbol and chip durations,
respectively, and

$$h(t) = \begin{cases} 1 & 0 \le t < 1 \\ 0 & \text{elsewhere.} \end{cases}$$
3.  SCATTERING  EFFECTS  OF  ROTATING  BLADES


3.1.  Scattering  Model 

The signal arriving at a point on a blade, illustrated in
Fig. 1, is expressed in a general form as

$$x_m(t) = x(t - \tau(\mathbf{a}_m(t)))\,A(\mathbf{a}_m(t))\,g(\mathbf{a}_m(t)) \qquad (5)$$

where $\mathbf{a}_m(t)$ is a vector denoting the position of
interest at the m-th blade, m = 0, 1, ..., M - 1,
$\tau(\mathbf{a}_m(t))$ is the time delay,
$A(\mathbf{a}_m(t))$ is a real scalar representing the
scattering loss, and $g(\mathbf{a}_m(t))$ is the phase term
caused by the Doppler effect. Typically, the number of blades
M is 3 to 5. For notational simplicity, we abbreviate
$\tau(\mathbf{a}_m(t))$ as $\tau_m(t)$, and
$g(\mathbf{a}_m(t))$ as $g_m(t)$.

The time delay $\tau_m(t)$ is given by
$\Delta l(\mathbf{a}_m(t))/c$ where c is the speed of light
and $\Delta l(\mathbf{a}_m(t)) = L_1 + L_2 - L_0$, which
represents the additional distance traveled by the multipath
relative to the direct path (we use $L_1$ and $L_2$ for
notational simplicity although both parameters are functions
of the vector $\mathbf{a}_m(t)$). The Doppler term in model
(5) is expressed as

$$g_m(t) = \exp\!\left[-j\frac{2\pi}{\lambda}\, r\left(\mathbf{v}(\mathbf{a}_m(t))\cdot\mathbf{l}_1(\mathbf{a}_m(t)) + \mathbf{v}(\mathbf{a}_m(t))\cdot\mathbf{l}_2(\mathbf{a}_m(t))\right)\right], \qquad (6)$$

where $\lambda$ is the wavelength at the radio frequency, and
r is the distance between the reflection point and the center
of the blade, as shown in Fig. 1. This figure illustrates some
key parameters in the underlying problem. In equation (6),
$\mathbf{v}(\mathbf{a}_m(t))$ is the unit-norm vector
describing the direction of the blade movement.
$\mathbf{l}_1(\mathbf{a}_m(t))$ and
$\mathbf{l}_2(\mathbf{a}_m(t))$ are the unit-norm vectors along
the line connecting the point of scattering and the source,
and that connecting the point of scattering and the receiving
antenna, respectively. Further, "·" denotes the inner product
of two vectors. In the scenario considered, the source is
located in the far field of the blades, and

$$\mathbf{v}(\mathbf{a}_m(t))\cdot\mathbf{l}_1(\mathbf{a}_m(t)) \approx \cos(\alpha)\sin\!\left(\omega_r t + \frac{2\pi m}{M}\right), \qquad (7)$$

where $\omega_r$ is the rotation radian frequency, and
$\alpha$ is the angle between the line connecting the source
and the center of the rotor, and the plane of the rotor. Both
$\omega_r$ and $\alpha$ are considered constant over the
observation period.

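When the receiving antenna is positioned close to the center
of the rotor, $\mathbf{v}(\mathbf{a}_m(t))$ and
$\mathbf{l}_2(\mathbf{a}_m(t))$ become nearly orthogonal. In
this case,
$\mathbf{v}(\mathbf{a}_m(t))\cdot\mathbf{l}_2(\mathbf{a}_m(t))$
is negligible, and the Doppler effect in equation (6) can be
simplified to

$$g_m(t) \approx \exp\!\left[-j\frac{2\pi}{\lambda}\, r\cos(\alpha)\sin\!\left(\omega_r t + \frac{2\pi m}{M}\right)\right] = \exp[j\phi_m(t)]. \qquad (8)$$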
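The simplified Doppler term (8) is straightforward to evaluate.
The sketch below uses only the Python standard library. The
function names and all numerical values (wavelength, radius,
rotation rate, elevation angle, blade count) are hypothetical
and are not the entries of Table 1. Differentiating
$\phi_m(t)$ in (8) gives a peak instantaneous frequency shift of
$r\,\omega_r\cos(\alpha)/\lambda$ Hz, which the code compares
against a finite-difference derivative.

```python
import cmath
import math


def doppler_phase(t, m, num_blades, r, wavelength, alpha, omega_r):
    """phi_m(t) from (8) for a scattering point at radius r on blade m."""
    blade_angle = omega_r * t + 2.0 * math.pi * m / num_blades
    return -(2.0 * math.pi / wavelength) * r * math.cos(alpha) * math.sin(blade_angle)


def doppler_term(t, m, num_blades, r, wavelength, alpha, omega_r):
    """g_m(t) = exp(j * phi_m(t)), a unit-modulus phase factor."""
    return cmath.exp(1j * doppler_phase(t, m, num_blades, r, wavelength, alpha, omega_r))


def max_doppler_shift(r, wavelength, alpha, omega_r):
    """Peak of |d phi_m / dt| / (2 pi) in Hz, obtained by differentiating (8)."""
    return r * omega_r * math.cos(alpha) / wavelength


# Hypothetical parameter values, chosen only to exercise the formulas.
params = dict(num_blades=4, r=5.0, wavelength=0.19,
              alpha=math.pi / 6.0, omega_r=2.0 * math.pi * 5.0)

g = doppler_term(0.01, 1, **params)
f_max = max_doppler_shift(params["r"], params["wavelength"],
                          params["alpha"], params["omega_r"])

# Central-difference derivative of phi_0(t) at t = 0, where the shift peaks.
h = 1e-7
f_num = (doppler_phase(h, 0, **params) - doppler_phase(-h, 0, **params)) \
    / (2.0 * h) / (2.0 * math.pi)
print("peak Doppler shift [Hz]:", f_max, "numerical:", f_num)
```

Because the shift scales linearly with r, points near the blade
tips contribute the largest frequency excursions.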


To  further  simplify  the  analysis,  we  make  the  following 
assumptions  regarding  the  rotor  blades. 


Fig.  1  Parameters  of  the  blades. 


A1) Each blade acts as a homogeneous, linear, rigid antenna.

A2) Each blade is always visible to both the source and the
airborne antenna, i.e., there is no shielding of the blades.

A3) The near field effect and the secondary scattering effect
are not considered.

The overall received signal is an integral of equation (5)
over the extent of the blade and is given by (the scattering
loss density A is used instead of A(t, r) because of
assumption A1)

$$x_r(t) = x(t) + \sum_{m=0}^{M-1}\int_{R_1}^{R_2} x_m(t)\,dr = x(t) + \sum_{m=0}^{M-1}\int_{R_1}^{R_2} x(t - \tau_m(t))\,A\,e^{j\phi_m(t)}\,dr, \qquad (9)$$

where $R_1$ and $R_2$ are, respectively, the distances of the
blade roots and of the blade tips from the center of rotation.
The first term on the right-hand side of equation (9) stands
for the contribution of the direct path, whereas the second
term is the contribution of the scatterers.

The baseband signal associated with equation (9), taking into account the noise term $n(t)$, is given by

$$
\begin{aligned}
y_r(t) &= y(t) + \sum_{m=0}^{M-1}\int_{R_1}^{R_2} y(t-\tau_m(t))\,A\,e^{j[\phi_m(t)-\omega_c\tau_m(t)]}\,dr + n(t) \\
&= p(t)d(t) + \sum_{m=0}^{M-1}\int_{R_1}^{R_2} p(t-\tau_m(t))\,d(t-\tau_m(t))\,A\,e^{j\psi_m(t)}\,dr + n(t),
\end{aligned} \tag{10}
$$

where

$$
\begin{aligned}
\psi_m(t) &= \phi_m(t) - \omega_c\tau_m(t) \\
&= \phi_m(t) - 2\pi\,\Delta l(a_m(t))/\lambda \\
&= \phi_m(t) - 2\pi(L_1 + L_2 - L_0)/\lambda
\end{aligned} \tag{11}
$$

is the combined phase term caused by the Doppler effect and the propagation delay.

Despreading  the  signal  over  the  nth  symbol  yields 

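$$
\begin{aligned}
z_r(n) &= \frac{1}{T}\int_{nT}^{(n+1)T} y_r(t)\,p(t)\,dt \\
&= b(n) + \frac{1}{T}\sum_{m=0}^{M-1}\int_{nT}^{(n+1)T}\int_{R_1}^{R_2} p(t-\tau_m(t))\,p(t)\,d(t-\tau_m(t))\,A\,e^{j\psi_m(t)}\,dr\,dt + \frac{1}{T}\int_{nT}^{(n+1)T} n(t)\,p(t)\,dt \\
&= b(n) + \frac{1}{T}\sum_{m=0}^{M-1}\int_{nT}^{(n+1)T}\int_{R_1}^{R_2} p(t-\tau_m(t))\,p(t)\,d(t-\tau_m(t))\,A\,e^{j\psi_m(t)}\,dr\,dt + n_z(t),
\end{aligned} \tag{12}
$$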

where $n_z(t)$ is the noise component after despreading. The first term on the right-hand side of equation (12) is the received signal from the direct path, whereas the second term is the contribution of the scattered signals, upon despreading.
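As a rough discrete-time illustration of the despreading in equation (12), the sketch below correlates one received symbol with the chip sequence. The random ±1 code, chip-rate sampling, the single scattered replica modelled by a complex gain with zero delay, and all names are assumptions made for illustration only; they are not taken from the paper's simulation.

```python
import random

def despread(received, code):
    """Discrete stand-in for z = (1/T) * integral of y(t) p(t) dt over one symbol."""
    return sum(y * p for y, p in zip(received, code)) / len(code)

random.seed(0)
chips_per_symbol = 100                      # 10 Mcps / 100 kbauds (Table 1)
code = [random.choice((-1.0, 1.0)) for _ in range(chips_per_symbol)]
bit = -1.0                                  # symbol b(n)

# Direct path only: despreading returns the symbol because p^2 = 1 on every chip.
direct = [bit * p for p in code]
z_direct = despread(direct, code)

# Direct path plus one scattered replica (assumed zero delay, complex gain).
gain = 0.2 * complex(0.6, 0.8)              # hypothetical scattering coefficient
with_scatter = [d + gain * d for d in direct]
z_scatter = despread(with_scatter, code)    # b(n) * (1 + gain)

# Direct path plus white Gaussian noise: the noise is averaged over 100 chips.
noisy = [d + random.gauss(0.0, 1.0) for d in direct]
z_noisy = despread(noisy, code)

print(z_direct, z_scatter, z_noisy)
```

With a delayed replica the scattered term would additionally be weighted by the chip correlation, which this sketch leaves out.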

3.2.  Discussions  on  the  Scattering  Components 

The significance of the effect of the rotating blades depends strongly on various parameters, such as the dimension of the blades, the position of the antenna, the RF frequency, the angle $\alpha$, and the symbol and chip rates. Below we use some typical parameters [2, 6], given in Table 1, to illustrate interesting model properties. Different values of $\alpha$ are considered, and their effects on the Doppler shift and the bit error rate (BER) are demonstrated. The role of time delay, Doppler shift, and the equivalent channel characteristics is examined. It is important to note that the arguments presented below are based on the specific values listed in Table 1, and may change with signal coding/modulation and rotorcraft structure and dimension.

Parameter  Variation 

When considering a symbol period, the instantaneous Doppler frequency shift for a point can be assumed unchanged, since $\omega_r T$ is often very small. For example, when the symbol rate is 100 kbauds, $\omega_r T = 8\pi \times 10^{-5} = 2.51 \times 10^{-4}$ rad $= 0.0144°$.

At the blade tips, the distance traveled over this period is $\omega_r T R_2 = 2.51 \times 10^{-4} \times 7.5 \approx 1.9 \times 10^{-3}$ m, which is relatively small compared with the wavelength (0.03 m at 10 GHz RF). Therefore, the position of the blades can be considered unchanged over a symbol period. However, this small difference in position may result in propagation phase change.
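The arithmetic above can be reproduced with a few lines of Python. This is only a numerical check using the Table 1 values; the variable names and the rounded speed of light are assumptions of the sketch.

```python
import math

# Values taken from Table 1 of the text.
rotation_rate_hz = 4.0          # rotor speed, revolutions per second
symbol_rate_hz = 100e3          # 100 kbauds
blade_tip_radius_m = 7.5        # R2, half of the 15 m blade diameter
rf_hz = 10e9                    # radio frequency
speed_of_light = 3.0e8          # m/s, rounded (assumption of this sketch)

omega_r = 2 * math.pi * rotation_rate_hz           # rad/s
symbol_period = 1 / symbol_rate_hz                 # T in seconds
angle_per_symbol_rad = omega_r * symbol_period     # omega_r * T
angle_per_symbol_deg = math.degrees(angle_per_symbol_rad)
tip_travel_m = angle_per_symbol_rad * blade_tip_radius_m
wavelength_m = speed_of_light / rf_hz

print(f"omega_r*T = {angle_per_symbol_rad:.3e} rad = {angle_per_symbol_deg:.4f} deg")
print(f"tip travel = {tip_travel_m:.2e} m, wavelength = {wavelength_m:.2f} m")
```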


Table 1: Typical Parameters Considered

Parameter            Notation           Typical value
Radio frequency      $\omega_c/2\pi$    10 GHz
Chip rate            $1/T_c$            10 Mcps
Symbol rate          $1/T$              100 kbauds
Diameter of blades   $2R_2$             15 m
Rotation speed       $\omega_r/2\pi$    4 r/s

Time  Delay  Consideration 

We assume that the antenna is located close to the center of the rotor. In this case, the maximum possible path-length difference is $\Delta l_{max} = (1 + |\cos(\alpha)|)\,R_2$. Consider a typical scenario where $R_2$ is 7.5 meters. Then, the corresponding maximum possible time delay $\tau_{max} = \Delta l_{max}/c$ is 33.6 ns in the case when $\alpha = 70°$.

If  the  chip  rate  is  10  Mcps,  the  chip  period  is  100  ns, 
which  is  about  three  times  the  maximum  possible  time  de¬ 
lay.  Therefore,  the  time  delay  cannot  be  totally  ignored, 
but  its  effect  may  not  be  significant. 

The relative time delay with respect to the chip period becomes larger as the chip rate increases. However, as is clear from equation (12), when the maximum time delay is larger than the chip period, the multipath components whose delays exceed the chip period will be discriminated at the receiver by virtue of despreading. Therefore, the maximum time delay to be considered is the chip period.
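A short numerical check of the delay figures quoted above, assuming the antenna sits near the rotor center as in the text. The function name, default arguments, and the rounded speed of light are choices of this sketch.

```python
import math

def max_delay_seconds(alpha_deg, r2_m=7.5, speed_of_light=3.0e8):
    """tau_max = (1 + |cos(alpha)|) * R2 / c for an antenna near the rotor center."""
    path_difference = (1 + abs(math.cos(math.radians(alpha_deg)))) * r2_m
    return path_difference / speed_of_light

chip_period = 1 / 10e6                      # 100 ns at 10 Mcps
tau_max = max_delay_seconds(70.0)           # about 33.6 ns
ratio = chip_period / tau_max               # about three

# Delays beyond one chip are rejected by despreading, so cap at the chip period.
effective_max_delay = min(tau_max, chip_period)

print(f"tau_max = {tau_max * 1e9:.1f} ns, chip period / tau_max = {ratio:.2f}")
```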

Doppler  Effect  Consideration 

One of the important parameters in the underlying rotor scattering problem is the maximum Doppler frequency. The instantaneous Doppler frequency shift of the scattered signals is the derivative of the phase defined in equation (8), and is given by

$$ \Delta f_m(t) = \frac{d\phi_m(t)}{2\pi\,dt} = -\frac{r\omega_r}{\lambda}\cos(\alpha)\cos\left(\omega_r t + \frac{2m\pi}{M}\right). \tag{13} $$

Accordingly, the maximum instantaneous frequency shift of the scattered signals is

$$ \Delta f_{m,\max}(\alpha) = \max_{t,m,r} \Delta f_m(t) = \frac{R_2\,\omega_r}{\lambda}\cos(\alpha). \tag{14} $$

For example, consider the case in which the blades rotate at 240 rpm or 4 r/s, so that $\omega_r = 8\pi$ rad/s. At the RF frequency of 10 GHz, $\lambda = 0.03$ m, and the upper bound of the Doppler frequency is obtained at $\alpha = 0$ and equals $\Delta f_{m,\max}(0) = 7.5 \times 8\pi/0.03 = 6.28$ kHz. Due to the $\cos(\alpha)$ term, the maximum Doppler frequency shift $\Delta f_{m,\max}(\alpha)$ is often smaller than the above bound.

Although the maximum Doppler frequency is much smaller than the chip rate, it is of an order comparable to the symbol rate. It is clear from equation (12) that, since an integration is performed over a symbol period, the contribution of the scattering multipaths with Doppler frequencies higher than the symbol rate is small.

Fig. 2 shows the instantaneous Doppler frequency shift $\Delta f_m(t)$ at the blade tips ($r = R_2 = 7.5$ m) for the 0th blade ($m = 0$) versus time in terms of the number of symbols. The results are shown for a period of one rotation cycle (0.25 s = 25,000$T$), where the angle $\alpha$ takes the values 0°, 45°, 70°, and 90°. The Doppler frequency shift is proportional to $\omega_c$, $\omega_r$, and $r$. As such, any change in these parameters leads to a linear change of the Doppler frequency shift.
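The Doppler shift of equation (13) and the bound of equation (14) can be evaluated numerically as sketched below. The sign convention follows the phase in equation (8), the default of four blades is the value used later for Fig. 3, and the function and variable names are assumptions of this sketch rather than part of the original work.

```python
import math

def doppler_shift_hz(t, alpha_deg, r_m=7.5, blade=0, num_blades=4,
                     rotation_rate_hz=4.0, wavelength_m=0.03):
    """Instantaneous Doppler shift: derivative of phi_m(t) divided by 2*pi."""
    omega_r = 2 * math.pi * rotation_rate_hz
    phase = omega_r * t + 2 * math.pi * blade / num_blades
    return (-(r_m * omega_r / wavelength_m)
            * math.cos(math.radians(alpha_deg)) * math.cos(phase))

symbol_period = 1e-5                        # 100 kbauds
symbols_per_rotation = 25000                # 0.25 s / T

# Blade-tip Doppler shift sampled once per symbol over one rotation, alpha = 0.
samples = [doppler_shift_hz(n * symbol_period, 0.0)
           for n in range(symbols_per_rotation)]
peak = max(abs(f) for f in samples)         # about 6.28 kHz

for alpha in (0.0, 45.0, 70.0, 90.0):
    bound = abs(doppler_shift_hz(0.0, alpha))
    print(f"alpha = {alpha:4.0f} deg: max shift = {bound:8.1f} Hz")
```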


Fig.  2  Doppler  frequency  shift  vs.  time 
(m=0,  r=7.5m). 


Channel  Characteristics 

To examine the effects of the scattering in equation (12), we note the fact that $\psi_m(t)$ can be considered constant over a symbol period. Define

$$ \rho(\tau_m(t)) = \frac{1}{T}\int_{nT}^{(n+1)T} p(t-\tau_m(t))\,p(t)\,dt \tag{15} $$

as the correlation function of the chip waveform. Then, the scattering component is approximated by

$$
\begin{aligned}
z_r^{(s)}(n) &= \frac{1}{T}\sum_{m=0}^{M-1}\int_{nT}^{(n+1)T}\int_{R_1}^{R_2} p(t-\tau_m(t))\,p(t)\,d(t-\tau_m(t))\,A\,e^{j\psi_m(t)}\,dr\,dt \\
&\approx A\,b(n)\sum_{m=0}^{M-1}\int_{R_1}^{R_2} \rho(\tau_m(t))\,e^{j\psi_m(t)}\,dr \\
&= \xi(t)\,b(n),
\end{aligned} \tag{16}
$$

where

$$ \xi(t) = A\sum_{m=0}^{M-1}\int_{R_1}^{R_2} \rho(\tau_m(t))\,e^{j\psi_m(t)}\,dr \tag{17} $$

is the channel response of the scattering component contribution.

Examples of $\xi(t)$ (real part) are shown in Fig. 3 for different values of $\alpha$, where we have assumed that the antenna is located close to the center of the rotors, $M = 4$, $\rho(\tau_m(t)) = 1 - |\tau_m(t)|/T_c$, and $R_1 = 1$ m. The results are shown for 6250 symbols, or equivalently over one full period of $\xi(t)$, given by $1/(M f_r)$ with $f_r = \omega_r/2\pi$. We set $A = 1/\lambda$ for normalization. It is evident that, when $\alpha$ is small, $\xi(t)$ demonstrates large peak values. On the other hand, when $\alpha$ is large (close to 90°), the result becomes very small. The reason is discussed below.

Naturally, when $\alpha$ is close to 90°, the effect of Doppler frequency shift is small. Since $L_0 \approx L_1$, $L_2$ becomes the dominant parameter in defining the time delay, and subsequently the propagation phase. This same property causes the phase to be periodic over $r$. Because of the periodicity, averaging over $r$ will then lead to small values. On the other hand, when $\alpha$ is small, the two contributions to the phase by the Doppler frequency shift and the propagation delay from $L_0 - L_1$ become comparable. Large peaks appear when these components are resonant with respect to $r$.


4.  BER  PERFORMANCE 

Computer simulations are performed to illustrate the scattering effect on the BER performance. Both the symbol and chip modulations are assumed to be binary phase shift keying (BPSK). The typical parameters listed in Table 1 are used. The noise is assumed to be a white Gaussian random process.

Fig. 4 shows the BER performance versus the input signal-to-noise ratio (SNR) for different values of $A$ and $\alpha$. When $A$ is large ($A > 10^{-2}/\lambda$ in this figure) and $\alpha$ is small ($\alpha < 45°$ in this figure), the BER takes a high value which decreases slowly with increased input SNR. This is the impact of large scattering from the rotating blades. On the other hand, when $A$ is small ($A = 10^{-3}/\lambda$ in this figure) or $\alpha$ is close to 90° ($\alpha = 70°$ in this figure), the effect of the rotating blades is negligible.
Fig. 3 Scattering component $\xi(t)$ vs. time.


5.  CONCLUSION 

We have analyzed the channel characteristics and the performance of direct-sequence spread-spectrum (DS/SS) communications in the multipath environment caused by the rotating blades of a helicopter, taking into account both the Doppler fading and the time delays.
ABSTRACT

Designing accurate estimators that also consider the noise term in low-SNR scenarios is paramount to achieving optimal solutions and obtaining precise symbol detectors. In particular, this paper estimates the propagation delays in asynchronous DS-CDMA systems. The proposed Minimum Conditioned Variance (MCV) estimator is the method of choice in noisy environments, implementing the best linear detector of the transmitted symbols under a minimum mean-square error criterion. The result is an estimator that improves on the conditional ML (CML) solution when noise is not negligible, and attains the derived Gaussian Unconditional Cramer-Rao Bound (UCRB) over the whole Eb/No range, as the classical Gaussian Unconditional ML (UML) does. Consequently, the proposed MCV estimator becomes an optimal quadratic solution achieving features similar to those of UML in a straightforward way, and with no assumptions on the signal statistics.


1.  INTRODUCTION 

In digital communications, knowledge of certain parameters, such as the carrier phase and frequency or the propagation delay, is paramount for reliable detection of the transmitted symbols. In multi-user DS-CDMA systems, an accurate estimate of the propagation delays of all users is essential. Otherwise, the performance of the multi-user detector degrades rapidly because of multiple-access interference (MAI), as has been widely studied in the literature [1], [2]. Accordingly, this paper addresses a multi-parametric estimator intended for multi-user synchronization and symbol detection, with high performance in low-SNR scenarios. Nevertheless, the proposed algorithm is not restricted to multi-user synchronizers, and can also be extended to other estimation problems, such as frequency synchronization in OFDM and multi-carrier schemes.

The Maximum Likelihood (ML) formulation has usually been employed to design timing estimators. Classically, Unconditional ML (UML) algorithms have been developed in the field of digital communications by modeling the transmitted symbols as stochastic processes. Nevertheless, in order to obtain tractable mathematical expressions, UML estimators either assume Gaussian signal statistics, which is known to be a non-realistic assumption in digital communications, or assume low SNR, which leads to appreciable self-noise when the noise term is negligible. Consequently, the restrictions on UML motivated the introduction of the deterministic or conditional ML (CML), which considers the transmitted symbols as deterministic unknown parameters. This formulation has been applied by Stoica and Nehorai [3] in sensor array processing to perform DOA estimation, and more recently the same principle has been applied to frequency and timing estimation [4]-[7]. The CML solution does not present self-noise, is robust in near-far scenarios, and provides high performance at high SNR. Nevertheless, it is not an optimal solution in noisy scenarios with low SNR.

The proposed Minimum Conditioned Variance (MCV) method, addressed in this paper, mitigates the drawbacks of CML estimation in low-SNR scenarios by considering the impact of the noise, and becomes the deterministic solution at high SNR. Although the derived MCV is biased, the bias value can be estimated and then subtracted to obtain an unbiased estimator. The result is an estimator that attains the lower Gaussian Unconditional Cramer-Rao Bound (UCRB) over the whole Eb/No range, as Gaussian UML does. Accordingly, MCV becomes an optimal quadratic estimator with no assumptions on the signal statistics.

This paper is organized as follows. The next section describes the discrete-time signal model and obtains a structured matrix expression containing the parameters to estimate. Section 3 describes the CML formulation and explains under which conditions the deterministic criterion is not feasible. Section 4 then introduces the Minimum Conditioned Variance method as the alternative and derives its gradient expression. Furthermore, a detailed study of the proposed estimator shows that it is biased, and consequently a modified unbiased estimator is proposed. Next, Section 5 derives the UCRB, which is used as a benchmark, at high and low SNR, for the performance of the proposed multi-user delay estimator. Finally, the last section presents some simulation results showing that the proposed MCV outperforms CML, attaining the UCRB and reducing the Bit-Error Rate (BER) in symbol detection.
2. DISCRETE-TIME SIGNAL MODEL

The described model considers a $K$-user asynchronous DS-CDMA system operating in a multipath environment. The received signal contains the superposition of the $K$ active users:

$$ r(t) = \sum_{k=1}^{K} s_k(t-\tau_k) + w(t) \tag{1} $$

where $s_k(t)$ denotes the $k$-th user's received baseband signal, $\tau_k$ its propagation delay, and $w(t)$ represents the received AWGN noise term with zero mean and variance $\sigma_w^2$.

For each user the received baseband signal is modeled as:

$$ s_k(t) = \sum_{n=-\infty}^{\infty} d_n^k\, e^{j\theta_k}\, g_k(t-nT) \tag{2} $$

where $g_k(t)$ represents the $k$-th user's received signature, $T$ is the bit duration, $d_n^k$ are the transmitted information bits, and $\theta_k$ the received carrier phase. Moreover, considering the presence of a propagation multipath channel with baseband impulse response $h_k(t)$, the $k$-th user's received signature is given by a distorted version of the transmitted spreading waveform $c_k(t)$ as:

$$ g_k(t) = c_k(t) * h_k(t) \tag{3} $$

Finally, the received signal as a function of the users' signatures is given by:

$$ r(t) = \sum_{k=1}^{K}\sum_{n=-\infty}^{\infty} d_n^k\, e^{j\theta_k}\, g_k(t-nT-\tau_k) + w(t) \tag{4} $$

The algorithm is derived in a discrete-time signal model by sampling the received waveform at $N_{sc}$ samples per chip. Choosing the sampling frequency as $f_s = 1/T_s$, where $T_s$ is the sampling period, and collecting $2M+1$ samples of $r(nT_s)$, the vector $\mathbf{r}$ can be defined as:

$$ \mathbf{r} = [\, r(-MT_s) \;\ldots\; r(0) \;\ldots\; r(MT_s) \,]^T \tag{5} $$

At this point equation (4) can be expressed following the matrix signal model (see footnote 1):

$$ \mathbf{r} = \mathbf{A}(\boldsymbol{\tau})\,\mathbf{x} + \mathbf{w} \tag{6} $$

The set of unknown parameters (i.e. the transmitted symbols and phase errors) for the $k$-th user defines the vector $\mathbf{x}_k$:

$$ \mathbf{x}_k = [\, d_{-L}^k e^{j\theta_k} \;\ldots\; d_0^k e^{j\theta_k} \;\ldots\; d_L^k e^{j\theta_k} \,]^T \tag{7} $$

where the number of transmitted symbols is $N_s = 2L + 1$. Finally, stacking all users, the nuisance parameter vector $\mathbf{x}$ is defined as follows:

$$ \mathbf{x} = [\, \mathbf{x}_1^T \;\; \mathbf{x}_2^T \;\ldots\; \mathbf{x}_K^T \,]^T \tag{8} $$

On the other hand, the model transfer matrix, denoted as $\mathbf{A}(\boldsymbol{\tau})$ (see footnote 2), contains the user signatures and the parameters to estimate, $\tau_k$:

$$
\begin{aligned}
\mathbf{A} = \mathbf{A}(\boldsymbol{\tau}) &= [\, \mathbf{A}_1(\tau_1) \;\; \mathbf{A}_2(\tau_2) \;\ldots\; \mathbf{A}_K(\tau_K) \,] \\
\mathbf{A}_k(\tau_k) &= [\, \mathbf{a}_0^k \;\; \mathbf{a}_1^k \;\ldots\; \mathbf{a}_{N_s-1}^k \,] \\
\mathbf{a}_n^k &= [\, g_k(-MT_s - nT - \tau_k) \;\ldots\; g_k(MT_s - nT - \tau_k) \,]^T
\end{aligned} \tag{9}
$$

where the columns of $\mathbf{A}_k(\tau_k)$ are shifted versions of the $k$-th user's signature delayed by $\tau_k$.

A more detailed model of the matrix $\mathbf{A}_k(\tau_k)$ is given by the product of two matrices:

$$ \mathbf{A}_k(\tau_k) = \mathbf{H}_k(\mathbf{h}_k)\,\mathbf{C}_k(\tau_k) \tag{10} $$

Matrix $\mathbf{H}_k(\mathbf{h}_k)$ is a Sylvester (convolution) matrix modeling the channel distortion, whose columns are the $k$-th user's channel impulse response coefficients. On the other hand, matrix $\mathbf{C}_k(\tau_k)$ is obtained from the $k$-th user's spreading code delayed by $\tau_k$.

Footnote 1: The channel coefficients are assumed to be known or previously estimated (e.g. [8]).

3. THE CML FORMULATION


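The cost function in CML estimation for the signal model in (6) is derived from the joint ML cost function, which is formulated as:

$$ \Lambda(\mathbf{r}/\boldsymbol{\tau}, \mathbf{x}) = \frac{1}{(\pi\sigma_w^2)^M}\exp\left(-\frac{1}{\sigma_w^2}\,\|\mathbf{r} - \mathbf{A}\mathbf{x}\|^2\right) \tag{11} $$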

The ML function depends on the parameter vector $\boldsymbol{\tau}$ to be estimated and also on the vector $\mathbf{x}$. Notice that vector $\mathbf{x}$ contains a set of unknown parameters, so some decision must be taken on how to treat it. Joint estimation of $\boldsymbol{\tau}$ and $\mathbf{x}$ could be the solution, but it is discarded because it is computationally complex, and alternative algorithms focusing only on the estimation of $\boldsymbol{\tau}$ are proposed. Classically, the UML solution computes the expectation of the joint ML function with respect to the nuisance parameters:

$$ \Lambda_{UML}(\mathbf{r}/\boldsymbol{\tau}) = E_{\mathbf{x}}\{\Lambda(\mathbf{r}/\boldsymbol{\tau}, \mathbf{x})\} \tag{12} $$


In general the expectation $E_{\mathbf{x}}$ in (12) is quite difficult to obtain, and in practice only a low-SNR approximation of the likelihood function is used. These limitations motivate the use of the CML solution. This method considers the nuisance parameters as deterministic, so they can be replaced by their estimate for a fixed vector $\boldsymbol{\tau}$. The ML estimate of $\mathbf{x}$, when no restrictions are imposed on it, can be obtained as:

$$ \hat{\mathbf{x}}_{ML} = \mathbf{A}^{\#}\mathbf{r} \tag{13} $$

where $\mathbf{A}^{\#}$ is the Moore-Penrose pseudo-inverse. Once the nuisance vector $\mathbf{x}$ is estimated, the compressed ML function to maximize, which only depends on the parameter vector $\boldsymbol{\tau}$, is obtained by replacing (13) in (11). Finally, the derived log-likelihood function to minimize (omitting irrelevant constants) is given by:

$$ \min_{\boldsymbol{\tau}} L_{CML}(\mathbf{r}/\boldsymbol{\tau}) = \mathrm{tr}\{\mathbf{P}_A^{\perp}\hat{\mathbf{R}}\} \tag{14} $$

where $\mathbf{P}_A^{\perp} = \mathbf{I} - \mathbf{A}\mathbf{A}^{\#}$ is the projection matrix onto the subspace orthogonal to that spanned by the columns of $\mathbf{A}$, and $\hat{\mathbf{R}} = \mathbf{r}\mathbf{r}^H$.

Footnote 2: Hereafter the dependence on vector $\boldsymbol{\tau}$ will be suppressed for simplicity.
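To make the compressed cost in (14) concrete, the following sketch evaluates it over a grid of candidate delays for a toy single-user, noise-free case. The Gaussian pulse shape, the grid, the single-column transfer matrix, and the names `transfer_matrix` and `cml_cost` are assumptions made for illustration; they do not reproduce the paper's signatures or simulation setup.

```python
import numpy as np

def transfer_matrix(delay, sample_times):
    """Toy single-user A(tau): one column holding a delayed Gaussian pulse."""
    return np.exp(-0.5 * (sample_times - delay) ** 2)[:, None]

def cml_cost(A, r):
    """tr{P_perp R_hat} with P_perp = I - A A^# and R_hat = r r^H."""
    P_perp = np.eye(A.shape[0]) - A @ np.linalg.pinv(A)
    R_hat = np.outer(r, r.conj())
    return float(np.real(np.trace(P_perp @ R_hat)))

sample_times = np.arange(-5.0, 6.0)           # 2M + 1 samples with M = 5
delay_grid = np.linspace(-1.0, 1.0, 21)       # candidate delays, in sample periods
true_delay = delay_grid[13]                   # about 0.3 sample periods

# Noise-free observation r = A(tau_true) x with a single symbol x = -1.
r = transfer_matrix(true_delay, sample_times) @ np.array([-1.0])

costs = [cml_cost(transfer_matrix(tau, sample_times), r) for tau in delay_grid]
best_delay = delay_grid[int(np.argmin(costs))]
print(best_delay)
```

In this noise-free toy case the cost vanishes at the true delay, because the observation then lies entirely in the column space of the transfer matrix.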

To minimize (14) a gradient algorithm may be used. The gradient in conditional ML was derived by Viberg, Ottersten and Kailath [9] in the context of array processing for DOA estimation. In our delay estimation problem this gradient can be expressed as:

$$ g_{c_i} = -2\,\mathrm{Re}\left\{ (\mathbf{r}^H\mathbf{P}_A^{\perp}\mathbf{D}_i)\,(\mathbf{A}^{\#}\mathbf{r}) \right\} \tag{15} $$

where $\mathbf{D}_i = \partial\mathbf{A}/\partial\tau_i$.

A closer look at the gradient expression shows that it is the product of two terms. The first term, $(\mathbf{r}^H\mathbf{P}_A^{\perp}\mathbf{D}_i)$, explains why the algorithm is self-noise free. In a noiseless environment, and in the absence of delay errors, the vector $\mathbf{r}$ is contained in the signal subspace generated by the columns of $\mathbf{A}$. Thus, the projection matrix $\mathbf{P}_A^{\perp}$, which does not appear in the classical unconditional approach, acts as a zero-forcer placed at the output of the derivative matched filter $\mathbf{D}_i$. As a result, the estimator ensures in all cases a self-noise free solution: $(\mathbf{r}^H\mathbf{P}_A^{\perp}\mathbf{D}_i) = 0$.

The second term, $(\mathbf{A}^{\#}\mathbf{r})$, corresponds to the ML estimate of the unconstrained vector $\mathbf{x}$. Notice that this expression is the decorrelating detector solution, so the algorithm not only estimates the propagation delay but also implements this sub-optimum detector. The presence of this term explains why the solution is near-far robust. Analyzing the signal model (6), it is observed that the received powers can be absorbed into the nuisance parameter vector $\mathbf{x}$. Hence, following (13), the algorithm is guaranteed to estimate the received power values, which makes the estimator insensitive to different power levels.

Nevertheless, the decorrelating detector exhibits some difficulties in noisy scenarios. The pseudoinverse, like the ideal zero-forcing (ZF) solution in equalization, does not take the noise term into account. Accordingly, when the eigenvalue spread of the transfer matrix $\mathbf{A}$, defined as:


(16) 


is large enough, the noise term is strongly amplified, making the CML method an unacceptable solution in low-SNR scenarios, which are common in wideband DS-CDMA systems.
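The noise amplification of the decorrelating detector can be illustrated with a two-user toy case in closed form. The sketch assumes unit-energy signatures with a real cross-correlation and takes the eigenvalue spread to be the ratio of the largest to the smallest eigenvalue of the signature Gram matrix; both the setup and that definition are assumptions of this sketch, not statements taken from the paper.

```python
def two_user_decorrelator_stats(cross_corr):
    """Two unit-energy signatures with Gram matrix [[1, c], [c, 1]].

    Returns (eigenvalue spread, noise gain), where the noise gain is the
    diagonal entry of the inverse Gram matrix, i.e. the factor by which the
    decorrelator output noise variance exceeds the matched-filter one.
    """
    eig_max = 1 + abs(cross_corr)
    eig_min = 1 - abs(cross_corr)
    spread = eig_max / eig_min
    noise_gain = 1 / (1 - cross_corr ** 2)
    return spread, noise_gain

for c in (0.0, 0.5, 0.9):
    spread, gain = two_user_decorrelator_stats(c)
    print(f"cross-correlation {c:.1f}: spread = {spread:.2f}, noise gain = {gain:.2f}")
```

As the signatures become more correlated, both the spread and the noise gain grow, which is the mechanism behind the CML degradation at low SNR described above.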


4.  MINIMUM  CONDITIONED  VARIANCE 
APPROACH 

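A novel approach is proposed in this paper that considers the impact of the noise in the likelihood function, thereby achieving a more robust estimator in low-SNR scenarios. The Minimum Conditioned Variance (MCV) approach takes the nuisance parameter estimate to be the best linear estimate under a minimum variance criterion given an observation vector $\mathbf{r}$. This estimate is:

$$
\begin{aligned}
\hat{\mathbf{x}} &= E[\mathbf{x}/\mathbf{r}] = \boldsymbol{\Gamma}\mathbf{A}^H(\mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^H + \sigma_w^2\mathbf{I})^{-1}\mathbf{r} = \mathbf{C}\mathbf{r} \\
\mathbf{C} &= \boldsymbol{\Gamma}\mathbf{A}^H(\mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^H + \sigma_w^2\mathbf{I})^{-1} \\
\boldsymbol{\Gamma} &= E\{\mathbf{x}\mathbf{x}^H\}
\end{aligned} \tag{17}
$$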

The previous expression is the best estimator, linear or non-linear, under Gaussian conditions, and only the best linear estimator under non-Gaussian conditions. The new cost function is derived by substituting (17) in equation (11) and is given by:

$$ \min_{\boldsymbol{\tau}} L_{MCV}(\mathbf{r}/\boldsymbol{\tau}) = \|\mathbf{r} - \mathbf{A}\mathbf{C}\mathbf{r}\|^2 \tag{18} $$

At high SNR, $\mathbf{C}(\sigma_w^2 \to 0) = \mathbf{A}^{\#}$ is the pseudo-inverse of $\mathbf{A}$, and the CML solution is recovered. On the other hand, when the contribution of $\mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^H$ is negligible compared with $\sigma_w^2\mathbf{I}$, $\mathbf{C}$ approaches a bank of matched filters containing all the user signatures: $\mathbf{C}(\sigma_w^2 \to \infty) = \sigma_w^{-2}\boldsymbol{\Gamma}\mathbf{A}^H$. This second limit is reached at low SNR when the noise power is much greater than the received signal power for all users. Notice however that, in high near-far scenarios, the elements in $\boldsymbol{\Gamma}$ associated with the most powerful users will be higher than the noise term. Consequently, in scenarios with low SNR and a small near-far ratio, the MCV will improve on the classical CML solution, whereas in high near-far scenarios, MCV will remain close to CML.
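The two limits described above can be checked numerically. The sketch below builds the matrix $\mathbf{C}$ of (17) for a small, arbitrary real-valued transfer matrix and compares it with the pseudo-inverse at low noise and with the scaled matched-filter bank at high noise. The example matrix, the identity symbol covariance, the two noise variances, and the function name `mcv_matrix` are illustrative assumptions.

```python
import numpy as np

def mcv_matrix(A, Gamma, noise_var):
    """C = Gamma A^H (A Gamma A^H + noise_var * I)^(-1), as in equation (17)."""
    R = A @ Gamma @ A.conj().T + noise_var * np.eye(A.shape[0])
    return Gamma @ A.conj().T @ np.linalg.inv(R)

# Hypothetical 4-sample, 3-unknown transfer matrix with overlapping columns.
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
Gamma = np.eye(3)                            # unit-power, uncorrelated symbols

C_high_snr = mcv_matrix(A, Gamma, 1e-4)      # small noise variance
C_low_snr = mcv_matrix(A, Gamma, 1e6)        # large noise variance

# High SNR: C approaches the pseudo-inverse (decorrelating detector).
err_high = np.max(np.abs(C_high_snr - np.linalg.pinv(A)))
# Low SNR: noise_var * C approaches Gamma A^H (bank of matched filters).
err_low = np.max(np.abs(1e6 * C_low_snr - Gamma @ A.conj().T))

print(err_high, err_low)
```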

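To minimize (18) we again follow a gradient scheme. The gradient expression in MCV is given by:

$$ g_{mcv_i} = -2\,\mathrm{Re}\left\{ \mathbf{r}^H(\mathbf{I} - \mathbf{A}\mathbf{C})^H\left(\mathbf{D}_i\mathbf{C} + \mathbf{A}\frac{\partial\mathbf{C}}{\partial\tau_i}\right)\mathbf{r} \right\} \tag{19} $$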

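It is interesting to analyze the behaviour in high- and low-SNR scenarios. At high SNR $\mathbf{C} \to \mathbf{A}^{\#}$, and making use of $\mathbf{P}_A^{\perp}\mathbf{A} = 0$, the second term in the previous gradient is asymptotically equal to zero: $\mathbf{r}^H(\mathbf{I} - \mathbf{A}\mathbf{C})^H\mathbf{A}\frac{\partial\mathbf{C}}{\partial\tau_i}\mathbf{r} = 0$. Thus the gradient becomes:

$$ g_{mcv_i}(\sigma_w^2 \to 0) \approx -2\,\mathrm{Re}\left\{ \mathbf{r}^H(\mathbf{I} - \mathbf{A}\mathbf{C})^H(\mathbf{D}_i\mathbf{C})\,\mathbf{r} \right\} \tag{20} $$

Likewise, at low SNR $\mathbf{C} \to \sigma_w^{-2}\boldsymbol{\Gamma}\mathbf{A}^H$, and the two components in the gradient (19) supply the same value. Hence, the asymptotic gradient derived in noisy environments is:

$$ g_{mcv_i}(\sigma_w^2 \to \infty) \approx -\frac{4}{\sigma_w^2}\,\mathrm{Re}\left\{ \mathbf{r}^H\mathbf{D}_i\boldsymbol{\Gamma}\mathbf{A}^H\mathbf{r} \right\} \tag{21} $$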

Notice that the second term can be dropped in both cases without losing the information carried by the gradient.
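The statement that both components of (19) supply the same value at low SNR can be checked numerically: with $\mathbf{C} \approx \sigma_w^{-2}\boldsymbol{\Gamma}\mathbf{A}^H$, the two terms reduce to $\mathbf{r}^H\mathbf{D}_i\boldsymbol{\Gamma}\mathbf{A}^H\mathbf{r}$ and $\mathbf{r}^H\mathbf{A}\boldsymbol{\Gamma}\mathbf{D}_i^H\mathbf{r}$, which are complex conjugates when $\boldsymbol{\Gamma}$ is Hermitian. The random matrices below are stand-ins chosen only for the check; none of the values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_unknowns = 6, 2

def random_complex(*shape):
    """Complex Gaussian array used as an arbitrary stand-in."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

A = random_complex(n_samples, n_unknowns)     # stand-in transfer matrix
D = random_complex(n_samples, n_unknowns)     # stand-in for dA/dtau_i
Gamma = np.diag([1.0, 2.0])                   # diagonal (Hermitian) symbol covariance
r = random_complex(n_samples)                 # observation vector

first_term = r.conj() @ D @ Gamma @ A.conj().T @ r    # r^H D_i Gamma A^H r
second_term = r.conj() @ A @ Gamma @ D.conj().T @ r   # r^H A Gamma D_i^H r

# The two scalars are complex conjugates, so their real parts coincide.
print(first_term.real, second_term.real)
```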

Finally, for the special case when there is only one parameter to estimate, e.g. timing or frequency estimation in linear and non-linear modulations, another argument for eliminating the second term is detailed in [7] and outlined next. Considering that vector $\mathbf{r}$ follows a Gaussian distribution, which is known to be a non-realistic assumption in digital communications, the Gaussian UML cost function becomes (see footnote 3)

$$ L_{UML_G}(\mathbf{r}/\boldsymbol{\tau}) = \mathbf{r}^H\mathbf{R}^{-1}\mathbf{r}, \qquad \mathbf{R} = \mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^H + \sigma_w^2\mathbf{I} \tag{22} $$

and the Gaussian UML gradient of the previous equation is given by:

$$ g_{uml_i} = -2\,\mathrm{Re}\left\{ \mathbf{r}^H(\mathbf{I} - \mathbf{A}\mathbf{C})^H\mathbf{D}_i\mathbf{C}\,\mathbf{r} \right\} \tag{23} $$

Footnote 3: Only applicable if $\mathbf{A}^H\mathbf{A}$ does not depend on the parameter to estimate.

Comparing the last equation with (19), a further justification for removing the second term is obtained. Accordingly, in uni-parametric estimators, and assuming a Gaussian distribution for the transmitted symbols, the MCV gradient becomes the Gaussian UML gradient. Nevertheless, in multi-parametric estimators, the Gaussian UML cost function becomes more complex:

$$ L_{UML_G}(\mathbf{r}/\boldsymbol{\tau}) = \ln|\mathbf{R}| + \mathbf{r}^H\mathbf{R}^{-1}\mathbf{r} \tag{24} $$

Notice that a new term $\ln|\mathbf{R}|$ is introduced, which becomes constant in uni-parametric estimators when $\mathbf{A}^H\mathbf{A}$ does not depend on the parameter to estimate. The gradient expression, derived in [3], can no longer be identified with (19).

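After the previous analysis, the MCV gradient can be asymptotically rewritten as:

$$ g_{mcv_i} \approx -2\,\mathrm{Re}\left\{ \mathbf{r}^H(\mathbf{I} - \mathbf{A}\mathbf{C})^H\mathbf{D}_i\mathbf{C}\,\mathbf{r} \right\} \tag{25} $$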

A closer analysis of the previous gradient shows that it is biased: in the absence of timing errors the gradient does not become the null vector. Therefore, the bias expression can be obtained by computing the expected value of the gradient when the estimated timing vector equals the real timing vector:


$$ \mathrm{Bias}_i = E\{g_{mcv_i}\}\big|_{\hat{\boldsymbol{\tau}} = \boldsymbol{\tau}} = -2\,\mathrm{Re}\left\{ \mathrm{Tr}\left\{ \boldsymbol{\Gamma}\mathbf{A}_{\tau}^H(\mathbf{I} - \mathbf{A}_{\tau}\mathbf{C}_{\tau})^H\mathbf{D}_{i\tau} \right\} \right\} \tag{26} $$

where $\mathbf{A}_{\tau}$, $\mathbf{C}_{\tau}$, $\mathbf{D}_{i\tau}$ denote the dependence of the matrices on $\boldsymbol{\tau}$. Unfortunately, the previous expression cannot be computed by the estimator because the real timing vector $\boldsymbol{\tau}$ is not known a priori. Nevertheless, the expected value of the gradient close to the real timing vector does not depend on the absolute timing error $\hat{\boldsymbol{\tau}} - \boldsymbol{\tau}$. Hence, an accurate bias estimate can be obtained if the estimated timing vector is used to compute (26):

$$ \widehat{\mathrm{Bias}}_i = -2\,\mathrm{Re}\left\{ \mathrm{Tr}\left\{ \boldsymbol{\Gamma}\mathbf{A}_{\hat{\tau}}^H(\mathbf{I} - \mathbf{A}_{\hat{\tau}}\mathbf{C}_{\hat{\tau}})^H\mathbf{D}_{i\hat{\tau}} \right\} \right\} \tag{27} $$

As a result, an unbiased estimate of the vector $\boldsymbol{\tau}$ can be obtained according to a modified gradient where the bias is subtracted:

$$ g_{mcv_i}^{unbiased} = -2\,\mathrm{Re}\left\{ \mathbf{r}^H(\mathbf{I} - \mathbf{A}\mathbf{C})^H\mathbf{D}_i\mathbf{C}\,\mathbf{r} \right\} - \widehat{\mathrm{Bias}}_i \tag{28} $$


5. PERFORMANCE ANALYSIS

This section derives the Gaussian Unconditional Cramer-Rao Bound (UCRB) and uses it to assess the performance of the CML and MCV multi-user delay estimators. As shown in [3], the UCRB is a valid lower bound for the variance of any consistent estimator based on the data sample covariance matrix.

As derived in [10], the $ij$-th Fisher Information Matrix (FIM) element can be obtained as:

$$ \{\mathbf{FIM}_U\}_{ij} = \mathrm{Tr}\left\{ \mathbf{R}^{-1}\mathbf{R}_i\mathbf{R}^{-1}\mathbf{R}_j \right\} \tag{29} $$

where:

$$ \mathbf{R} = E\{\mathbf{r}\mathbf{r}^H\} = \mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^H + \sigma_w^2\mathbf{I} \tag{30} $$

Focusing on our estimation problem, assuming that the noise power is known a priori (as is assumed in the MCV case), and modeling the transmitted symbols as zero-mean independent random variables (i.e. $\boldsymbol{\Gamma}$ is a diagonal matrix), $\mathbf{R}_i$ results in:

$$ \mathbf{R}_i = \frac{\partial\mathbf{R}}{\partial\tau_i} = \mathbf{D}_i\boldsymbol{\Gamma}\mathbf{A}^H + \mathbf{A}\boldsymbol{\Gamma}\mathbf{D}_i^H, \quad i = 1 \ldots N_s, \qquad \mathbf{D}_i = \frac{\partial\mathbf{A}}{\partial\tau_i} \tag{31} $$

which can be substituted into (29) to obtain the UCRB.


6. SIMULATION RESULTS

To evaluate the CML (15) and MCV (28) estimators, their performance was compared by computing the Root-Mean Square Error (RMSE) of the timing delay estimate and the Bit-Error Rate (BER) of the symbol detection. Simulations were carried out with 5 users; the spreading codes were Gold sequences with 7 chips per bit, the pulse shaping was a square-root raised cosine pulse with roll-off factor 0.5, the modulation was BPSK, and the oversampling factor was $N_{sc} = 2$. Denoting by $B_L$ the equivalent noise loop bandwidth, this parameter is related to the number of transmitted symbols as [12]:

$$ B_L T = \frac{1}{2N_s} \tag{32} $$

and the UCRB lower bound is usually written as a function of this bandwidth factor.


Figure 1 compares the proposed MCV with the classical CML algorithm, and compares the RMSE with the derived UCRB lower bound assuming that the noise power $\sigma_w^2$ is known a priori, (29)-(31). A low-SNR scenario with near-far ratio NF = 0 and only one path per user in AWGN (i.e. no channel assumption) was simulated. As can be seen in Figure 1, due to the high eigenvalue spread, at low SNR the CML is not an optimal solution and does not achieve the derived UCRB. Under those conditions, the proposed MCV outperforms the CML algorithm and attains the UCRB, becoming a quadratic optimal solution. Figure 1 also illustrates how at high SNR the MCV becomes the CML solution, and asymptotically both attain the Cramer-Rao Bound.

A second simulation shows the performance of both algorithms in symbol detection, and illustrates once again the importance of MCV in noisy environments. Figure 2 compares the BER obtained with the ML estimate of vector $\mathbf{x}$ (13) used in CML estimation and with the MMSE estimator (17) introduced in the MCV approach. To illustrate the importance of the eigenvalue spread, two simulations were run, using spreading codes with 7 chips per bit (associated eigenvalue spread $\chi = 35$) and with 15 chips per bit (associated eigenvalue spread $\chi = 6.25$). As can be seen, the higher the eigenvalue spread, the worse the CML solution performs. When the system is working at the limit of its capacity (i.e. 5 users and spreading factor 7), the noise power is strongly amplified by the decorrelating detector and CML is not an acceptable solution, while the novel MCV always achieves better performance.

Figure 1: Timing Delay Estimation Error

7.  CONCLUSIONS 

In this paper the MCV algorithm has been introduced in the multiuser
propagation delay estimation context. This novel method modifies the
classical CML solution by considering the impact of the noise in the
compression of the likelihood function. Hence, an algorithm can be
derived that is more robust in noisy environments when the eigenvalue
dispersion of the transference matrix is large.

Simulations have shown that MCV outperforms the classical deterministic
algorithm in noisy conditions, and that it coincides asymptotically
with the CML at high SNR. The mean squared timing error and the
bit-error rate at symbol detection have been used to evaluate this
performance. Accordingly, the suggested quadratic estimation technique
is shown to be optimal since it attains the UCRB lower bound over the
whole $E_b/N_0$ range, making it a strong substitute not only for CML
but also for UML, because it achieves similar features in a
straightforward way.
ABSTRACT

Most digitally modulated signals can be depicted by a signal space
diagram. Such a signal constellation often shows certain properties.
M-PSK modulated signals lie uniformly on a circle. To equalize this
class of signals, the constant modulus algorithm (CMA) is very
efficient because of the signals' constant modulus property. Its
criterion penalizes deviations in the amplitude of the equalized
signals from a fixed value while ignoring the uniformly distributed
phase. In this paper we exploit both amplitude and phase properties in
the equalization context. By combining the dispersion of both the
amplitude and the phase value in one cost function, new criteria for
blind equalization are obtained. Comparisons between the proposed
methods and the CMA algorithms are made based on the level of
inter-symbol interference and the probability of detection error.

1.  INTRODUCTION 

In digital communication, the user's information stream is usually
modulated before transmission. Due to multipath propagation,
inter-symbol interference (ISI) is introduced in the received signal.
An equalizer is required to counter the effect of channel distortion.
When the source alphabet forms an M-PSK constellation, the constant
modulus algorithm (CMA) is very efficient in eliminating the
interference due to other, undesired symbols. An excellent review of
this algorithm can be found in a recent paper [5]. Since the algorithm
was developed in [3] and [8], it has been extensively studied. The
convergence property of CMA has been analyzed [2]. Connections between
CMA and Wiener receivers have also been built based on a novel
geometrical concept [4]. It has been proved that zero cost can be
achieved by this criterion under some conditions [5].

As is well known, the constant modulus criterion employs the constant
modulus (amplitude) property of modulated signals. In fact, besides the
amplitude property, most digitally modulated signals also show other
features, such as uniform distribution on a plane in discrete intervals
(quadrature amplitude modulation), on a unit circle (phase modulation)
or on the real axis (amplitude modulation) in the signal space. This
information helps equalize the channel, as employed in [1], [6] to
match the signal constellation. In general, a complex signal is
described by its amplitude together with its phase. In this paper we
focus on M-PSK modulated signals. Properties of other modulated signals
can be similarly captured.

An M-PSK signal s can be represented by its constant amplitude $r_0$
and uniformly distributed phase $\Phi = 2\pi m/M$, where m is a random
number taking values 0, 1, ..., M - 1 with equal probability. It can
be shown that the k-th order moment of this signal is zero for all k
except when k is a multiple of M, where it becomes a constant. Thus the
moment, rather than the absolute moment, contains sufficient phase
information. If we consider the M-th order moment of the equalized
signal, then its squared norm can be maximized to obtain the equalizer
under a certain constraint. This constrained maximization problem can
also be converted into a constrained minimization problem based on our
analysis.
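The moment property quoted above follows from a geometric series. As a short worked step, assuming the usual M-PSK phase $\Phi = 2\pi m/M$ with m uniform on 0, ..., M - 1 (the integer l below is introduced only for this illustration):

```latex
\begin{aligned}
E\{s^k\} &= \frac{r_0^k}{M}\sum_{m=0}^{M-1} e^{j 2\pi m k / M} \\
         &= \begin{cases}
              r_0^k, & k = lM \quad (\text{every term equals } 1),\\[4pt]
              \dfrac{r_0^k}{M}\,\dfrac{1 - e^{j 2\pi k}}{1 - e^{j 2\pi k/M}} = 0, & k \neq lM .
            \end{cases}
\end{aligned}
```

For k not a multiple of M the denominator is non-zero while the numerator vanishes, which is why only moments of order lM carry information.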

In these approaches, the amplitude and phase of the equalized signal
are jointly taken into account implicitly. However, explicit
consideration is possible by integrating the phase property into the
CMA cost function. It can be easily observed that the phase constraint
$\frac{M}{2}\Phi = m\pi$ is equivalent to $\sin(\frac{M}{2}\Phi) = 0$.
Thus, in order to achieve better equalization performance, this phase
deviation should also be minimized. Based on these observations,
another criterion can be developed by combining it with the CMA cost
function. As with the CMA algorithm, the equalizers corresponding to
the different approaches can be recursively updated using the
stochastic gradient ascent/descent method. Simulation results are
presented to compare the proposed equalizers with the CMA equalizer
based on ISI and probability of detection error.
2.  PROBLEM  STATEMENT

Consider a widely adopted input/output model in wireless
communications and blind equalization [4]

$$x(n) = H s(n) + w(n) \tag{1}$$

where $s(n) \in \mathbb{C}^m$ is the complex source vector from an
M-PSK modulation constellation, $H \in \mathbb{C}^{p \times m}$ is the
channel matrix, $w(n) \in \mathbb{C}^p$ represents additive white
Gaussian noise (AWGN), and $x(n) \in \mathbb{C}^p$ is the received
signal. Equalization is performed by designing an equalizer
$f \in \mathbb{C}^p$ whose output $y_n$ is expected to be an accurate
estimate of one of the elements in s

$$y_n = f^H x(n) = a^T s(n) + f^H w(n) \tag{2}$$

where $(\cdot)^T$ and $(\cdot)^H$ stand for transpose and Hermitian
transpose, and $a^T = f^H H$ is the combined response of the channel
and the equalizer. Perfect equalization can be achieved in the absence
of noise if a has only one non-zero element [7]

$$a = e^{j\theta}[0, \ldots, 0, 1, 0, \ldots, 0]^T \tag{3}$$

The position of the non-zero element in a stands for the delay and
$\theta$ is the phase shift. Therefore the delay and phase ambiguity
are inherent in blind equalization.
Different criteria can be used to obtain the equalizer. The CMA
criterion seeks to minimize the dispersion of the equalizer output
about a constant r

$$J_c(f) = E\{(|y_n|^2 - r)^2\} \tag{4}$$

where "E" represents expectation. The constant can be chosen as
$r = E\{|s|^4\}/E\{|s|^2\}$ [7]. For M-PSK signals, $r = r_0^2$. Due to
the high non-linearity of the cost function, the algorithm is usually
implemented by the stochastic gradient descent method

$$f(k+1) = f(k) - \mu\,(|y_k|^2 - r)\,y_k^*\,x(k) \tag{5}$$

where * represents conjugate. It has been proved that zero cost can be
achieved under some conditions on the source, channel and additive
noise [5]. Since the phase characteristic of M-PSK signals is not
captured in the CMA cost function, we will develop new criteria that
jointly consider the properties of constant modulus and uniformly
distributed phase values.

3.  DEVELOPMENT  OF  THE  CRITERIA

M-PSK signals are uniformly distributed on a circle with radius $r_0$.
Without loss of generality, we assume $r_0 = 1$. This constant modulus
property together with uniform phase values results in the following
representation of the modulated signals

$$s = e^{j\Phi}, \quad \Phi = \frac{2\pi m}{M}, \quad m = 0, 1, \ldots, M-1 \tag{6}$$

where m is a random number taking M possible values with equal
probability $1/M$. M is usually chosen to be even, $M = 2L$. By simple
calculation, it can be verified that the k-th order moment of s
satisfies

$$E\{s^k\} = \frac{1}{M}\sum_{m=0}^{M-1} e^{j2\pi mk/M} = \begin{cases} 0, & k \neq lM \\ 1, & k = lM \end{cases} \tag{7}$$

with l an integer. Therefore, based on (7) and the i.i.d. assumption on
the symbols $s_{n-i}$, the M-th order moment of the equalizer output in
the absence of noise is related to the combined impulse response by

$$\begin{aligned} E\{y_n^M\} &= E\Big\{\Big(\sum_i a_i s_{n-i}\Big)^M\Big\} \\ &= \sum_i a_i^M E\{s_{n-i}^M\} = \sum_i a_i^M \end{aligned} \tag{8}$$

with all cross terms zeroed out in the transition from the first line
to the second line. Similarly we can obtain the output power [7]

$$\begin{aligned} E\{|y_n|^2\} &= E\Big\{\Big|\sum_i a_i s_{n-i}\Big|^2\Big\} \\ &= \sum_i |a_i|^2 E\{|s_{n-i}|^2\} = \sum_i |a_i|^2 \end{aligned} \tag{9}$$

Equations (8) and (9) form the basis for the following theorems.
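Relations (8) and (9) can be checked numerically for a small example. The sketch below is only an illustration under stated assumptions: the combined response `a` is an arbitrary hypothetical vector, the noise is absent, and the expectation is computed exactly by enumerating every combination of 4-PSK symbols rather than by simulation.

```python
import itertools
import numpy as np

def output_moments(a, M):
    """Exact E{y^M} and E{|y|^2} for y = sum_i a_i s_i with i.i.d. M-PSK symbols."""
    symbols = np.exp(2j * np.pi * np.arange(M) / M)   # unit-radius M-PSK alphabet
    # Enumerate all M**len(a) equiprobable symbol combinations.
    ys = np.array([np.dot(a, s)
                   for s in itertools.product(symbols, repeat=len(a))])
    return np.mean(ys ** M), np.mean(np.abs(ys) ** 2)

# Hypothetical combined channel/equalizer response, chosen only for illustration.
a = np.array([0.6 + 0.2j, -0.3j, 0.5])
moment_M, power = output_moments(a, M=4)

print(moment_M, np.sum(a ** 4))           # the two values agree, as in (8)
print(power, np.sum(np.abs(a) ** 2))      # the two values agree, as in (9)
```

Exhaustive enumeration is used here because it removes Monte Carlo error, so the cancellation of the cross terms is visible to machine precision.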

Theorem 1: If $E\{|y_n|^2\} = 1$, then $|E\{y_n^M\}|^2 \le 1$. The
equality holds if and only if a takes the form (3).

Proof: We apply the norm property first

$$|E\{y_n^M\}| = \Big|\sum_i a_i^M\Big| \le \sum_i |a_i|^M = \sum_i (|a_i|^2)^L$$

For the above equality to hold, the $a_i^M$ should be non-negative
numbers, or all $a_i$ are zero except one. Since the $|a_i|^2$ are real
numbers, we have

$$\sum_i (|a_i|^2)^L \le \Big(\sum_i |a_i|^2\Big)^L = \big(E\{|y_n|^2\}\big)^L = 1$$

The equality holds if and only if $|a_l| = 1$ for one l ($l_0$) while
$|a_i| = 0$ for all the rest. Combining these conditions we have
$a_{l_0} = e^{j\theta}$. □

Theorem 2: If $|E\{y_n^M\}| = 1$, then $E\{|y_n|^2\} \ge 1$. The
equality holds if and only if a takes the form (3).

Proof: The proof is similar to that of Theorem 1. The following can be
easily verified

$$E\{|y_n|^2\} = \sum_i |a_i|^2 = \Big[\Big(\sum_i |a_i|^2\Big)^L\Big]^{1/L} \ge \Big(\sum_i |a_i|^M\Big)^{1/L} \ge \Big|\sum_i a_i^M\Big|^{1/L} = 1$$

The equality holds if and only if $a_{l_0} = e^{j\theta}$ for a
particular $l_0$. □

Theorems 1 and 2 suggest the following equalization criteria,
respectively.

3.1.  The  first  criterion

According to Theorem 1, the equalizer f can be obtained by

$$\max_f |E\{y_n^M\}|^2, \quad \text{subject to } E\{|y_n|^2\} = 1$$

After substituting $y_n$ from (2) into the above, we can construct the
Lagrange cost function for this constrained maximization problem

$$J_1(f) = E\{(f^H x)^M\}\,E\{(x^H f)^M\} + \lambda_1 (f^H R f - 1) \tag{10}$$

where $R = E\{x x^H\}$ and $\lambda_1$ is an unknown Lagrange
multiplier. To seek a maximizer of $J_1(f)$, the gradient ascent method
can be employed

$$f(k+1) = f(k) + \mu_1 \nabla J_1(f)\big|_{f=f(k)} \tag{11}$$

where $\mu_1$ is the step size. From (10), the derivative with respect
to $f^H$ can be shown to be

$$\nabla J_1(f) = M E\{(f^H x)^{M-1} x\}\,E\{(x^H f)^M\} + \lambda_1 R f \tag{12}$$

The optimal $\lambda_1$ is the one which makes (12) zero under the
constraint. By setting (12) equal to zero and applying our constraint,
we can obtain $\lambda_1$

$$\lambda_1 = -M E\{(f^H x)^M\}\,E\{(x^H f)^M\} \tag{13}$$

Therefore the derivative becomes

$$\nabla J_1(f) = M b^* \big[E\{(f^H x)^{M-1} x\} - b R f\big] \tag{14}$$

where $b = E\{(f^H x)^M\}$. Substituting (14) in (11) we obtain our
recursion for the equalizer. To estimate the expected values in (14)
from the data, we save the values of $(f^H x)^M$, $(f^H x)^{M-1} x$ and
$x x^H$ at each iteration, and average them over all of their values up
to the current iteration. Simulation results show that this is a good
approximation with less computation. As can be observed, the complexity
of this algorithm is about $O((p+M)p)$.

3.2.  The  second  criterion

By examining Theorem 2, we can also obtain the equalizer f based on

$$\min_f E\{|y_n|^2\}, \quad \text{subject to } |E\{y_n^M\}|^2 = 1$$

After considering (2), the Lagrange cost function can be built as

$$J_2(f) = f^H R f + \lambda_2 \big[E\{(f^H x)^M\}\,E\{(x^H f)^M\} - 1\big] \tag{15}$$

with a new multiplier $\lambda_2$. To minimize $J_2(f)$, we formulate
the gradient descent recursion for the equalizer

$$f(k+1) = f(k) - \mu_2 \nabla J_2(f)\big|_{f=f(k)} \tag{16}$$

The derivative is easily computed from (15) as

$$\nabla J_2(f) = R f + \lambda_2 M E\{(f^H x)^{M-1} x\} \tag{17}$$

Based on the constraint, $\lambda_2$ can be similarly obtained

$$\lambda_2 = -\frac{f^H R f}{M} \tag{18}$$

Therefore the derivative (17) becomes

$$\nabla J_2(f) = R f - f^H R f\,E\{(f^H x)^{M-1} x\} \tag{19}$$

Substituting (19) in (16) and using the technique of the previous
subsection to estimate the expected values, we finally obtain the
recursion for the equalizer from data samples only.

The computational complexity of these two methods is very similar, and
it is significant if M is large. Next we will develop an alternative
criterion, still based on the observed properties of M-PSK signals but
with reduced complexity.

3.3.  An  alternative  approach

Let us revisit the representation of s in (6). As far as the signal
representation in the signal space is concerned, the phase property is
equivalent to $\sin(\frac{M}{2}\Phi) = 0$. Besides the constant modulus
criterion, the deviation of $\sin(\frac{M}{2}\Phi)$ from zero should
thus be minimized as well, where $\Phi$ is the phase of the equalized
signal $y_n$. Therefore we may construct the following cost function

$$J_3(f) = E\{(|f^H x|^2 - 1)^2\} + \gamma E\{\sin^2(\tfrac{M}{2}\Phi)\} \tag{20}$$

where the first term is from $J_c(f)$ and $\gamma$ is a weighting
factor. By minimizing (20), the equalizer can be obtained. However, it
is a highly non-linear function of f. As before, a gradient descent
method has to be used

$$f(k+1) = f(k) - \mu_3 \nabla J_3(f)\big|_{f=f(k)} \tag{21}$$

The derivative of the first term on the RHS of (20) is

$$d_1 = 2E\{(|y_k|^2 - 1)\,y_k^*\,x\} \tag{22}$$
which is a function of f. However, if we compute the derivative of the
second term, we find that it requires the derivative of $\Phi$
(denoted by $\Phi_f$)

$$d_2 = \frac{M}{2} E\{\sin(M\Phi)\,\Phi_f\}$$

To obtain its expression, we first define the real and imaginary parts
of $y_k$ by $y_1$ and $y_2$ respectively. Both of them can be
expressed in terms of f

$$y_k = y_1 + j y_2, \quad y_1 = \frac{f^H x + x^H f}{2}, \quad y_2 = \frac{f^H x - x^H f}{2j}$$

Then $\Phi$ is related to f by

$$\Phi = \arctan\frac{y_2}{y_1} = \arctan\frac{y_k - y_k^*}{j(y_k + y_k^*)} \tag{23}$$

Therefore $\Phi_f$ can be shown to be

$$\Phi_f = \frac{x^H f}{2j|y_k|^2}\,x = \frac{1}{2j y_k}\,x \tag{24}$$

Hence $d_2$ becomes

$$d_2 = \frac{M}{2} E\Big\{\frac{\sin(M\Phi)}{2j y_k}\,x\Big\} \tag{25}$$

Based on (22) and (25), and weighting the second term by $\gamma$ as in
(20), finally we obtain $\nabla J_3(f)$

$$\nabla J_3(f) = E\Big\{2(|y_k|^2 - 1)\,y_k^*\,x + \frac{\gamma M \sin(M\Phi)}{4j y_k}\,x\Big\} \tag{26}$$

where $\Phi$ is given by (23) and $y_k$ by (2). Substituting (26) in
(21) and using the instantaneous approximation for the expected values,
we obtain the recursion for the equalizer

$$f(k+1) = f(k) - \mu_3\,c\,x \tag{27}$$

where

$$c = \frac{8j(|y_k|^4 - |y_k|^2) + \gamma M \sin(M\Phi)}{4j y_k}$$

(27) shows that at each iteration the equalizer is adjusted by the
scaled data vector. This algorithm has complexity about $O(p)$.
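A minimal sketch of one step of recursion (27) is given below. It is an illustration rather than the authors' implementation: the function names are hypothetical, the weight $\gamma$ from (20) is assumed to multiply the phase term of c, and the sample vectors in the usage lines are arbitrary.

```python
import numpy as np

def alt_update(f, x, mu, gamma, M):
    """One stochastic-gradient step f <- f - mu * c * x, following (27)."""
    y = np.vdot(f, x)                 # y_k = f^H x (np.vdot conjugates f)
    phi = np.angle(y)                 # phase of the equalized sample
    mod2 = abs(y) ** 2
    # Scalar c: CMA-type amplitude term plus the gamma-weighted phase term.
    c = (8j * (mod2 ** 2 - mod2) + gamma * M * np.sin(M * phi)) / (4j * y)
    return f - mu * c * x

def instantaneous_cost(f, x, gamma, M):
    """Sample version of cost (20), used here only to monitor a step."""
    y = np.vdot(f, x)
    return (abs(y) ** 2 - 1) ** 2 + gamma * np.sin(M * np.angle(y) / 2) ** 2

# Arbitrary illustrative data: two-tap equalizer, one received vector.
f = np.array([0.8, 0.1j])
x = np.array([1 + 0.5j, 0.3 - 0.2j])
f_new = alt_update(f, x, mu=1e-3, gamma=0.5, M=4)
print(instantaneous_cost(f, x, 0.5, 4), instantaneous_cost(f_new, x, 0.5, 4))
```

Each step costs one inner product and one scaled vector addition, which is consistent with the O(p) complexity stated above. When the output already sits on a constellation point (unit modulus, phase a multiple of $2\pi/M$), c vanishes and the equalizer is left unchanged.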


4.  SIMULATIONS

We test the proposed methods and compare them with two typical CMA
algorithms [3], [7] by computer simulations. Due to lack of space and
the similarity between the first and the second proposed criteria, we
only present the simulation results of the first criterion and the
alternative approach for blind equalization of M-PSK signals. Two
different measures are adopted. First, inter-symbol interference (ISI)
is used to demonstrate the convergence of the algorithm

$$ISI = \frac{\sum_i |a_i|^2 - |a|_{max}^2}{|a|_{max}^2}$$

where $a^T = f^H H$ and $|a|_{max}$ is the maximal absolute value of
all elements in a. Clearly, when a has only one nonzero component as in
(3), ISI = 0, which is the ideal situation. A small ISI indicates
proximity to the desired response. Secondly, the probability of
decoding error is especially meaningful in the communications context
and also serves as an indicator of convergence. It is obtained from
multiple independent realizations with random input signals and is
defined as the percentage of accumulated decoding errors among the
total number of transmitted symbols up to the current iteration.

In the experiments, we consider an unknown non-minimum phase channel
used in [7], with unit sample response truncated at i = 3 as: 0 when
i < 0, -0.4 when i = 0, and $0.84 \times 0.4^{i-1}$ when i > 0. The
inputs are a 4-PSK signal source with 4 equiprobable values: 1, -1,
+j, -j. The step size $\mu$ is set to 0.005. We use a 12-tap equalizer
with the initial value $[0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]^T$. The
number of iterations is set to 5000. All the experimental results are
obtained from 50 independent realizations.

In the first experiment, we compare the proposed first criterion with
Shalvi's approach [7]. The first 400 iterations are based on [7] to
obtain a good initialization for both methods. The average ISI after
400 data points is plotted in Fig. 1, where the solid line represents
the proposed method and the dashed line Shalvi's method. As expected,
the proposed method converges to a much lower ISI level while
maintaining almost the same fast convergence. A similar result can be
observed in Fig. 2 for the error probability. The second experiment
compares the proposed alternative approach with the CMA algorithm [3].
$\gamma$ is chosen to be 0.5. The average ISI and error probability are
plotted in Fig. 3 and Fig. 4 respectively. Solid lines represent the
proposed method and dashed lines CMA. It can be observed that the
proposed method converges faster than the standard CMA while achieving
a lower ISI level after convergence. The error probability of the
proposed method is also much lower than that of CMA.
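The ISI measure and the starting point of the experiments can be sketched as follows. Two assumptions are made here and are not confirmed by the text: the ISI is taken in its normalized form (residual energy divided by $|a|_{max}^2$), and "truncated at i = 3" is read as keeping the taps i = 0, ..., 3.

```python
import numpy as np

def isi(a):
    """Normalized ISI of a combined response a (assumed normalized form)."""
    p = np.abs(np.asarray(a)) ** 2
    return (p.sum() - p.max()) / p.max()

# Channel taps for i = 0..3: -0.4 at i = 0, 0.84 * 0.4**(i-1) for i > 0.
h = np.array([-0.4] + [0.84 * 0.4 ** (i - 1) for i in range(1, 4)])

# 12-tap equalizer with a single unit tap at position 6 (0-based).
f = np.zeros(12, dtype=complex)
f[6] = 1.0

# Combined channel/equalizer response for an FIR channel: a = h * conj(f).
a = np.convolve(h, np.conj(f))
print(isi(a))                  # ISI before any adaptation
print(isi([0, 0, 1j, 0]))      # ideal response of form (3) gives zero ISI
```

With the center-spike initialization the combined response is just a delayed copy of the channel, so the printed value is the ISI of the unequalized channel, which is the level the adaptive algorithms start from.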
Figure  2:  Error  probability  of  the  first  criterion  and
Shalvi’s  method. 


Figure  3:  ISI  of  the  alternative  method  and  Godard’s 
method. 


Figure  4:  Error  probability  of  the  alternative  method 
and  Godard’s  method. 
ABSTRACT

Many digital communications systems use symbol constellations such as
M-ary PAM, PSK, or QAM, which are simple to implement and symmetric. In
a blind setting, equalization of inter-symbol interference (ISI) caused
by a linear channel with unknown phase characteristics requires
baud-sampled statistics of at least fourth order. Disadvantages of
fourth-order statistical approaches are the relatively large sample
sizes needed for good estimates of the statistics and the high
computational complexity of the associated equalization algorithms.
Here we examine several inherently asymmetric constellations with
symbols placed on a hexagonal lattice rather than on the square lattice
of QAM-type constellations. These constellations have a smaller average
power under minimum symbol separation constraints than several
widely-used constellations. We show how to modify the constellations to
provide controllable third-order properties, and demonstrate blind
equalization using a low-complexity third-order algorithm.

1.  INTRODUCTION 

Most constellations used in digital communications are symmetric.
Well-known symmetric constellations include M-ary PAM, PSK, and square
or cross-shaped QAM, along with a variety of specialized QAM-type
constellations [1], [2].

A constellation S from which a random symbol sequence {x[n]} is
generated is symmetric when there is at least one rotation of the
constellation by an angle $\theta \in (0, 2\pi)$ that preserves all of
the symbol values and their probabilities. An immediate difficulty when
using a symmetric constellation is that it is not possible to derive a
blind absolute reference phase at the receiver. This problem is
ordinarily overcome via differential encoding of the data at the
transmitter. A second problem is that the third-order statistics of
linear combinations of the random symbols (that is, the third-order
statistics of linear channel outputs) do not contain information about
the amplitude and phase properties of the channel. If the channel is
known a priori to be a minimum-phase channel, or to be a maximum-phase
channel, then second-order statistics suffice to equalize inter-symbol
interference (ISI). If, however, such knowledge of the channel phase is
absent, then statistics of at least fourth order are required to
determine the channel phase. Fourth-order statistical algorithms are
undesirable because they tend to have a high computational complexity
and because accurate estimates of fourth-order statistics may require
large sample sets.

In this paper, we investigate the statistical properties of "optimum"
hexagonal constellations [3], [4], which are fairly old in concept but
which were quickly dropped in practice in favor of the symmetric
constellations already mentioned. It turns out that some of these
hexagonal constellations are naturally asymmetric. This asymmetry
allows extraction of channel phase (and amplitude) characteristics from
third-order statistics. Moreover, the symbol separation of these
hexagonal constellations is somewhat greater than that of their common
symmetric counterparts given an average transmitted power constraint.
In a sense, then, the asymmetry is "free", because it may be obtained
without sacrificing immunity to additive noise.

On the design front, we modify the hexagonal constellations to increase
the level of asymmetry while maintaining a minimum symbol separation
and staying within an average transmitted power limit. The controlled
third-order statistical characteristics of the constellations
demonstrate that asymmetry can be a valuable tool in creating new types
of symbol constellations that are resistant to noise, ISI, and other
distortions. Finally, we demonstrate blind equalization of
ISI-corrupted channels using the asymmetric hexagonal constellations
and a third-order algorithm.

2.  SYSTEM  MODEL 

The end-to-end communications system we consider is digital, but models
both analog and digital effects. The transmitted symbols {x[n]} from
the constellation S form an i.i.d. random sequence. The receiver knows
the values and probabilities of the symbols in S, but never with
certainty any x[n].

$$y[n] = \sum_{k=k_1}^{k_2} h_k x[n-k] + w[n] \tag{1}$$

The distorted channel outputs are modeled by (1), with ISI represented
by the finite set of time-invariant channel taps
$h_{k_1}, \ldots, h_{-1}, h_1, \ldots, h_{k_2}$ and additive white
Gaussian noise w[n].

The goal of the receiver is to produce an estimate $\hat{x}[n]$ of the
transmitted symbol for each n, ideally equal to x[n] with high
probability. If the channel taps are known, the estimate may be
generated using maximum-likelihood sequence estimation (MLSE), a
feed-forward linear filter, or a decision-feedback structure [1].
Equalization in this paper will take the form of tap identification.

$$y_{mix}[n] = -0.5e^{-j\pi/4}\,x[n+1] + (1 + 0.45e^{-j7\pi/12})\,x[n] - 0.9e^{-j\pi/3}\,x[n-1] + w[n] \tag{2}$$

It is well known that second-order baud-sampled statistics of channel
outputs do not contain separable information about the phase properties
of the channel. The channel in (2) is a mixed-phase channel with zeros
at $2e^{j\pi/4}$ and $0.9e^{-j\pi/3}$. The frequency response appears
in Fig. 1. The amplitude response is identical to that of a
minimum-phase channel with zeros at $0.5e^{j\pi/4}$ and
$0.9e^{-j\pi/3}$, but the phase response is much different. An MMSE
inverse filter for one of the channels will restore approximately flat
amplitude and linear phase to that channel, but will fail to equalize
the phase distortion of the other. The challenge for blind algorithms
is to equalize both the amplitude and phase.
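The claim that a mixed-phase channel and its minimum-phase counterpart share the same amplitude response can be illustrated numerically. The sketch below assumes zero angles of $\pi/4$ and $-\pi/3$; these values are an assumption made for the illustration, since the exponents in the source are not fully legible, and the variable names are hypothetical.

```python
import numpy as np

# Assumed zero parameters (illustrative): alpha sets the zero outside the
# unit circle (at 1/alpha), beta is the zero inside the unit circle.
alpha = 0.5 * np.exp(-1j * np.pi / 4)
beta = 0.9 * np.exp(-1j * np.pi / 3)

# Mixed-phase channel (1 - alpha*z)(1 - beta/z): taps for x[n+1], x[n], x[n-1].
h_mix = np.array([-alpha, 1 + alpha * beta, -beta])

# Minimum-phase counterpart: the outside zero is reflected to conj(alpha).
h_min = np.convolve([1, -np.conj(alpha)], [1, -beta])

zeros_mix = np.roots(h_mix)
amp_mix = np.abs(np.fft.fft(h_mix, 256))
amp_min = np.abs(np.fft.fft(h_min, 256))

print(np.sort(np.abs(zeros_mix)))            # zero magnitudes of the mixed-phase channel
print(np.max(np.abs(amp_mix - amp_min)))     # amplitude responses coincide
```

Because the two amplitude responses coincide, the output power spectra (and hence the second-order statistics) of the two channels are indistinguishable, which is the reason phase information has to come from higher-order statistics.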


Figure 1: Frequency Response of (2) (panel title: Mixed-Phase Channel)

Bussgang filters form a class of blind equalization algorithms that use
feed-forward linear filters and low-complexity tap update equations.
However, Bussgang filters are plagued by convergence problems, both in
their ability to find good filter tap values and in the speed of
finding them.

An approach with guaranteed convergence to globally optimum parameter
sets is to employ algorithms which use higher order statistics (HOS),
such as the tricepstrum equalization algorithm (TEA) [5]. HOS
algorithms typically exploit properties of fourth-order cumulants. They
can be set up to produce inverse filter taps or estimates of the
channel taps. However, obtaining reliable estimates of fourth-order
cumulants may require a large set of channel outputs, and the
computational complexity of HOS algorithms can be very high.

Third-order statistics are a viable alternative to fourth-order
statistics, provided that they yield the same information about the
channel phase. Advantages that third-order algorithms may offer over
fourth-order algorithms include reduced complexity and a shorter
acquisition time for the statistical estimates. Third-order statistical
phase information is not present in symmetric constellations, so we
turn instead to asymmetric constellations.

3.  HEXAGONAL  CONSTELLATIONS

We are interested in M-ary constellations, where $M = 2^N$ for some
integer N > 0. N data bits are mapped to each successive symbol, a
convenient feature from a design perspective. Classical examples of
such constellations that we will consider are 8-PSK and 16-QAM. We will
compare these to asymmetric "optimal" hexagonal constellations having 8
and 16 symbols [2]. For purposes of comparison, all of the
constellation measurements will assume unit minimum symbol spacing
unless otherwise specified.

The hexagonal constellations first appeared as a solution to the
problem of maximizing the distance between symbols while minimizing the
average transmitted power [3]. A few symmetric hexagonal constellations
saw deployment in individual products, but the rest remained mostly a
curiosity, unheralded and unwanted.

3.1.  8-Point  Constellations

Figure 2: 8-HEX, An Asymmetric Hexagonal Constellation

The 2-point hexagonal constellation is identical to the usual binary
constellation. The 4-point hexagonal constellation has a characteristic
diamond shape. It has a lower average power and a higher peak power
than the 4-QAM (4-PSK) constellation, but is not useful for our
purposes because it is symmetric. The first suitable asymmetric
hexagonal constellation has 8 symbol points, as shown in Fig. 2. We
refer to it hereafter as 8-HEX. Note that for any particular
orientation of the constellation in the complex plane, there is a
unique set of symbol values.

A key third-order statistic is $\gamma_3$, given in (3).

$$\gamma_3 = E\{|x[n]|^2 x[n]\} \tag{3}$$

The value of this moment is a measure of how much asymmetry a
constellation exhibits. For the 8-HEX constellation with unit-spaced
symbols, $\gamma_3 = 0.1015$. In contrast, for 8-PSK, a symmetric
constellation, $\gamma_3$ is identically zero.

The average power of the 8-HEX constellation, $P_{average}$, is 1.0781.
The average power of 8-PSK with a minimum symbol spacing of 1 is
1.7071. The 8-HEX thus offers a 2 dB improvement in average power over
8-PSK. With respect to blind equalization, the important improvement is
the non-zero $\gamma_3$.
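The 8-PSK figures quoted above can be reproduced with a few lines. This is a sketch under stated assumptions: the helper name `constellation_stats` is hypothetical, and the 8-HEX average power of 1.0781 is taken from the text rather than computed, because the 8-HEX coordinates are not listed here.

```python
import numpy as np

def constellation_stats(points):
    """Return (gamma_3, P_average, P_peak) after scaling to unit minimum spacing."""
    x = np.asarray(points, dtype=complex)
    d = np.abs(x[:, None] - x[None, :])      # all pairwise distances
    x = x / d[d > 0].min()                   # normalize minimum spacing to 1
    power = np.abs(x) ** 2
    return np.mean(power * x), power.mean(), power.max()

psk8 = np.exp(2j * np.pi * np.arange(8) / 8)
g3, p_avg, p_peak = constellation_stats(psk8)

# Average-power advantage of 8-HEX (value quoted in the text) over 8-PSK.
gain_db = 10 * np.log10(p_avg / 1.0781)
print(abs(g3), p_avg, p_peak, gain_db)
```

For 8-PSK all symbols share one radius, so the average and peak powers coincide, and the rotational symmetry forces the third-order moment to vanish.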

In some systems, particularly those in which amplifiers have a limited
range of linearity, the peak power, $P_{peak}$, cannot be too large.
$P_{peak}$ of the 8-HEX constellation is 2.2969, as opposed to 1.7071
for 8-PSK. The increased $P_{peak}$ of 8-HEX may or may not represent a
desired characteristic.

Constellation   $\gamma_3$   $P_{average}$   $P_{peak}$
8-PSK           0            1.7071          1.7071
8-HEX           0.1015       1.0781          2.2969
16-QAM          0            2.5             4.5
16-HEX          0.0938       2.1875          3.8125

Table 1: Constellation Comparisons Given Unit Minimum Symbol Separation

3.2.  16-Point  Constellations 


The 16-point optimal hexagonal constellation (16-HEX) appears in Fig.
3.2. It is currently the best known 16-point configuration for
maximizing the symbol spacing subject to an average transmitted power
limit [2]. A commonly used symmetric constellation having 16 points is
16-QAM, with the symbols occupying points on a four-by-four square
lattice.

The value of $\gamma_3$ for 16-HEX is 0.0938, less than the $\gamma_3$
of 8-HEX, but greater than that of 16-QAM, which is 0. The average
powers of the two 16-point constellations are 2.1875 for 16-HEX and 2.5
for 16-QAM. 16-HEX has a 0.6 dB average power advantage over 16-QAM, an
approximate gain which is retained by higher-order $2^N$-ary hexagonal
constellations over fully-filled square $2^N$-ary QAM [4].

The peak power of 16-QAM is 4.5, while the peak power of 16-HEX is only
3.8125. 16-HEX has lower $P_{average}$ and $P_{peak}$ than 16-QAM, and
also a non-zero $\gamma_3$.
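The 16-QAM entries can be checked directly on the four-by-four unit-spaced lattice. In this sketch the 16-HEX average power of 2.1875 is the value quoted in the text (the 16-HEX coordinates are not given here), and the variable names are illustrative.

```python
import numpy as np

# Unit-spaced 4x4 square lattice centered on the origin.
levels = np.array([-1.5, -0.5, 0.5, 1.5])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()

power = np.abs(qam16) ** 2
p_avg, p_peak = power.mean(), power.max()
gamma3 = np.mean(power * qam16)              # third-order moment of (3)

# Average-power advantage of 16-HEX (quoted value) over 16-QAM, in dB.
gain_db = 10 * np.log10(p_avg / 2.1875)
print(p_avg, p_peak, abs(gamma3), gain_db)
```

The gain evaluates to roughly 0.58 dB, which the text rounds to 0.6 dB.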

Table 1 summarizes the properties of the constellations we have
discussed. As the number of points in the hexagonal constellations
increases, the relative amount of asymmetry as measured by $\gamma_3$
decreases, while $P_{peak}$ and $P_{average}$ are less than the
corresponding quantities of the QAM constellations. These trends are a
result of the sphere-packing nature of the hexagonal constellations: as
more lattice points are available inside a complex-plane circle of a
given diameter, it becomes easier to find a roughly uniform, roughly
circular distribution.

4.  MODIFIED  HEXAGONAL 
CONSTELLATIONS 

Hexagonal constellations we have not presented here that are of
interest to digital communications systems designers include those with
M = 32, 64, 128, 256, 512, and 1024 points, all of which are potential
replacements for high-order M-ary QAM constellations in severely
bandwidth-limited channels. A symmetric 64-point optimum hexagonal
constellation was shown in [4].

For both the present optimum hexagonal constellations
and the high-order varieties, there is room to increase the
value of γ3 and thus the amount of asymmetry contained in
the constellation. Having a larger γ3 is useful because the
third-order statistics to be used for blind equalization will
have a lower relative variance and the blind equalization
algorithm can perform well more quickly than when γ3 is
small.

A modified hexagonal constellation with the same minimum
symbol spacing and the same P_average as the corresponding
QAM constellation might replace the QAM constellation,
provided that other specifications such as a maximum
P_peak are not violated. We presently investigate
changes to the optimum hexagonal constellations that involve
moving one or two symbol points in the direction of
the γ3 value, and the remaining symbol points in the opposite
direction.

4.1.  Modified  8-HEX 


Figure 3: 8-HEX-MA, a Modified Asymmetric Hexagonal
Constellation

If we add 1.0728 to the highest-power symbol in the 8-HEX
constellation and restore the zero-DC condition by
subtracting 1.0728/7 from each of the other symbols, we
have the constellation 8-HEX-MA shown in Fig. 3. The
average power of this constellation is 1.7071, identical to
that of 8-PSK, while P_peak = 6.699 and γ3 = 1.5668.

The reason we choose this scheme of moving the highest-power
symbol(s) in the direction of γ3 and the rest in the
opposite direction is that γ3 is a power-weighted average.
We try to greatly increase |x|^2 x (x ∈ S) for one of the symbols
while allowing |x|^2 x for the others to remain roughly
the same.
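
The following is a minimal sketch of the statistics and of the moving scheme described above, assuming equiprobable symbols and γ3 = E{|x|^2 x}. The function names and the five-point `demo` set are hypothetical illustrations (the 8-HEX coordinates are not reproduced here), and the sketch moves a single symbol only; it does not re-check the minimum symbol spacing after the move.

```python
import numpy as np

def gamma3(points):
    """Third-order statistic E{|x|^2 x} for equiprobable symbols."""
    x = np.asarray(points, dtype=complex)
    return np.mean(np.abs(x) ** 2 * x)

def avg_power(points):
    """Average symbol power E{|x|^2}."""
    return np.mean(np.abs(np.asarray(points, dtype=complex)) ** 2)

def peak_power(points):
    """Largest symbol power max |x|^2."""
    return np.max(np.abs(np.asarray(points, dtype=complex)) ** 2)

def boost_asymmetry(points, delta):
    """Move the highest-power symbol by delta along gamma3 and every other
    symbol by delta/(M-1) the opposite way, so the mean (DC) stays zero.
    Requires gamma3 != 0, otherwise the direction is undefined."""
    x = np.asarray(points, dtype=complex).copy()
    g = gamma3(x)
    direction = g / abs(g)                      # unit vector along gamma3
    top = int(np.argmax(np.abs(x) ** 2))        # index of highest-power symbol
    shift = np.full(x.shape, -delta / (len(x) - 1) * direction)
    shift[top] = delta * direction
    return x + shift

# 16-QAM on a four-by-four square lattice with unit minimum spacing.
levels = np.array([-1.5, -0.5, 0.5, 1.5])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()

# Hypothetical asymmetric set: five hexagonal-lattice points, mean removed.
demo = np.array([0, 1, np.exp(1j * np.pi / 3), np.exp(2j * np.pi / 3), -1],
                dtype=complex)
demo = demo - demo.mean()
demo_ma = boost_asymmetry(demo, 0.5)

print("16-QAM:", gamma3(qam16), avg_power(qam16), peak_power(qam16))
print("|gamma3| before/after:", abs(gamma3(demo)), abs(gamma3(demo_ma)))
```

For 16-QAM the sketch reproduces the zero γ3, the average power of 2.5 and the peak power of 4.5 quoted in the text; for the hypothetical set, the move keeps the mean at zero while enlarging |γ3|.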

4.2.  Modified  16-HEX 

In Fig. 4, we present a modified version of 16-HEX with
increased γ3. The two highest-power points have each
been moved a distance of 0.5413 from their former locations,
while the other 14 points were moved 0.5413/7 in
the opposite direction. The 16-HEX-MA constellation has
P_average = 2.5, P_peak = 6, and γ3 = 0.7365.

A second method one might use to increase the γ3 value
of the 16-HEX constellation would be to pick one of the two
highest-power symbols and move it in the direction opposite
to that of the γ3 statistic computed without the selected point
(i.e. using the other 15 symbols, each with probability 1/15).

Figure 4: 16-HEX-MA, a Modified Asymmetric Hexagonal
Constellation


Constellation    γ3        P_average    P_peak
8-PSK            0         1.7071       1.7071
8-HEX-MA         1.0728    1.7071       6.6997
16-QAM           0         2.5          4.5
16-HEX-MA        0.5413    2.5          6

Table 2: Constellation Comparisons Given Unit Minimum
Symbol Separation and Fixed P_average.


In reality, the receiver does not know the length of the
channel, and must estimate L based on statistics of the
channel outputs. These estimated statistics (which may
or may not be the F_m values to be used in recovering the
channel taps) and the F_m estimates should converge with
relative rapidity to the desired values. For F_m estimates,
the quality of the estimates can be improved by having a
large γ3 rather than a small one, and by using larger channel
output sets.

The channel for each of the simulations is (2), with
AWGN having variance σ^2 = 0.1. For each of the asymmetric
constellations, we generated 100 random sequences
of 10000 symbols each, and estimated the channel taps using
the sample average F_m = (1/N) Σ_n |y[n]|^2 y[n + m] for
m ∈ {0, ±1}. To clarify the performance and potential
performance of the third-order approach to blind equalization
for which we advocate asymmetric constellations, we plot the
magnitude and phase of all the resulting channel estimates.
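
A simulation of this kind can be sketched as follows. The four-point constellation, the two-tap channel and all function names are assumptions made for illustration (channel (2) and the hexagonal coordinates are not reproduced here), and the estimator is taken to be a plain sample average of |y[n]|^2 y[n + m]. The sketch compares the estimates with the prediction γ3 Σ_k |h_k|^2 h_{k+m} for i.i.d. zero-mean symbols.

```python
import numpy as np

def third_order_estimate(y, m):
    """Sample average of |y[n]|^2 y[n+m] over all n where both indices exist."""
    y = np.asarray(y, dtype=complex)
    n = np.arange(max(0, -m), len(y) - max(0, m))
    return np.mean(np.abs(y[n]) ** 2 * y[n + m])

def predicted_stat(taps, gamma3, m):
    """gamma3 * sum_k |h_k|^2 h_{k+m}, with taps outside the channel set to 0."""
    taps = np.asarray(taps, dtype=complex)
    total = 0j
    for k in range(len(taps)):
        if 0 <= k + m < len(taps):
            total += abs(taps[k]) ** 2 * taps[k + m]
    return gamma3 * total

def simulate_outputs(symbols, taps, noise_var, num, rng):
    """Draw i.i.d. symbols, filter with the FIR taps, add circular AWGN."""
    x = rng.choice(symbols, size=num)
    y = np.convolve(x, taps)[:num]
    w = np.sqrt(noise_var / 2) * (rng.standard_normal(num)
                                  + 1j * rng.standard_normal(num))
    return y + w

# Hypothetical zero-mean asymmetric constellation and two-tap channel.
symbols = np.array([1.5, -0.5 + 0.5j, -0.5 - 0.5j, -0.5])
taps = np.array([1.0, 0.4 + 0.3j])
gamma3 = np.mean(np.abs(symbols) ** 2 * symbols)

rng = np.random.default_rng(0)
y = simulate_outputs(symbols, taps, noise_var=0.1, num=100_000, rng=rng)
estimates = {m: third_order_estimate(y, m) for m in (-1, 0, 1)}

for m, value in estimates.items():
    print(m, value, predicted_stat(taps, gamma3, m))
```

The record length here is longer than the 10000 symbols used in the text, purely to make the agreement between estimate and prediction easy to see; shortening `num` shows the larger scatter that motivates a large γ3.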

5.1.  8-HEX  and  8-HEX-MA 


The statistical properties of the modified hexagonal constellations
are shown in Table 2. Quite clearly, there is
considerable leeway for constructing asymmetric constellations
within the P_average and minimum symbol separation
constraints imposed on typical systems where QAM constellations
are in use. It is possible that this freedom may
be parlayed into an “optimum” constellation having a minimum
P_average subject to minimum symbol separation, minimum
required γ3, and maximum P_peak constraints.

5.  EXAMPLES  OF  BLIND  CHANNEL 
IDENTIFICATION 


[Figure panel: 8-HEX. Axes: Magnitude Response (dB), Phase (degrees)]

The natural asymmetry of some of the optimum hexagonal
constellations and the strengthened asymmetry of the modified
constellations are quite useful in blind equalization of
linear channels modelled by (1). Consider the channel output
statistic F_m = E{|y[n]|^2 y[n + m]}, which is related to
γ3 of the constellation by (4).

$$E\{|y[n]|^2\, y[n+m]\} = \gamma_3 \sum_{k=k_1}^{k_2} |h_k|^2 h_{k+m} \qquad (4)$$

Since the channel length is finite, a time-domain method
of estimating the channel taps can proceed in a simple, recursive
fashion. Suppose that the channel length is L = k_2 - k_1 + 1.

$$F_{L-1} = \gamma_3 |h_{k_1}|^2 h_{k_2} \qquad (5)$$

$$F_{-(L-1)} = \gamma_3 |h_{k_2}|^2 h_{k_1} \qquad (6)$$

Equations (5) and (6) together permit estimation of the
end-most channel taps h_{k_1} and h_{k_2}.

$$F_{L-2} = \gamma_3 |h_{k_1}|^2 h_{k_2-1} + \gamma_3 |h_{k_1+1}|^2 h_{k_2} \qquad (7)$$

$$F_{-(L-2)} = \gamma_3 |h_{k_2-1}|^2 h_{k_1} + \gamma_3 |h_{k_2}|^2 h_{k_1+1} \qquad (8)$$

Subsequent to obtaining h_{k_1} and h_{k_2}, solving (7) and (8)
provides estimates of h_{k_1+1} and h_{k_2-1}. The remaining taps
may all be computed in a similar fashion.
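
The first step of the recursion can be sketched as below. It assumes the relations F_{L-1} = γ3 |h_{k1}|^2 h_{k2} and F_{-(L-1)} = γ3 |h_{k2}|^2 h_{k1}, a known complex γ3, and L ≥ 2; the closed-form inversion (magnitudes from cube roots, phases from the arguments of the statistics) is one possible way to solve the pair, not necessarily the procedure used by the authors. The function names and the three-tap example are hypothetical.

```python
import numpy as np

def third_order_stat(taps, gamma3, m):
    """Exact statistic gamma3 * sum_k |h_k|^2 h_{k+m} for a finite channel."""
    taps = np.asarray(taps, dtype=complex)
    total = 0j
    for k in range(len(taps)):
        if 0 <= k + m < len(taps):
            total += abs(taps[k]) ** 2 * taps[k + m]
    return gamma3 * total

def end_taps(f_plus, f_minus, gamma3):
    """Recover the first and last taps from F_{L-1} (f_plus) and F_{-(L-1)}
    (f_minus).

    With a = first tap and b = last tap:
        |f_plus|  = |gamma3| |a|^2 |b|,   arg f_plus  = arg gamma3 + arg b
        |f_minus| = |gamma3| |b|^2 |a|,   arg f_minus = arg gamma3 + arg a
    so |a|^3 = |f_plus|^2 / (|gamma3| |f_minus|), and similarly for |b|."""
    g = abs(gamma3)
    mag_first = (abs(f_plus) ** 2 / (g * abs(f_minus))) ** (1.0 / 3.0)
    mag_last = (abs(f_minus) ** 2 / (g * abs(f_plus))) ** (1.0 / 3.0)
    first = mag_first * np.exp(1j * (np.angle(f_minus) - np.angle(gamma3)))
    last = mag_last * np.exp(1j * (np.angle(f_plus) - np.angle(gamma3)))
    return first, last

# Hypothetical three-tap channel (L = 3) and a complex gamma3.
h = np.array([1.0, 0.5j, -0.3])
g3 = 0.5 + 0.2j
L = len(h)
first, last = end_taps(third_order_stat(h, g3, L - 1),
                       third_order_stat(h, g3, -(L - 1)),
                       g3)
print(first, last)
```

In practice the exact statistics would be replaced by sample estimates, so the recovered taps inherit the estimation error of F_{L-1} and F_{-(L-1)}.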


Figure  5:  Channel  Estimates  Using  8-HEX 


Figure  6:  Channel  Estimates  Using  8-HEX-MA 

In Figs. 5 and 6 we present the simulation results when
the constellations are 8-HEX and 8-HEX-MA respectively.
We can see in the first figure that several of the channel
estimates based on the γ3-weighted statistics of the 8-HEX
constellation had fairly large amplitude and phase errors.
On the other hand, the 8-HEX-MA, with its higher value of
γ3, provided very good estimates of both channel magnitude
and channel phase during each trial run. For reference,
the phase response of the minimum-phase channel with the
same amplitude response as (2) appears as a dashed line.

Because the channel estimates for the 10000-point data
set using 8-HEX-MA were so good, we tried using smaller
blocks. At 500 points, the algorithm still works well, while
for blocks of 100 points the rate of poor channel estimates
is appreciable: 25 to 30 out of 100 trials. However, the fact
that it is possible to obtain reasonable channel estimates
with fairly small data sets is a good sign and justifies our
consideration of asymmetry in constellation design.

5.2.  16-HEX  and  16-HEX-MA 


Figure  7:  Channel  Estimates  Using  16-HEX 


The  100  channel  estimates  for  the  16-HEX  and  16-HEX- 
MA  constellations  appear  in  Figs.  7  and  8  respectively. 
Though  there  are  many  poor  channel  estimates  for  16-HEX, 
when  the  added  asymmetry  is  included,  the  16-HEX-MA 
constellation  allows  the  channel  estimation  algorithm  to 
perform  quite  well. 

5.3.  Methods  for  Improving  Performance 

Ultimately, a blind equalizer does not operate as a standalone
system. Rather, it is part of a larger whole which
operates for a time using blind update and then switches to
DD update. In this situation, even a relatively poor channel
estimate may provide the information needed to initialize
(roughly) the equalizer parameter values.

Several avenues are available for improving the performance
of the blind channel estimation. One we have discussed
and shown is increasing γ3 for a given constellation.
Another approach is to allow larger numbers of channel
outputs to be used in the statistical estimates. If this is
undesirable, then it might be possible to take advantage of
other statistics than those we chose. For instance, second-order
statistics contain information about tap amplitude.
This information could be used to check the h_k values produced
by our third-order algorithm to determine whether or
not the estimated filter taps have appropriate amplitudes.
Other third-order statistics contain amplitude and phase
information which might be used in conjunction with that
of the F_m statistics used in this paper.

6.  CONCLUSION 

We investigated the statistical properties of several venerable
constellations that have not seen much use in actual
communications systems. It turns out that some of these
hexagonal constellations are asymmetric, so that information
about the phase properties of the channel through
which symbols from the constellation are transmitted is
contained in third-order statistics of channel outputs. This
enables use of third-order statistics to identify and equalize
mixed-phase channels, in lieu of the fourth-order statistics
which are required when the constellation is symmetric.
This capability comes at low cost, as the hexagonal constellations
have greater symbol separation than the popular
symmetric constellations having the same average transmitted
power. We showed a simple modification of the hexagonal
constellations to increase their asymmetry while retaining
minimum symbol separation and a limit on the average
transmitted power, and showed channel equalization based
on third-order statistics and the asymmetric constellations.

ABSTRACT

The aim of our study is to compare two a priori different
approaches: bilinearity and cyclostationarity. We
underline that cyclostationary and bilinear tools both make it
possible to detect nonlinear and nonstationary links. These
results are based on calculations, simulations with synthetic
signals and, finally, applications to industrial signals. We then
introduce different applications of these approaches to industrial
vibrations. These methods yield more informative
results than classical methods such as Fourier analysis,
spectrum analysis and time-frequency studies. Finally, we conclude
on the interest and the reliability of such approaches and introduce a
method to determine whether a link between two frequencies is
cyclostationary rather than bilinear. This method is based on the use of
higher-order cyclic statistics.

1. INTRODUCTION

The omnipresence of gears in most industrial sectors, as well as
the need for early diagnosis of faults and quality control of
noise, has made the study of vibrations a very interesting and
exciting subject for the scientific world. Until now, vibration
analysis was mainly based on stationary methods such as spectral
analysis, Fourier analysis, cepstrum... [1], [2] and [3]. However,
recent studies have shown that new methods based on the
nonstationary and nonlinear properties of vibrating phenomena
could bring results of the highest interest [4], [5], [6], [7].

Our  study  first  deals  with  a  comparison  between 
cyclostationarity  and  bilinearity,  and  then  presents  different 
applications  of  these  approaches. 

First, after a short introduction of the main definitions of
these approaches, we introduce a comparison of bispectrum and
spectral correlation. Indeed, we showed by calculation that
cyclostationary and bilinear tools make it possible to detect
both nonlinear and nonstationary links [8]. The same results
were then obtained by simulations on different synthetic signals [9]
[10]. In this article, we present an application of this result to
industrial signals, recorded on a Navy helicopter
(Westland Data).

Moreover, cyclostationary and bilinear approaches enabled us
to obtain much more interesting results than classical methods
such as Fourier analysis, spectrum analysis, time-frequency studies...
Indeed, cyclic analysis allows a good early diagnosis [11]. We
will present an application of the spectral correlation to
helicopter gearbox vibrations. We will introduce the influence
of torque on the quality of the diagnosis.

Finally, we introduce a first solution to determine whether a link is
more bilinear or more cyclostationary. This method is based on
Higher-Order Cyclic Statistics, introduced by Gardner [12] [13],
and could make it possible to determine whether a link between two
frequencies is more cyclostationary or more bilinear and, therefore, to
determine whether vibrations are due more to a surface fault or to a
profile fault.


2.  COMPARISON  OF  BILINEAR  AND 
CYCLOSTATIONARY  APPROACHES 


2.1  Main  definitions  and  properties 


Contrary to a stationary signal, the cross correlation R_x(t,τ) of
which is only a function of τ, a signal is said to be
cyclostationary of second order when its cross correlation
depends on the lag variable τ but is also periodically time
dependent. The double Fourier transform of the cross correlation
provides the spectral correlation S_x^α(f), defined by:

$$S_x^{\alpha}(f) = \mathrm{FT}_{t,\tau}\left[R_x(t,\tau)\right] \qquad (1)$$


The interpretation of results brought by the spectral correlation
will be an interpretation in terms of statistically linked
frequencies. Let's consider the random signal
x(t) = a(t) e^{2jπ f1 t} + b(t) e^{2jπ f2 t}, with a(t) and b(t) two random
and stationary modulations. The expression of the spectral
correlation of this signal, the calculations of which are presented
in [7], led to the following conclusions:

If a(t) and b(t) are uncorrelated, the only nonzero terms of
R_x(t,τ) are time independent. So, the signal is stationary.

On the other hand, if a(t) and b(t) are statistically linked, the
cross correlation becomes time dependent. Moreover, for the
spectral frequency (f1 + f2)/2, there are two cyclic frequencies,
±(f2 - f1), for which the spectral correlation is nonzero.


To conclude, for the spectral correlation, the appearance of a
peak for the couple of frequencies (f; α) means that the two
frequency components f + α/2 and f - α/2 are statistically linked
[11].
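
This interpretation can be illustrated with a small numerical sketch. It assumes a discrete-time version of the two-component signal x(t) = a(t) e^{2jπ f1 t} + b(t) e^{2jπ f2 t}, a low-pass modulation obtained by smoothing white noise, and an averaged cyclic periodogram as the estimator of E{X(f + α/2) X*(f - α/2)}; the estimator choice, the function names and all numerical values are assumptions, not the authors' procedure. Only a single cyclic frequency α = f2 - f1 is evaluated.

```python
import numpy as np

def spectral_correlation(x, alpha, seg_len):
    """Averaged cyclic periodogram at one cyclic frequency alpha
    (cycles/sample): estimate of E{X(f + alpha/2) X*(f - alpha/2)} / seg_len
    on the FFT grid f = k / seg_len."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(len(x))
    u = x * np.exp(-1j * np.pi * alpha * n)   # FFT of u gives X(f + alpha/2)
    v = x * np.exp(+1j * np.pi * alpha * n)   # FFT of v gives X(f - alpha/2)
    num_seg = len(x) // seg_len
    U = np.fft.fft(u[:num_seg * seg_len].reshape(num_seg, seg_len), axis=1)
    V = np.fft.fft(v[:num_seg * seg_len].reshape(num_seg, seg_len), axis=1)
    return np.mean(U * np.conj(V), axis=0) / seg_len

def lowpass_noise(num, width, rng):
    """Real stationary modulation: white noise smoothed by a moving average."""
    w = rng.standard_normal(num + width - 1)
    return np.convolve(w, np.ones(width) / np.sqrt(width), mode="valid")

rng = np.random.default_rng(1)
seg_len, num = 256, 256 * 200
f1, f2 = 32 / seg_len, 64 / seg_len           # hypothetical carrier frequencies
n = np.arange(num)
c1 = np.exp(2j * np.pi * f1 * n)
c2 = np.exp(2j * np.pi * f2 * n)

a = lowpass_noise(num, 64, rng)
b_indep = lowpass_noise(num, 64, rng)
x_linked = a * c1 + a * c2                    # b(t) = a(t): linked modulations
x_indep = a * c1 + b_indep * c2               # independent modulations

alpha = f2 - f1
S_linked = spectral_correlation(x_linked, alpha, seg_len)
S_indep = spectral_correlation(x_indep, alpha, seg_len)
peak_bin = int(np.argmax(np.abs(S_linked)))
print("peak at f =", peak_bin / seg_len, "expected", (f1 + f2) / 2)
print("max |S|, linked vs independent:",
      np.abs(S_linked).max(), np.abs(S_indep).max())
```

With linked modulations the estimate peaks near the spectral frequency (f1 + f2)/2, while with independent modulations it stays at the level of the estimation noise.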

As the study of nth-order stationarity is linked with nth-order
moments, the study of linearity is linked with nth-order cumulants
and their Fourier transforms, the nth-order polyspectra. We will
limit our study to second-order nonlinearity, described by the
bispectrum:

$$B_x(f_1, f_2) = \int_{-\infty}^{+\infty}\!\int_{-\infty}^{+\infty} C_3(\tau_1;\tau_2)\, e^{-2j\pi(f_1\tau_1 + f_2\tau_2)}\, d\tau_1\, d\tau_2 = \mathrm{FT}_{\tau_1,\tau_2}\left[C_3(\tau_1;\tau_2)\right] \qquad (2)$$

The common properties of cumulants and bispectrum (linearity,
symmetry, invariance by phase shifting) are precisely introduced
in [14]. We will only underline the main property of the bispectrum:
quadratic phase coupling (QPC) detection. If we consider two
signals x1 and x2, given by:

$$x_1(t) = e^{j(2\pi f_1 t + \varphi_1)} + e^{j(2\pi f_2 t + \varphi_2)} + e^{j(2\pi f_3 t + \varphi_3)}$$
$$x_2(t) = e^{j(2\pi f_1 t + \varphi_1)} + e^{j(2\pi f_2 t + \varphi_2)} + e^{j(2\pi f_3 t + \varphi_1 + \varphi_2)} \qquad (3)$$

with f3 = f1 + f2, and φ1, φ2, φ3 random and independent variables.
The study of the PSD will only present the three frequency
components f1, f2 and f3 without any information concerning
phases. On the contrary, the bispectra of x1 and x2 are different: the
bispectrum of x1 will be identically nil whereas the bispectrum of
x2 will present a peak for the couple of frequencies (f1, f2) and all
its symmetries. To conclude, the appearance in a bispectrum of a
peak for the couple of frequencies (f1, f2) will underline a bilinear
coupling of the frequency components f1 and f2.
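
A small numerical sketch of QPC detection follows. It assumes that x1 uses three independent random phases while x2 uses φ3 = φ1 + φ2 (the usual phase-coupled construction), and it uses a direct FFT-based estimate averaged over independent realizations; the bin indices, noise level and function names are hypothetical choices for illustration.

```python
import numpy as np

def bispectrum(realizations):
    """Direct estimate: average of X(k1) X(k2) X*(k1 + k2) over realizations,
    normalized by N^3 so a unit-amplitude coupled triplet gives about 1."""
    X = np.fft.fft(np.asarray(realizations, dtype=complex), axis=1)
    num = X.shape[1]
    k = np.arange(num)
    ksum = (k[:, None] + k[None, :]) % num          # index of k1 + k2 (mod N)
    B = np.mean(X[:, :, None] * X[:, None, :] * np.conj(X[:, ksum]), axis=0)
    return B / num ** 3

def three_tones(coupled, num_real, num, k1, k2, rng, noise_std=0.1):
    """Realizations of three complex tones at bins k1, k2, k1 + k2.
    If coupled, the third phase is phi1 + phi2; otherwise it is independent."""
    phi = rng.uniform(0.0, 2.0 * np.pi, size=(num_real, 3))
    if coupled:
        phi[:, 2] = phi[:, 0] + phi[:, 1]
    n = np.arange(num)
    x = np.zeros((num_real, num), dtype=complex)
    for i, kk in enumerate((k1, k2, k1 + k2)):
        x += np.exp(1j * (2.0 * np.pi * kk * n[None, :] / num + phi[:, [i]]))
    noise = noise_std * (rng.standard_normal(x.shape)
                         + 1j * rng.standard_normal(x.shape))
    return x + noise

rng = np.random.default_rng(2)
K1, K2, N, REAL = 5, 12, 64, 200
B_indep = bispectrum(three_tones(False, REAL, N, K1, K2, rng))   # like x1
B_coupled = bispectrum(three_tones(True, REAL, N, K1, K2, rng))  # like x2

peak = np.unravel_index(np.argmax(np.abs(B_coupled)), B_coupled.shape)
print("coupled peak at bins", peak, "value", abs(B_coupled[K1, K2]))
print("independent phases, same bins:", abs(B_indep[K1, K2]))
```

Both signals have the same power spectrum, yet only the phase-coupled one produces a bispectral value near 1 at (f1, f2); with independent phases the average shrinks roughly as one over the square root of the number of realizations.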

Finally, the last property that we want to present in this section
concerns the detection of bilinear links and cyclostationary links
by both bispectrum and spectral correlation. Indeed, we proved
by calculation [9] and by simulations with synthetic signals [8]
that the bilinear approach and cyclic analysis can each detect both
bilinear and cyclostationary links. This result is quite interesting.
Indeed, the estimation of bispectra is very long and must be
carried out over the whole frequency domain. On the contrary, the
spectral correlation can be calculated for a single
cyclic frequency and is very fast to compute, even if it requires
long data records. To conclude, it could be more interesting to study
certain bilinear phenomena using a cyclostationary approach
rather than a bilinear one.

2.2  Detection  of  bilinearity  and  Cyclostationarity 
by  Bispectrum  and  Spectral  correlation: 
Application  to  industrial  vibrations 

In this section, we are interested in extending the
previously stated property to industrial signals, recorded on
a helicopter (Westland Data). The system will be presented more
precisely in the next section. As we can see in Figure
1, with the fault, two nonlinear links appear in the bispectrum.
The first one characterizes a modulation phenomenon between
the two meshing frequencies fm1 and fm2. Indeed, we underlined that
when a fault appears in our system, the meshing frequency fm1
modulated fm2 [10]. This phenomenon shows a bilinear link
between the frequencies considered. Moreover, spectral analysis
underlined the appearance of a frequency f3, situated exactly
between the meshing frequencies fm1 and fm2, which was not
characteristic of any component of our system [15]. We proved
in [9] and [10] that this frequency was the consequence of the
link between meshing phenomena. The appearance of this
frequency gives rise to a bilinear link between f3 and fm2.


Figure  1.  Bispectrum  of  Helicopter  vibrations  : 
Detection  of  bilinear  links. 


We wanted to know whether the result introduced in section 2.1
also holds for industrial signals. That is the
reason why we estimated the spectral correlations corresponding to
possible cyclostationary links between (fm1; fm2) and (fm2; f3).
Figure 2 and Figure 3 present these estimates. As shown in
Figure 2, when searching for possible links between fm2 and f3,
the analysis of the spectral correlation reveals two peaks.

Figure 2. Spectral correlation for α = fm2 - f3 = f3 - fm1.
Search for links between (fm1; f3) and (fm2; f3).

These two peaks characterize strong links between f3 and the
meshing frequencies. The cyclic analysis therefore makes it possible to
detect the first peak of the bispectrum presented in Figure 1. The
second calculation concerns the search for a possible link
between the meshing frequencies. We therefore estimated the
spectral correlation for α = fm2 - fm1.

Figure 3. Spectral correlation for α = fm2 - fm1. Search
for links between (fm1; fm2).

We can note that, for this cyclic frequency, a very significant
peak appears for the spectral frequency f = (fm2 + fm1)/2. Thus,
according to the interpretation in terms of statistically
linked frequencies stated previously, the meshing frequencies are
strongly correlated.

To conclude, this study confirms the theoretical results obtained in [8]
and [9]; i.e. bispectrum and spectral correlation make it possible
to detect and to determine precisely both nonlinear and
nonstationary links. This result raises a new problem. Indeed, the
detection of a link using a bilinear approach or a cyclic analysis
does not allow us to come to a conclusion about the nature of this
link. However, such information could be very interesting in
terms of analysis of damage phenomena. Indeed, we know that,
depending on the fault we are facing (surface fault or profile
fault), the vibrations created by this fault will be more
cyclostationary or more bilinear. Thus, if, when detecting a link between
frequencies (characterizing the appearance of a fault), we could
determine the nature of this link, we could, at the same time,
determine whether we are facing spalling, a crack or pitting...
This wish motivated the study presented in section 4.

3.  INFLUENCE  OF  TORQUE  ON  THE 
QUALITY  OF  THE  DIAGNOSIS 

3.1  Introduction  of  the  system 

In this section, we are interested in the study of helicopter
gearbox vibrations. These signals come from a measurement
campaign carried out by Westland Helicopter Ltd on Navy
equipment. Figure 4 presents a general view of our system. The
principal component of our study is the CH46 gearbox.
Vibrations were recorded for eight different faults and eight
different torque levels. For this paper, we will only study one spalling fault.
Other faults have already been presented in [8] and [9].


Figure  4.  Photo  of  the  helicopter  engine. 


The application of spectral correlation we want to present in this
paper concerns the diagnosis of faults from helicopter vibrations.
We know that the appearance of a fault in rotating machines is
characterized by modulation phenomena [11]. Indeed, with the
fault, rotating frequencies f_r will modulate meshing frequencies f_m.
So a spectral analysis of these signals will underline the
existence of lateral bands, of width f_r, around each harmonic of the
meshing [15]. As described in [11], the appearance in the
spectral correlation of a strong link between the meshing
frequency and its lateral bands will underline the existence of a
fault on the component characterized by these frequencies. The
next study will be based on this property. Figure 5 presents a
simplified view of our system.

Figure 5. Simplified schema of our system.

We showed in [8], [9] and [10] that cyclic analysis offers
very good results for the early diagnosis of industrial systems (for
simple systems as well as for complex systems). In the next
section, we will present the influence of torque on the quality of
this diagnosis.

3.2  Influence  of  torque  for  diagnosis 

The aim of the measurement campaign is to define different
diagnostic processes, based on usual signal processing methods.
Knowledge of the influence of the torque on fault diagnosis
could, therefore, allow us to contemplate an analysis carried out
just before the helicopter takes off. So, by inclining the blades
correctly, it could be possible to obtain a good quality diagnosis
just before the use of the aircraft.

We, therefore, search for links between the meshing frequency f_m and
its lateral bands f_m ± f_r. Figure 6 shows the influence of
the torque on the quality of the diagnosis. It represents the
evolution of the characteristic peak corresponding to the link
between the meshing frequency f_m and its lateral bands f_m ± f_r for
the different levels of torque.

First, it is interesting to note that, for small faults, we
obtain a better diagnosis at the highest torque. This can be
explained by the fact that the spalling of a tooth is the
consequence of a weakness of the latter. So, the higher the torque,
the stronger the surface pressure. Thereby, the slightest
change of surface will ‘resound’ more strongly if the torque is high
[16].

On the contrary, for an established fault, the influence of torque
is completely inverted. The diagnosis will, therefore, be easier
at a low torque level. Indeed, the previous small fault is now a
significant spalling. The fault becomes deeper and the meshing
phenomenon appears as a shock. So, the higher the torque,
the more the meshing teeth will be inclined to fit exactly the
shape of the spalling, thereby decreasing its repercussions in
the vibration signal [16].


Figure 6. Spectral correlation: Evolution, with the
torque, of the link between f_m1 and f_r1.

Thus,  it  seems  to  be  preferable  to  use  low  levels  of  torque  to 
diagnose  a  relatively  important  fault.  All  these  results  were 
confirmed  by  a  similar  study  on  a  crack  propagation  [8]. 

4.  CLASSIFICATION  METHOD  BASED 
ON  HIGHER  ORDER  CYCLIC 
STATISTICS 

As we noted in section 2.2, bispectrum and spectral correlation
detect both bilinear and cyclostationary links and thus do not
allow us to determine the nature of such links. We, therefore,
introduce Higher Order Cyclic Statistics as a possible solution to
our problem [15] [16]. We will limit our study to second and
third order cyclic statistics. Indeed, we proved that the use of the
cyclic bispectrum makes it possible to determine whether a link
between two frequency components is cyclostationary or
bilinear. We will not present the calculations of the cyclic bispectrum in
this paper. Indeed, they are very long and will be fully
presented in a future paper in Mechanical Systems and Signal
Processing [18] and in the final report of my PhD.

The idea was to calculate the cyclic bispectra of the cyclostationary and
bilinear synthetic signals x(t) and x2(t) described in section 2.1.
Then we searched for criteria that distinguish bilinear links
from cyclostationary correlations.

Figure 7 presents the complete cyclic bispectrum of the
cyclostationary signal x(t). We can note that, if the amplitude
modulations a(t) and b(t) are independent, the stationary version
of x(t) is characterized by two peaks situated at the triplets of
frequencies (α; λ1; λ2) = (f1; f1; f1) and (f2; f2; f2). On the
contrary, if a(t) and b(t) are statistically linked, we can observe
that, for the cyclic frequencies α = f1 and α = f2, additional peaks appear
for (f1; f2) and (f2; f1). Moreover, two other peaks also
characterize this cyclostationary link: (2f2 - f1; f1; f1) and
(2f1 - f2; f2; f2).


Figure 7. Cyclic Bispectrum of the more or less
cyclostationary signal x(t).


Second, the calculation of the theoretical cyclic
bispectrum of the bilinear signal y(t) yields the following cyclic
frequencies [17]:

•  For α = f1: Peaks appear for (λ1; λ2) = (f1; f1), (f2; f1)
and (f1; f2), but also for (f3; f1) and (f1; f3).

•  For α = f2: We encounter peaks for (f2; f2), (f2; f1) and
(f1; f2) again, but also for (f3; f2) and (f2; f3).

•  For α = f3: Peaks appear for (f3; f3), (f3; f1) and (f1; f3),
as well as for (f3; f2) and (f2; f3).

•  For α = 2f1: Two peaks appear for the couples of
frequencies (f3; f1) and (f1; f3).

•  For α = 2f2: Characteristic peaks of this frequency are
situated at (f2; f3) and (f3; f2).

•  For α = 2f2 - f1: A peak appears for (f2; f2).

•  For α = 2f1 - f2: A peak appears for (f1; f1).

•  For α = 2f3 - f1: Now, a peak appears for (f3; f3).

•  For α = 2f3 - f2: The same peak appears for (f3; f3).

•  For α = f2 - f1: We encounter a peak for (f2; f2) again.

•  For α = f1 - f2: The cyclic bispectrum is characterized by a
peak for the frequencies (f1; f1).

These results allow us to set up a classification process to
determine whether a link between two frequencies f1 and f2 is
more cyclostationary or more bilinear. Figure 8 shows
the diagram of this method.

Nevertheless, the estimation of HOCS [13] still raises problems
if we want to deal with long industrial data records. This estimation can
be based on a generalization of the estimation of Higher Order
Statistics to nonstationary processes. These methods commonly
use periodograms. We are still working on this problem and we
will present an application of the cyclic bispectrum to the previously
used helicopter signals in a future paper.

Figure 8. Cyclostationary or Bilinear classification
method, based on the use of HOCS.


5.  CONCLUSION 

In this paper, we presented a comparison of the bilinear approach
and cyclostationary analysis. It appears that these approaches are
closely linked. Indeed, bispectrum and spectral correlation make
it possible to detect both bilinear and cyclostationary links.
[...] diagnose a fault on a system. Indeed, the estimation of the
spectral correlation is a lot faster than bispectrum computation.
[...] However, for the moment, the problem of the estimation of
HOCS for long industrial data is not completely solved and will
be dealt with in a future study.

ABSTRACT

The spatial processing of platform-mounted acoustic
sensors is complicated by platform generated noise.
The Weighted Fourier Integral Method (WFIM) beamformer
has been shown to perform well in such cases,
by reducing this coloured noise which is received by
the sensors. In this paper WFIM is modified by using
maximum likelihood estimates of the spatial correlation
lags. The proposed technique exploits the work of
Burg et al. and estimates these lags for a sparse redundant
linear array of hydrophones. The results obtained
illustrate the significant performance improvement
obtainable over that of the least-squares lag estimation
procedure utilised in WFIM. The proposed approach
“better” estimates the contributions of missing sensors,
in the sparse array, and performance approaching the
full array, with extra sensors, is attained. A beamformer
which adaptively weights the covariance lags is
also proposed and preliminary results presented.

1.  INTRODUCTION 

The authors have been studying sonar beamforming of
platform-mounted acoustic sensors; a problem that is complicated by
platform generated noise which adversely affects the detection of
sources (or “contacts”) in a background of ambient noise. The
authors have used a sparse linear array and have both analysed real
data and conducted real-time at-sea testing, to evaluate the
performance of beamformers [1, 2]. Due to continuing advances,
multi-processor digital systems are now capable of performing
sophisticated signal processing in real-time [2]; hence more
advanced techniques may now be considered for real-time
applications.

The Fourier Integral Method (FIM) beamformer, developed by Wilson
and Nuttall [3, 4] after a number of experiments at sea, was shown
to outperform the conventional beamformer and was somewhat better
than the Minimum Variance Distortionless Response (MVDR)
beamformer. The authors have studied the performance of FIM and a
variant of it, called Weighted FIM (WFIM), which were suitably
modified for sparse arrays. The results obtained with a
platform-mounted array (see [1, 2]) showed that WFIM outperformed
both FIM and MVDR. WFIM was able to significantly reduce the
platform generated noise, which was not rejected by MVDR and the
conventional beamformer (FIM did somewhat reduce this noise).

The Bartlett (or conventional) beamformer and the MVDR beamformer
apply weights directly to the sensors; weighting is thus performed
in the sensor domain. The WFIM beamformer applies weights in the
spatial correlation lag domain. In the general case lag weighting
can achieve everything that may be achieved in the sensor domain,
and in addition can achieve more. Thus WFIM is capable of providing
improvements over beamformers operating in the sensor domain.
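
The claim that lag weighting covers the sensor-domain case can be illustrated for the simplest beamformer. The sketch below is not taken from the paper: it assumes a uniform linear array of M sensors, a Toeplitz covariance built from lags r_k, and a steering vector with phase step `phi` per sensor (all hypothetical names). Under those assumptions the Bartlett output equals a lag-domain sum with triangular weights (M - |k|).

```python
import numpy as np

def toeplitz_from_lags(lags):
    """Hermitian Toeplitz matrix with S[l, n] = r_{n-l} and r_{-k} = conj(r_k)."""
    M = len(lags)
    S = np.empty((M, M), dtype=complex)
    for l in range(M):
        for n in range(M):
            k = n - l
            S[l, n] = lags[k] if k >= 0 else np.conj(lags[-k])
    return S

def bartlett_sensor_domain(S, phi):
    """Conventional beamformer: weights applied directly to the sensors."""
    M = S.shape[0]
    a = np.exp(1j * phi * np.arange(M))          # assumed steering vector
    return (a.conj() @ S @ a).real / M**2

def bartlett_lag_domain(lags, phi):
    """Same output written as a weighted sum over correlation lags."""
    M = len(lags)
    k = np.arange(1, M)
    w = M - k                                    # triangular lag weights
    total = M * lags[0].real + 2.0 * np.sum(w * (lags[1:] * np.exp(1j * phi * k)).real)
    return total / M**2

rng = np.random.default_rng(0)
lags = rng.standard_normal(6) + 1j * rng.standard_normal(6)
lags[0] = 5.0                                    # zero lag is real
S = toeplitz_from_lags(lags)
print(bartlett_sensor_domain(S, 0.4), bartlett_lag_domain(lags, 0.4))
```

Any other choice of lag weights has no such sensor-domain counterpart in general, which is the extra freedom the text refers to.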

The performance gains obtained by WFIM are due to two factors.
Firstly, the lag weighting performed by WFIM reduces the
contributions of small lags; this is where a significant amount of
the noise (platform noise and also ambient noise) has been found to
be present. By reducing the contributions of these lags to the
beamformer output, the adverse effects of noise are reduced and
hence good performance is achieved by WFIM. The weighting used by
WFIM also results in a narrow main beam (narrower than MVDR in
practice) and reduced sidelobe levels.

The second reason for the improvements obtained by WFIM is the
technique used for estimating the correlation lags. Using a
least-squares approach, WFIM essentially compensates for the
missing sensors in the sparse array. Hence performance similar to
an array with extra sensors is achievable. In this paper, further
improvements are considered which result in further reduction of
energy leakage in sidelobe directions.

It should be noted that the power output of WFIM can go negative.
As a result strong contacts can potentially cause the masking of
weak contacts. In practice, however, the occurrence of this is not
common, and this drawback is outweighed by the improvements
provided by WFIM for sonar beamforming applications. This issue is
discussed later in this paper, when adaptive weighting in the lag
domain is considered.

2.  ESTIMATION  OF  CORRELATION  LAGS 

Array signal processing of acoustic sources usually involves the
computation of second order statistics of band-limited hydrophone
data (see [1]). The second order statistics, for a particular
frequency, are represented by the spatial covariance matrix

$$
\mathbf{S} = \begin{bmatrix}
s_{0,0} & s_{0,1} & \cdots & s_{0,M-1} \\
s_{1,0} & s_{1,1} & \cdots & s_{1,M-1} \\
\vdots & \vdots & & \vdots \\
s_{M-1,0} & s_{M-1,1} & \cdots & s_{M-1,M-1}
\end{bmatrix} \tag{1}
$$

where the spatial covariance lag $s_{m_1,m_2}$ represents the
cross-correlation between sensors $m_1$ and $m_2$.

The power output of the WFIM beamformer, as a function of bearing
$\theta$ and frequency $f$, is as follows [1]

$$
P(\theta, f) \propto \sum_{k}^{K-1} w_k \, p_k(f) \, v_k(\theta, f) \tag{2}
$$

where $K$ is the number of lags present, $w_k$ are the lag weights,
$p_k(f)$ are the spatial correlation lags (at frequency $f$) and
$v_k(\theta, f)$ is related to the array steering vector. The
approach used in WFIM to estimate $p_k$ (at a particular frequency)
from $\mathbf{S}$, for a sparse linear array, is to employ a
least-squares technique [1]. The array used by the authors was a
redundant array, so the co-array is full and estimates of all
correlation lags up to lag $(K-1)$ are available.

The least-squares formulation used is to estimate a Hermitian
covariance matrix which contains the necessary correlation lags
[5, 6]. Note for a wide-sense spatially stationary process the
covariance matrix is Toeplitz if the array is equispaced and the
signals are uncorrelated; this has been exploited in [5, 6, 7] to
obtain improved spatial processing. The least-squares technique
provides an analytic solution which is close to the true covariance
matrix in a minimum norm sense, but its optimality in practice for
beamforming applications is questionable. In this paper an
alternative approach is considered, which is to estimate the
correlations such that the likelihood is maximised. The approach
exploits the work by Burg et al. in [8].

Burg et al. considered the problem of spectral estimation given a
uniformly sampled real time series. They considered the estimation
of a covariance matrix of specified structure (usually Toeplitz),
and did this by maximising the likelihood function. Here the array
processing problem is considered and, in particular, for sparse
linear arrays with complex (Hermitian) covariance matrices. The
approach by Burg et al. can be modified for this problem and is now
detailed.

The likelihood function can be expressed as

$$
l(\mathbf{p}) = -\ln(\det(\mathbf{R})) - \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{S}\} \tag{3}
$$

where $\mathbf{R}$ is a structured covariance matrix which is
related to $\mathbf{p} = [p_0, p_1, \ldots, p_{K-1}]^T$, which are
to be estimated (note $p_{-k} = p_k^*$ and the $p_k$ are for
frequency $f$).

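To find the maximum of (3), one needs to differentiate
$l(\mathbf{p})$ with respect to the correlations $\mathbf{p}$ and
set the resulting expression to zero. It can be shown [8] that

$$
\frac{\partial l(\mathbf{p})}{\partial p} = \mathrm{tr}\left\{ \left( \mathbf{R}^{-1}\mathbf{S}\mathbf{R}^{-1} - \mathbf{R}^{-1} \right) \frac{\partial \mathbf{R}}{\partial p} \right\} \tag{4}
$$

which is set to equal zero for obtaining the maximum.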

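Burg et al. realised that not only does
$\partial\mathbf{R}/\partial p$ satisfy equation (4) but also any
matrix $\mathbf{Y}$ which is in the set of covariance matrices to
be estimated, and so given a solution $\mathbf{R}$ to the above
equation (say an initial estimate) one needs to solve the following
equation for $\mathbf{R}'$, which is the new solution:

$$
\mathrm{tr}\{(\mathbf{R}^{-1}\mathbf{S}\mathbf{R}^{-1} - \mathbf{R}^{-1}\mathbf{R}'\mathbf{R}^{-1})\mathbf{Y}\} = 0 \tag{5}
$$

or alternatively as

$$
\mathrm{tr}\{\mathbf{R}^{-1}\mathbf{S}\mathbf{R}^{-1}\mathbf{Y}\} = \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{R}'\mathbf{R}^{-1}\mathbf{Y}\} \tag{6}
$$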

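Now one must write the matrix $\mathbf{R}'$ in terms of the unknown
correlations. The following expansion is employed for the problem
considered here,

$$
\mathbf{R}' = p_0 \mathbf{I} + \sum_{k=1}^{K-1} \left( p_k^r \mathbf{X}_k^r + p_k^i \mathbf{X}_k^i \right) \tag{7}
$$

where $p_k^r = \Re\{p_k\}$ and $p_k^i = \Im\{p_k\}$ ($p_0$ is
real), the matrices $\mathbf{X}_k^r$ and $\mathbf{X}_k^i$ are
sparse matrices which contain either $\{1, +j, -j, 0\}$ in
locations corresponding to $p_k$ ($j$ is the complex operator).
With this expansion, equation (6) becomes

$$
\mathrm{tr}\{\mathbf{R}^{-1}\mathbf{S}\mathbf{R}^{-1}\mathbf{Y}\} = p_0\, \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{R}^{-1}\mathbf{Y}\}
+ \sum_{k=1}^{K-1} p_k^r\, \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{X}_k^r\mathbf{R}^{-1}\mathbf{Y}\}
+ \sum_{k=1}^{K-1} p_k^i\, \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{X}_k^i\mathbf{R}^{-1}\mathbf{Y}\} \tag{8}
$$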

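which one can write in a similar form to [8], by noting
$\mathbf{Y} \in \{\mathbf{I}, \mathbf{X}_1^r, \ldots, \mathbf{X}_{K-1}^r, \mathbf{X}_1^i, \ldots, \mathbf{X}_{K-1}^i\}$
and defining

$$
c_k = \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{S}\mathbf{R}^{-1}\mathbf{Y}\} \tag{9}
$$

$$
\mathbf{a}_k = \left[ \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{R}^{-1}\mathbf{Y}\},\ \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{X}_1^r\mathbf{R}^{-1}\mathbf{Y}\},\ \ldots,\ \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{X}_{K-1}^r\mathbf{R}^{-1}\mathbf{Y}\},\ \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{X}_1^i\mathbf{R}^{-1}\mathbf{Y}\},\ \ldots,\ \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{X}_{K-1}^i\mathbf{R}^{-1}\mathbf{Y}\} \right]^T \tag{10}
$$

and thus obtain a system of equations $\mathbf{A}\mathbf{x} = \mathbf{c}$
as in [8], where the $(2K-1) \times (2K-1)$ symmetric real matrix
$\mathbf{A} = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_{2K-1}]$,
the $(2K-1)$ element real vector $\mathbf{c} = [c_k]$, and the
$(2K-1)$ element real vector
$\mathbf{x} = [p_0, p_1^r, \ldots, p_{K-1}^r, p_1^i, \ldots, p_{K-1}^i]^T$.
Since some of the matrices above are sparse, direct application of
the above equations makes the algorithm computationally
inefficient. By using the following in equations (9) and (10),
computation time is significantly reduced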


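$$
\mathbf{X}_k^r = \sum_{(l,n) \in \Omega_k} \left( \mathbf{E}_{l,n} + \mathbf{E}_{n,l} \right) \tag{11}
$$

$$
\mathbf{X}_k^i = j \sum_{(l,n) \in \Omega_k} \left( \mathbf{E}_{l,n} - \mathbf{E}_{n,l} \right) \tag{12}
$$

where $\Omega_k$ is the set of matrix indices corresponding to lag
$k$ ($> 0$), $\mathbf{E}_{l,n} = \mathbf{e}_l \mathbf{e}_n^T$
($\mathbf{e}_l$ is the unit vector with all elements zero except
for a one in position $l$).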

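$$
\mathbf{c} = \left[ c_0, c_1^1, \ldots, c_{K-1}^1, c_1^2, \ldots, c_{K-1}^2 \right]^T
$$

and

$$
\mathbf{A} = \begin{bmatrix}
a_{0,0} & a_{0,1} & \cdots & a_{0,K-1} & a_{0,K} & \cdots & a_{0,2K-2} \\
a_{1,0} & a_{1,1}^{1,1} & \cdots & a_{1,K-1}^{1,1} & a_{1,1}^{1,2} & \cdots & a_{1,K-1}^{1,2} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
a_{K-1,0} & a_{K-1,1}^{1,1} & \cdots & a_{K-1,K-1}^{1,1} & a_{K-1,1}^{1,2} & \cdots & a_{K-1,K-1}^{1,2} \\
a_{K,0} & a_{1,1}^{2,1} & \cdots & a_{1,K-1}^{2,1} & a_{1,1}^{2,2} & \cdots & a_{1,K-1}^{2,2} \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
a_{2K-2,0} & a_{K-1,1}^{2,1} & \cdots & a_{K-1,K-1}^{2,1} & a_{K-1,1}^{2,2} & \cdots & a_{K-1,K-1}^{2,2}
\end{bmatrix} \tag{13}
$$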


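where $a_{l,m}$ are elements of $\mathbf{A}$ as given by (10), and

$$
c_0 = \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{S}\mathbf{R}^{-1}\} \tag{14}
$$

$$
c_k^1 = 2 \sum_{(l,n) \in \Omega_k} \Re\{(\mathbf{R}^{-1}\mathbf{S}\mathbf{R}^{-1})_{n,l}\} \tag{15}
$$

$$
c_k^2 = 2 \sum_{(l,n) \in \Omega_k} \Im\{(\mathbf{R}^{-1}\mathbf{S}\mathbf{R}^{-1})_{l,n}\} \tag{16}
$$

$$
a_{k,k'}^{1,1} = 2 \sum_{(l,n) \in \Omega_k} \sum_{(l',n') \in \Omega_{k'}} \Re\{(\mathbf{R}^{-1})_{n,l'}(\mathbf{R}^{-1})_{n',l} + (\mathbf{R}^{-1})_{n,n'}(\mathbf{R}^{-1})_{l',l}\} \tag{17}
$$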


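$$
a_{k,k'}^{1,2} = 2 \sum_{(l,n) \in \Omega_k} \sum_{(l',n') \in \Omega_{k'}} \Im\{(\mathbf{R}^{-1})_{l',n}(\mathbf{R}^{-1})_{l,n'} - (\mathbf{R}^{-1})_{n',n}(\mathbf{R}^{-1})_{l,l'}\} \tag{18}
$$

$$
a_{k,k'}^{1,2} = 2 \sum_{(l,n) \in \Omega_k} \sum_{(l',n') \in \Omega_{k'}} \Im\{(\mathbf{R}^{-1})_{n,l'}(\mathbf{R}^{-1})_{n',l} - (\mathbf{R}^{-1})_{n,n'}(\mathbf{R}^{-1})_{l',l}\} \tag{19}
$$

$$
a_{k,k'}^{2,2} = -2 \sum_{(l,n) \in \Omega_k} \sum_{(l',n') \in \Omega_{k'}} \Re\{(\mathbf{R}^{-1})_{n,l'}(\mathbf{R}^{-1})_{n',l} - (\mathbf{R}^{-1})_{n,n'}(\mathbf{R}^{-1})_{l',l}\} \tag{20}
$$

Note $a_{k,k'}^{2,1} = a_{k',k}^{1,2}$; equation (18) is for
$k' > k$ and equation (19) is for $k' < k$.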
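
As a sanity check on the sparse forms, the following sketch compares the element sums of equations (15) and (17) with the dense trace expressions of equations (9) and (10). It is an illustration only: the sensor positions, the random test matrices and all function names are assumptions, and the index set is taken as the pairs (l, n) whose sensor spacing equals the lag.

```python
import numpy as np

rng = np.random.default_rng(1)
pos = np.array([0, 1, 2, 5, 8])       # hypothetical sparse redundant array
M = len(pos)

def random_hermitian_pd(m):
    G = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    return G @ G.conj().T + m * np.eye(m)

def omega(k):
    """Index pairs (l, n) with pos[n] - pos[l] == k (assumed convention)."""
    l, n = np.nonzero(pos[None, :] - pos[:, None] == k)
    return list(zip(l.tolist(), n.tolist()))

def x_real(k):
    """Dense X_k^r of equation (11)."""
    X = np.zeros((M, M), dtype=complex)
    for l, n in omega(k):
        X[l, n] += 1
        X[n, l] += 1
    return X

S = random_hermitian_pd(M)
B = np.linalg.inv(random_hermitian_pd(M))   # plays the role of R^{-1}
G = B @ S @ B                               # R^{-1} S R^{-1}

def c1_dense(k):
    return np.trace(G @ x_real(k)).real

def c1_sparse(k):                           # equation (15)
    return 2 * sum(G[n, l].real for l, n in omega(k))

def a11_dense(k, k2):
    return np.trace(B @ x_real(k) @ B @ x_real(k2)).real

def a11_sparse(k, k2):                      # equation (17)
    return 2 * sum((B[n, l2] * B[n2, l] + B[n, n2] * B[l2, l]).real
                   for l, n in omega(k) for l2, n2 in omega(k2))

print(c1_dense(3), c1_sparse(3))
print(a11_dense(1, 3), a11_sparse(1, 3))
```

The sparse forms touch only the entries listed in the index sets, which is where the saving over full matrix products comes from.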

The iterative procedure starts by using $\mathbf{R} = \mathbf{I}$,
which is a positive definite matrix. Matrix $\mathbf{A}$ and vector
$\mathbf{c}$ are then calculated using equations (14)-(20), and the
system is solved for vector $\mathbf{x}$. From the correlation lags
obtained from $\mathbf{x}$, the covariance matrix $\mathbf{R}'$ is
constructed using equation (7). One must then check that this
covariance matrix is positive definite (i.e. all the eigenvalues
are positive) and that it increases the likelihood (equation (3));
if not, try $q\mathbf{R}' + (1 - q)\mathbf{R}$ with $q = 0.5$, and
continue halving $q$ until both conditions are met. Then set
$\mathbf{R}$ to the new solution and continue iterating until the
likelihood no longer increases appreciably. Note the inverse of
$\mathbf{R}$ may be determined by exploiting the inherent structure
present.
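
The iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses dense traces from equations (9) and (10) instead of the faster sparse forms, a plain matrix inverse, and hypothetical names (`build_basis`, `ml_structured_covariance`, `lags_from_covariance`) and stopping tolerances. The sensor positions must give a full co-array.

```python
import numpy as np

def build_basis(pos):
    """Return [I, X_1^r..X_{K-1}^r, X_1^i..X_{K-1}^i] for sensor positions pos."""
    pos = np.asarray(pos)
    diff = pos[None, :] - pos[:, None]        # diff[l, n] = pos[n] - pos[l]
    K = int(diff.max()) + 1                   # lags 0 .. K-1
    real_part, imag_part = [], []
    for k in range(1, K):
        E = (diff == k).astype(complex)       # sum of E_{l,n} over Omega_k
        if not E.any():
            raise ValueError(f"co-array is missing lag {k}")
        real_part.append(E + E.T)             # equation (11)
        imag_part.append(1j * (E - E.T))      # equation (12)
    return [np.eye(len(pos), dtype=complex)] + real_part + imag_part

def log_likelihood(R, S):
    """Equation (3): -ln det(R) - tr(R^{-1} S)."""
    _, logdet = np.linalg.slogdet(R)
    return -logdet - np.trace(np.linalg.solve(R, S)).real

def ml_structured_covariance(S, pos, max_iter=50, tol=1e-10):
    basis = build_basis(pos)
    R = np.eye(S.shape[0], dtype=complex)     # positive definite start
    like = log_likelihood(R, S)
    for _ in range(max_iter):
        B = np.linalg.inv(R)
        BSB = B @ S @ B
        BX = [B @ X for X in basis]
        c = np.array([np.trace(BSB @ X).real for X in basis])
        A = np.array([[np.trace(Bi @ Bj).real for Bj in BX] for Bi in BX])
        x = np.linalg.solve(A, c)             # solve A x = c
        R_new = sum(xi * X for xi, X in zip(x, basis))   # equation (7)
        q, accepted = 1.0, False
        while q > 1e-6:                       # step halving (assumed floor)
            R_try = q * R_new + (1.0 - q) * R
            if np.all(np.linalg.eigvalsh(R_try) > 0):
                like_try = log_likelihood(R_try, S)
                if like_try >= like - 1e-12:
                    accepted = True
                    break
            q *= 0.5
        if not accepted:
            break
        gain = like_try - like
        R, like = R_try, like_try
        if gain < tol:
            break
    return R

def lags_from_covariance(R, pos):
    """Read p_0 .. p_{K-1} back out of the structured matrix."""
    pos = np.asarray(pos)
    diff = pos[None, :] - pos[:, None]
    K = int(diff.max()) + 1
    lags = np.empty(K, dtype=complex)
    for k in range(K):
        l, n = np.argwhere(diff == k)[0]
        lags[k] = R[l, n]
    return lags
```

When the sample covariance already has the assumed structure and is positive definite, the first solve returns it unchanged, which gives a convenient check of an implementation before it is run on noisy data.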

3.  PERFORMANCE  COMPARISON 

The performance of the WFIM beamformer, using the correlation lags
estimated above, is now considered by using them in equation (2).
Figure 1 shows the power output of the Bartlett beamformer, the
WFIM beamformer and the proposed WFIM-ML beamformer; the sparse
array has half the sensors missing and the data had a strong
contact present at most frequencies. The gray scales, which have
the same dB range in each sub-figure, have sufficiently large range
to show the full extent of the improvements obtained using
WFIM-ML. As can be seen WFIM reduces the platform generated noise,
which is present in the conventional beamformer at low frequencies,
and thus enhances the detection of contacts from the background.
WFIM-ML is seen to similarly reduce the noise, but in addition is
able to further reduce the leakage of the contact’s energy in the
sidelobe directions and was found to provide up to 10 dB
improvement over WFIM.

The performance gains of WFIM-ML over WFIM are due to the improved
estimation of the correlation lags. It was found that performance
approaching that of a uniform linear array, with extra sensors, was
achieved, i.e. the results attained are similar to those
theoretically achievable using a full array. Using the sparse array
the spatial stationarity of the process has been exploited; reduced
spatial sampling has been replaced by equivalent temporal sampling.
It should be noted that in low-noise simulations the least-squares
approach was found to give similar results to the new approach. The
maximum likelihood correlation lag estimates may also be used in an
augmented covariance matrix [5] to enhance the Bartlett beamformer;
this is however not possible using an FFT based beamformer.

Figure 1: Beamformer Comparison. (a) Bartlett Beamformer; (b) WFIM
Beamformer; (c) WFIM-ML Beamformer. Axes: bearing and frequency.

Figure 2 illustrates the performance of WFIM-ML when even more
sensors are excluded, while still maintaining a full co-array.
Figures 2(a) and 2(b) show the Bartlett beamformer and the WFIM-ML
beamformer respectively, for the original sparse array; a strong
contact is seen in both sub-figures, while WFIM-ML shows two
additional weaker contacts. Figure 2(c), which was obtained when
several more sensors were excluded, shows that the performance of
WFIM-ML is comparable here to the original sparse array, although
some differences are observed. Figure 2(d) shows the same case as
in 2(c) but here the number of snapshots used to estimate the
covariance matrix has been doubled. The differences are seen to be
reduced, which indicates that obtaining more snapshots from a
stationary process results in better performance. The algorithm was
found to always converge and each iteration did provide observable
improvements.

Figure 2: Performance with even less sensors. (a) Bartlett:
Original Sparse Array; (b) WFIM-ML: Original Sparse Array;
(d) WFIM-ML: Less Sensors and More Snapshots.

4.  ADAPTIVE  FIM 

The WFIM-ML results show the improvements that may be obtained by
“better” combining the covariance lag contributions. Now a
procedure developed for adaptively weighting the covariance lags is
discussed. The algorithm minimises the power output of the
beamformer while ensuring (a) the power output in the direction of
interest is unity, and (b) the power output is always positive.
When the covariance lags are weighted the power output is real
provided Hermitian weights are used; the power output is however
not guaranteed to be positive. Positive power output, or more
generally beampattern (filter response) positivity, is very
important for adaptive covariance lag weighting, since otherwise
the adaptivity can result in the nulling of contacts. Thus the
adaptive algorithm is constrained such that the power output is
always positive, by formulating the problem in a semidefinite
optimisation framework [9]. The results obtained for one such
adaptive algorithm are shown in Figure 3. This beamformer is seen
to extract several weak contacts which are not apparent in the
Bartlett beamformer. Modifications, such as the controlled partial
relaxation of constraint (b), are currently being investigated.

Figure 3: Adaptive lag-domain Beamformer. (a) Bartlett Beamformer;
(b) Adaptive FIM Beamformer. Axis: frequency.

5.  CONCLUSION 

The WFIM beamformer, which has been previously shown to improve
spatial processing of acoustic sources, has been further improved.
A maximum likelihood procedure was used to estimate the correlation
lags, which was shown to provide superior performance to the lag
estimation procedure used in WFIM. The new WFIM-ML beamformer, like
WFIM, performs well in the presence of platform generated noise,
but WFIM-ML is seen to better detect contacts in the presence of a
strong contact. The WFIM-ML beamformer’s performance, with the
sparse array considered, was seen to approach that attainable with
a full array with extra sensors. An adaptive lag beamformer was
also discussed, and it was shown to provide very promising early
results. Further research is currently being conducted in this area
and extensive testing of this adaptive beamformer is being
performed.
ABSTRACT

This paper presents a computationally efficient hierarchical
nearfield imaging algorithm using a delay and sum beamformer with a
random array via pulse-echo techniques.

In the conventional imaging of nearfield sources, signals from
multiple sensors are delayed and summed to focus at points in the
imaging volume. However, the computational burden of implementing
the conventional algorithm for arrays with a large number of
receivers, due to the distance calculations, requires new
suboptimal algorithms. The algorithm described in this paper
reduces computational time, balanced against reasonable memory
requirements. The key principle of this algorithm is to implement
the delay and sum by grouping receivers into various subarrays in a
number of stages. Mirroring the hierarchy of subarrays is a
decomposition of the imaging volume beginning with large voxels and
progressively reducing the voxel size. This algorithm introduces
two main changes to the conventional algorithm, which are
(i) subsampling of the received data, which is then compensated by
phase interpolations, and (ii) a hierarchy of subarrays and
subvolumes to combine the data from all the receivers.

The conventional and the hierarchical algorithms are compared in
terms of their point spread functions.

1.  INTRODUCTION 

In this paper, we consider the problem of nearfield imaging using
an acoustic transmitter with a bandwidth of several MHz and a
receiving array of several thousand randomly positioned sensors.
Acoustic imaging has found applications in diverse areas such as
medical diagnosis, oceanic search, non-destructive evaluation and
exploration seismology [1]. When the object is assumed to be in the
farfield of an array, the received waveform from a single point
reflector can be assumed to be planar. The farfield approximation
usually starts at a range of around $r = L^2/\lambda$, as described
in [2], where $r$ is the distance from the array, $L$ is the
diameter of the array and $\lambda$ is the wavelength of the
transmitted signal. However, when the object to be imaged is within
this distance, new techniques such as in [3] need to be developed.
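
To give a feel for the scale of this farfield range, the short calculation below evaluates it for values of the kind quoted elsewhere in this paper. The 0.5 m aperture and 4 MHz frequency are picked for illustration, and the 1500 m/s sound speed is an assumed nominal value for water; the function name is hypothetical.

```python
def farfield_range(aperture_m, frequency_hz, sound_speed_m_s=1500.0):
    """Range r = L^2 / lambda beyond which the farfield approximation holds."""
    wavelength_m = sound_speed_m_s / frequency_hz
    return aperture_m ** 2 / wavelength_m

# 0.5 m aperture at 4 MHz: wavelength 0.375 mm, farfield beyond roughly 667 m
print(farfield_range(0.5, 4e6))
```

Under these assumptions an object at a range of about one metre lies deep inside the nearfield, which is why plane-wave shortcuts cannot be applied to the whole array at once.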

In underwater acoustic imaging, a common technique is to use a
delay and sum beamformer. This technique obtains the image of an
object from the delayed and summed backscattered echoes of a pulse
modulated signal emitted by a transmitter. A carrier frequency in
the region of 3-5 MHz is a compromise between low angular
resolution at the lower frequency and high absorption at higher
frequencies. As discussed in [4], use of a linear FM signal or
“chirp” allows lower power transmission for a given range
resolution and a bandwidth of several MHz allows millimetre
resolution to be achieved. To obtain good cross-range imaging, a
large receiver aperture array is required, typically of the order
of 1 m². The number of sensors required to densely fill such an
aperture is prohibitive and thus a sparse array must be considered.
The sensor locations are chosen randomly to reduce grating lobes
that may otherwise occur (see [5, 6] and the references therein).

The computational burden of conventional time delay and sum
imageformers using outputs of all the receivers is high [7]. Two
features conspire against us. Firstly, the sparseness of the array
means that subarrays have extremely directional beam-patterns and
high sidelobes. If we combine receivers into a subarray by delaying
and summing their data streams, that sum will only generate a good
quality image of a small patch. Thus, this process must be repeated
for each small patch to cover the full image. The second feature
that makes the task difficult is that the object to be imaged is in
the nearfield. Approximations that can be used in the farfield do
not all apply here. For example, when calculating the distance from
a receiver to many voxels in the image, either a time-consuming
formula or a vast lookup table is required.

In acoustic imaging [4], an object is usually modelled in one of
two ways: the object region is composed of either a number of point
reflectors or several homogeneous media. In this paper, we employ
the first assumption. Therefore the image of the object is assumed
to be a superposition of point spread functions of the point
reflectors in the object model. Hence, point spread functions are
used as quality measures for the acoustic imaging algorithms in
this paper.

In this contribution, we propose three novel techniques to address
this nearfield acoustic imaging problem.

First, a piecewise polynomial approximation is used to quickly
generate the distance calculations on the fly.

Secondly, the nature of the outputs of the matched filter used in
the ranging processing is exploited to allow the time delay and sum
imageforming to be implemented using coarse time delays followed by
a fine time delay adjustment using phase multiplications.

Finally, and the main contribution presented, the imageforming is
carried out in a hierarchical manner by grouping receivers into
various subarrays in a number of stages. Mirroring the hierarchy of
subarrays is a decomposition of the imaging volume beginning with
large voxels and progressively reducing the voxel size.

In the next two sections, we describe the conventional and
hierarchical algorithms and address some implementation issues.

2.  THE  CONVENTIONAL  NEARFIELD 
ACOUSTIC  IMAGING  ALGORITHM 

In conventional nearfield imaging [4, 7], a wideband signal,
typically a linear FM chirp, is transmitted and backscattered
echoes are received by an array of sensors. The sensor outputs are
sampled, Hilbert transformed to produce complex signals and
“dechirped” by convolving with a complex replica of the transmitted
signal. The typical range profile from a single receiver is shown
in Figure 1. To image in cross-range at each voxel, the round-trip
distance is first calculated from the transmitter via the voxel to
each receiver. The corresponding sample is then chosen from the
dechirped data received from each sensor. These samples are summed,
and the absolute value of the sum is chosen to be the voxel
intensity.
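
The steps above can be sketched in a few lines. This is an illustrative version only, with assumed conventions that are not stated in the paper: sample 0 of every data stream coincides with the transmit instant, the nearest sample is used without interpolation, and the function and argument names are hypothetical.

```python
import numpy as np

def round_trip_sample_index(tx, rx, voxels, fs, c):
    """Sample index of the transmitter -> voxel -> receiver path.

    tx: (3,) transmitter position, rx: (n_rx, 3), voxels: (n_vox, 3),
    fs: sampling rate in Hz, c: sound speed in m/s.
    """
    d_tx = np.linalg.norm(voxels - tx, axis=1)                          # (n_vox,)
    d_rx = np.linalg.norm(voxels[:, None, :] - rx[None, :, :], axis=2)  # (n_vox, n_rx)
    return np.rint((d_tx[:, None] + d_rx) / c * fs).astype(int)

def delay_and_sum(data, fs, c, tx, rx, voxels):
    """data: (n_rx, n_samples) complex dechirped streams; returns voxel intensities."""
    n_rx, n_samples = data.shape
    idx = round_trip_sample_index(tx, rx, voxels, fs, c)
    valid = (idx >= 0) & (idx < n_samples)              # ignore delays outside the record
    picked = data[np.arange(n_rx)[None, :], np.clip(idx, 0, n_samples - 1)]
    return np.abs(np.where(valid, picked, 0).sum(axis=1))
```

Every voxel needs one distance evaluation per receiver, so the cost grows as the product of the two counts; this is the burden the hierarchical algorithm is designed to reduce.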

This conventional algorithm is not practical due to high
computational resource requirements. Therefore, we have altered
that algorithm to achieve a balance between the speed, memory size
and the image quality. The next section describes the new
algorithm.

Figure 1: Range profile from a single receiver (horizontal axis:
time).

3.  A  HIERARCHICAL  NEARFIELD 
ACOUSTIC  IMAGING  ALGORITHM 

The hierarchical algorithm proposed in this paper divides the whole
array into subarrays, and processes the data in a hierarchical
manner. This algorithm is similar to the Quadtree Backprojection
algorithm described in [8] for use with impulse radars and the fast
beamforming algorithm in [9]. Depending on the number of algorithm
stages chosen, the array is first divided into a number of
subarrays of roughly equal area. These are divided into smaller
arrays until we reach individual receivers. For example, if the
algorithm is implemented such that it has three stages, then it
divides the array into only small and large subarrays. If $n_1$
large subarrays are used and each large subarray contains $n_2$
points which are the centre points of small subarrays, then these
$N_2 = n_1 n_2$ points would form the centres of small subarrays.
By varying the position and number of large and small subarrays,
one can change the performance and the computational complexity of
the hierarchical algorithm.

The hierarchical algorithm uses the key idea that a subvolume of
many voxels will lie in the farfield of a small subarray, allowing
shortcuts in the computation. Smaller subvolumes lie in the
farfield of larger subarrays. Thus we deal with either fewer
subvolumes and all the array, or all the voxels and a few large
subarrays. By combining receivers into small subarrays and then
into large subarrays, we can focus on first large image subvolumes
and then on smaller ones and finally on voxels. This process is
similar to the butterfly structure in FFT or the Butler matrix used
in radar. However, in these cases where we are dealing with plane
waves the butterfly decomposition is exact, whilst in this approach
it is approximate. Therefore, the proposed algorithm has a
hierarchical structure where the concurrent contraction in the
imaging subvolumes and an expansion in array subareas continue with
an improvement in image resolution at each stage. This fact allows
us to reduce the number of computations at each stage by a
significant factor. A pseudo-code of the operations carried out in
a stage of the algorithm can be seen in Table 1. In order to
further reduce the algorithm requirements, both in computation and
memory, the two techniques described in the following section can
be used. These techniques may also be incorporated in the
conventional algorithm.

    for each large subarray
        load cubic coeffs for n_i small subarrays
        for each large subvolume
            for each of the n_i small subarrays
                load data stream, apply phase-shifts
                for each small subvolume
                    calculate distance
                    convert to (sample number, phase)
                    add data substream to new substream
            save new substreams
            zero new substreams for next large subvolume

Table 1: Pseudo-code of one stage of the hierarchical imaging
algorithm


4. IMPLEMENTATION ISSUES

In many software and hardware implementations, excessive delays can
be caused by pre-calculating delay values and transferring the
required blocks of data and the delays in and out of cache
memories. It is often more efficient to calculate time delays on
the fly, which requires fast algorithms. The following technique
can be used to do so and it is employed in the hierarchical
algorithm described in Section 3.

The distance can be approximated well over part of the imaging
volume by a cubic polynomial. We will use several polynomials
$P_i$, each covering a part $V_i$ of the imaging volume. The
coefficients in each polynomial $P_i$ are found by regression using
several distances within $V_i$. This can be done off-line and only
twenty coefficients per cubic polynomial need be stored.

To evaluate each cubic polynomial over many equally spaced voxels,
we use the following method. We could evaluate a linear function
$f(n) = a + bn$ at integers $n = 0, 1, 2, \ldots$ simply by
starting with $a$ and continually adding $b$ to the previous
answer. This uses one addition to evaluate $f(n)$ for each value of
$n$. In the same way, by nesting these additions three levels deep
we can evaluate a cubic polynomial using just three adds for each
value of $n$.

In addition to distance approximations, extra speed-up of acoustic
imaging algorithms may be possible via subsampling and then
interpolating received signals. In many applications, design
considerations dictate a sampling rate that exceeds the transmitted
signal bandwidth by a factor of 4-10. The impulse response of the
matched filter output (i.e. the range compressed chirp) typically
looks like Figure 2. Time delaying a receiver output can be
achieved by first selecting, at a suitable subsampling factor, the
sample nearest to the peak of the modulation; this is termed the
coarse time delay. Provided we are within the 3 dB point of the
peak, a fine time delay adjustment may be made through
multiplication by the appropriate phase factor. This subsampling
(typically 4-8 in practice) allows substantial savings in
down-stream computations as it enables the hierarchical algorithm
of Section 3 to be implemented.

Figure 2: The output of a filter matched to a linear FM signal
driven by the same signal.
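
The nested-addition evaluation of a cubic described in this section is the classical forward-difference scheme. The sketch below shows it for a one-dimensional cubic in the step index n; the function name and the coefficient ordering are assumptions made for illustration, and the paper's twenty-coefficient polynomials in three spatial variables would apply the same idea along each scan direction.

```python
def cubic_forward_difference(a, b, c, d, count):
    """Evaluate f(n) = a + b*n + c*n**2 + d*n**3 for n = 0 .. count-1.

    After the set-up, each new value costs exactly three additions.
    """
    f = a                    # f(0)
    d1 = b + c + d           # first difference  f(1) - f(0)
    d2 = 2 * c + 6 * d       # second difference at n = 0
    d3 = 6 * d               # third difference, constant for a cubic
    values = []
    for _ in range(count):
        values.append(f)
        f += d1              # add 1: next function value
        d1 += d2             # add 2: next first difference
        d2 += d3             # add 3: next second difference
    return values

print(cubic_forward_difference(1, 2, 3, 4, 5))
```

With floating-point coefficients the running sums accumulate rounding error, so in practice the differences would be re-initialised at the start of each polynomial piece.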

The above techniques can also be used with the conventional
algorithm. A combination of these ideas is used in the proposed
algorithm to reduce the number of calculations to a small fraction
of the number needed by the conventional acoustic imaging
algorithm.

5.  SIMULATION  STUDIES 

Figure 3 gives the cross-sections of the point spread
functions of the conventional and hierarchical algorithms
in the (x, y) plane, while Figure 4 shows similar
cross-sections in the (x, z) plane. In this example, an
array of around 2500 receivers pseudo-randomly spaced
in a (50 cm) x (50 cm) aperture was used, implying an
average separation of about 10λ. A three-stage
hierarchical algorithm was employed with 21 large
subarrays and sixteen small subarrays in each large subarray.
The decomposition of the image volume was 9, 3 and
1 mm respectively at a range of one metre. The
subarrays, as well as the receivers, formed irregular patterns.
We have found that regular subarrays gave rise to high
sidelobes, even if the receivers had no regularity at all.

Figure  3  shows  a  raised  sidelobe  floor  in  the  region 
near  the  mainlobe.  The  value  of  the  point  spread  func¬ 
tion  is  much  higher  in  the  central  subvolume  than  in 
the  surrounding  subvolumes.  Also,  the  position  of  the 
point  reflector  relative  to  the  coarse  subvolumes  affects 
the  point  spread  function.  Figures  3  and  4  show  the 
result  when  the  point  reflector  happens  to  be  at  the  cen¬ 
tre  of  a  coarse  subvolume,  which  is  when  the  algorithm 
does best. Greater errors are introduced by the
algorithm when the point reflector is at the edge of a coarse
subvolume. The cross-section in the range direction
can  be  seen  in  Figure  4.  It  can  be  observed  that  most 
of  the  blurring  is  in  the  cross-range  direction.  The  hier¬ 
archical  algorithm  causes  sidelobes  to  blur  more  in  the 
range  direction,  while  as  Figure  3  indicates,  the  main- 
lobe  is  blurred  mostly  in  the  cross-range  direction. 

6.  CONCLUSION 

In  this  paper,  we  have  presented  a  hierarchical  method 
for  nearfield  acoustic  imaging  using  a  large,  pseudo¬ 
random  array  of  sensors.  This  algorithm  achieves  a 
slightly  degraded  image  with  a  reduced  amount  of  mem¬ 
ory  in  a  shorter  computation  time.  The  algorithm  ap¬ 
plies  several  modifications  to  the  conventional  imaging 
algorithm.  These  modifications  are  as  follows: 

•  The  use  of  a  hierarchy  of  subarrays  allows  a  signif¬ 
icant  reduction  in  the  number  of  calculations  re¬ 
quired.  The  amount  of  time  saved  depends  on  the 
resolution  required  compared  with  the  density  of 
receivers  in  the  array.  However,  close  attention 
must  be  paid  to  the  arrangement  of  subarrays  to 
limit  degradations  to  the  image. 

•  The  data  was  down-sampled  to  reduce  the  num¬ 
ber  of  operations  required.  The  resulting  image 
would  be  severely  degraded  unless  interpolations 
were  applied  in  the  form  of  complex  phase-shifts. 
We  allowed  only  a  quantized  set  of  phase-shifts. 
These  phase-shifts  were  found  efficiently,  and  did 
not  cause  a  great  degradation  in  the  image. 

•  The calculation of time-delays, which forms a large
part of the conventional algorithm, was achieved
in the hierarchical one with polynomial
approximations in a few adds each.
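The "few adds each" evaluation can be sketched with forward differences. Along one line of equally spaced voxels a cubic in three variables reduces to a cubic in the step index n, so the sketch below treats a single-variable cubic; the function name and coefficient layout are assumptions made for illustration only.

```python
def cubic_by_forward_differences(c0, c1, c2, c3, count):
    """Evaluate p(n) = c0 + c1*n + c2*n**2 + c3*n**3 for n = 0..count-1.

    After a short set-up, each new value costs three additions and no
    multiplications.
    """
    p = c0                    # p(0)
    d1 = c1 + c2 + c3         # first difference p(1) - p(0)
    d2 = 2 * c2 + 6 * c3      # second difference at n = 0
    d3 = 6 * c3               # third difference, constant for a cubic
    values = []
    for _ in range(count):
        values.append(p)
        p += d1
        d1 += d2
        d2 += d3
    return values

# Hypothetical coefficients, checked against direct evaluation.
stepped = cubic_by_forward_differences(5, -3, 2, 1, 10)
direct = [5 - 3 * n + 2 * n**2 + n**3 for n in range(10)]
```

With floating-point coefficients the running sums accumulate rounding error, so in practice the differences would be re-initialised at the start of each line of voxels.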

Moreover,  the  hierarchical  algorithm  allows  minimal  in¬ 
teraction  with  the  hard  disc,  which  is  large  enough  to 
contain  all  the  data  but  slow  to  fetch  data  from.  This 
is  as  important  as  the  reduction  in  number  of  calcula¬ 
tions. 

Point-spread  functions  were  presented  to  compare 
the  two  algorithms.  Raised  sidelobe  floors  were  seen 
in  the  hierarchical  algorithm’s  point  spread  function. 
These can be reduced by a judicious choice of irregular
subarrays.

Figure 3: Conventional and hierarchical algorithm point spread functions in the (x, y) plane. Intensities shown in
dB scale. The lower graphs show the cross-sections of the point spread functions of hierarchical and conventional
algorithms at y = 0 and y = 4 mm.

Figure 4: Conventional and hierarchical algorithm point spread functions in the (x, z) plane. Intensities shown in
dB scale. The lower graphs show the cross-sections of the point spread functions of hierarchical and conventional
algorithms at y = 0 and y = 4 mm.

ABSTRACT

DTI  has  demonstrated  that  Synthetic  Aperture  Sonar  (SAS)  can 
provide  orders  of  magnitude  improvement  in  cross-range 
resolution  when  compared  to  conventional  sonar  beamforming. 
Like  Synthetic  Aperture  Radar  (SAR),  SAS  processing  requires 
coherence  over  multiple  measurements,  but  has  long  been 
impractical  due  to  the  nature  of  the  ocean  environment.  We  have 
extended  SAR  processing  ideas  to  accommodate  the  issues 
specific  to  the  underwater  environment,  and  have  successfully 
synthesized  apertures  extending  many  thousands  of  wavelengths. 
We  will  present  an  overview  of  the  theory  of  SAS  processing, 
how  it  differs  from  SAR,  and  will  show  experimental  results 
from  SAS  processing  of  sonar  data. 

1.  INTRODUCTION 

Synthetic  Aperture  Sonar  (SAS)  is  an  underwater  adaptation  of 
the  techniques  used  in  Synthetic  Aperture  Radar  (SAR).  SAS  is 
the  coherent  integration  of  data  from  a  number  of  transmissions 
as  the  sonar  moves  along  its  track.  The  resolution  limit  for  the 
focused  image  in  the  cross-range  direction  is  one  half  the  lateral 
size  of  the  receiver  element,  at  all  ranges.  High  area  coverage  rate 
is  achieved  by  use  of  a  multiple  element  receive  array. 

Not  only  does  SAS  allow  for  the  generation  of  sonar  imagery 
with  range-independent  resolution,  but  the  resolution  is  also 
independent  of  frequency,  up  to  normal  diffraction  limits. 
Synthetic  aperture  processing  is  scalable,  and  it  has  been  applied 
in  both  mine  hunting  and  anti-submarine  warfare  (ASW) 
scenarios. 

When  adapting  SAR  algorithms  to  SAS  systems,  several 
important  differences  between  the  two  technologies  come  to 
light.  Table  1  lists  some  of  them.  While  wavelengths  are  similar 
for  the  two  systems,  SAS  systems  typically  use  radiating 
elements  smaller,  in  proportion  to  the  wavelength,  than  SAR 
systems. As a result, radiation beampatterns for SAS systems
have  a  greater  angular  extent  than  those  for  SAR  systems.  The 
effect  on  synthetic  aperture  processing  is  that  range  migration 
(the  shifting  of  a  target’s  return  through  range  resolution  cells  as 
the  platform  flies  past)  is  quite  significant  and  pronounced  for 
SAS  systems. 

Because  of  the  long  propagation  delays  and  the  desire  to 
minimize  the  synthetic  aperture  time,  SAS  systems  generally  use 
an array of receiver elements. Aperture sampling requirements,
along with the slow propagation, imply an interdependence
between array length (L), ping repetition interval (PRI), and
maximum platform velocity (V_SAS) such that

$$ V_{SAS} \le \frac{L}{2\,\mathrm{PRI}} . $$
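The trade-off between array length, ping interval, and platform speed can be made concrete with a short sketch. It assumes the sampling rule that the array may advance at most half its length per ping, a nominal sound speed of 1500 m/s, and hypothetical values for array length and range; none of these numbers are prescribed at this point in the text.

```python
SOUND_SPEED = 1500.0  # m/s; assumed nominal value, not taken from the text

def max_platform_speed(array_length_m, pri_s):
    """Largest speed that advances the array by at most L/2 per ping."""
    return array_length_m / (2.0 * pri_s)

def min_pri_for_range(max_range_m, c=SOUND_SPEED):
    """Shortest ping interval that lets the echo from max_range_m return."""
    return 2.0 * max_range_m / c

pri = min_pri_for_range(190.0)          # hypothetical 190 m maximum range
v_max = max_platform_speed(3.2, pri)    # hypothetical 3.2 m receive array
```

Doubling the maximum range doubles the shortest usable ping interval and therefore halves the permitted speed unless the array is lengthened, which is the interdependence described above.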

Perhaps  the  most  significant  difference  between  SAS  and  SAR 
systems  is  the  medium  coherence.  Along  with  medium 
fluctuations, platform stability can also affect the signal phase
history.  Since  synthetic  aperture  times  for  SAS  are  an  order  of 
magnitude  greater  than  those  for  SAR,  the  phase  corruption  due 
to  these  effects  can  be  much  more  severe  for  SAS. 
Parameter           SAR                  SAS
wavelength [m]      0.01 - 0.3 (typ.)    0.01 - 1.5
SA time             few seconds          minutes - hours
SA length           few km (SIR)         30 m - 5 km
PRI                 0.5 - 1 ms (SIR)     0.1 s - 1 min.
range stand-off     > 225 km (SIR)       few meters - km
medium coherence    days                 minutes

Table 1. Comparison of SAR and SAS key parameters.
Some of the SAR parameters are derived from the Shuttle
Imaging Radar (SIR).


In  the  next  section  we  describe  the  flow  of  a  SAS  processor,  and 
in  Section  3,  we  present  results  of  three  different  SAS  systems. 
The  DARPA  SAS  system  and  the  CSS  system  are  intended  for 
mine  countermeasures  and  use  rigid  tow  bodies.  The  SWAC  data 
was  collected  using  a  long  flexible  towed  array  at  low 
frequencies. 

2.  SAS  PROCESSING 

Synthetic  aperture  sonar  produces  high-resolution  imagery  by 
coherently  combining  data  from  multiple  sonar  pings.  To 
produce  high-quality  SAS  results,  these  data  must  be  referenced 
to  a  straight  platform  track  with  sub-wavelength  accuracy.  The 
motion  of  even  well-behaved  platforms,  and  phase  distortions 
from  the  complicated  underwater  environment  conspire  to  make 
this  a  challenging  requirement,  and  much  of  our  SAS  processor 
is  devoted  to  overcoming  platform  motion  and  phase  errors.  In 
general,  our  synthetic  aperture  processing  can  be  divided  into 
three,  usually  distinct,  operations.  In  the  first  step,  the  data  are 
adjusted  to  compensate  for  initial  estimates  of  platform  motion, 
using  inputs  from  both  the  hardware  mocomp  suite,  if  one  exists, 
and from information extracted from the sonar data itself. The
second component is the image formation algorithm that
performs azimuthal compression on the motion-compensated raw
data and generates an image from the raw data. The third
operation involves correction of phase errors due to
uncompensated platform motion and medium instabilities, which
is usually accomplished by application of some form of
autofocus algorithm.

2.1  Motion  Compensation 

When  available,  platform  motion  estimates  derived  from 
specialized  motion  compensation  hardware  are  applied  to  the 
sonar  data  to  remove  many  of  the  effects  of  platform  motion. 
Unfortunately,  such  estimates  are  often  not  available,  due  to 
hardware  failures  or  simply  the  absence  of  these  costly 
instruments. To process data with this limitation, we estimate
phase error through the analysis of the sonar returns only.
One  technique  we  employ  to  estimate  towbody  motion  from 
sonar  data  is  a  Prominent  Point  Processor  [1][2],  which  relies 
upon  the  existence  of  a  dominant  scatterer  in  the  scene.  After 
manually  selecting  the  prominent  point  and  locating  its  point  of 
closest  approach,  its  echo  history  is  compared  to  the  theoretical 
form  it  would  have  were  there  no  platform  motion.  At  each 
along-track  location,  the  discrepancy  in  range  and  phase  is 
calculated  and  is  interpreted  as  a  measurement  of  towbody 
displacement.  This  set  of  motion  estimates  is  then  applied  to  the 
entire  data  set,  restoring  the  sonar  returns  to  their  theoretical 
zero-motion  locations. 
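The range part of this comparison can be sketched as follows. The sketch assumes a straight reference track, a point scatterer whose closest approach is known, and range measurements already extracted from the echo history; the function name and all numerical values are hypothetical, and the phase discrepancy that refines the estimate to a fraction of a wavelength is omitted.

```python
import math

def towbody_displacements(track_positions, measured_ranges,
                          closest_range, closest_position):
    """Range discrepancy between a prominent point's measured echo history
    and the hyperbola expected for a straight, motion-free track."""
    displacements = []
    for u, r_measured in zip(track_positions, measured_ranges):
        r_ideal = math.hypot(closest_range, u - closest_position)
        displacements.append(r_measured - r_ideal)
    return displacements

# Hypothetical data: ideal hyperbola at 190 m plus a known sway.
positions = [-2.0, -1.0, 0.0, 1.0, 2.0]
sway = [0.010, -0.020, 0.000, 0.015, -0.005]
measured = [math.hypot(190.0, u) + s for u, s in zip(positions, sway)]
estimated = towbody_displacements(positions, measured, 190.0, 0.0)
```

Subtracting the estimated displacement from every return at the same along-track position is what restores the data to its zero-motion locations.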

Though  robust,  this  technique  requires  the  existence  of  strong, 
point-like scatterers throughout the area being imaged. Moreover,
the  returns  from  those  prominent  points  must  be  distinguishable 
from  other  echoes  throughout  the  whole  integration  length  that  it 
takes  to  develop  a  synthetic  aperture.  An  alternative  method  for 
estimating  platform  motion  from  sonar  scene  content,  known  as 
Redundant  Phase  Center  processing,  has  also  been  tested.  RPC  is 
derived  from  the  SAR  technique  of  along-track  interferometry. 
RPC  estimates  towbody  motion  by  performing  range-wise 
correlations  on  data  from  overlapping  segments  of  the 
hydrophone  array  on  successive  pings  of  the  sonar.  Our  RPC 
implementation  for  SAS  requires  neither  a  prominent  point  nor 
critical decisions and input from the analyst. Since the received
signal varies as $e^{i\omega t}$, where $\omega$ is the carrier frequency, the complex
correlation between the two repeated looks is sensitive to phase
errors on the order of a fraction of a cycle.
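The sensitivity of the complex correlation to sub-cycle shifts can be illustrated with a minimal sketch. It assumes two narrowband complex looks of the same scene that differ only by a small time shift, and it recovers that shift from the correlation phase alone; the function name and the synthetic data are hypothetical, and the range-wise correlation search that resolves shifts larger than half a cycle is not shown.

```python
import cmath
import math

def rpc_time_shift(look_a, look_b, carrier_hz):
    """Time shift of look_b relative to look_a from the phase of their
    complex correlation (valid for shifts under half a carrier cycle)."""
    corr = sum(a * b.conjugate() for a, b in zip(look_a, look_b))
    return cmath.phase(corr) / (2.0 * math.pi * carrier_hz)

# Hypothetical looks: the second is the first delayed by 2 microseconds.
carrier = 50.0e3
true_shift = 2.0e-6
look_a = [(1.0 + 0.1 * k) * cmath.exp(0.7j * k) for k in range(32)]
look_b = [a * cmath.exp(-2j * math.pi * carrier * true_shift) for a in look_a]
shift = rpc_time_shift(look_a, look_b, carrier)
```

At 50 kHz one cycle lasts 20 microseconds, so the 2 microsecond shift in this example corresponds to a tenth of a cycle.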

Although  either  of  these  algorithms  could  fail  if  the  sonar  data  is 
of  poor  quality,  they  do  have  advantages  over  hardware-based 
motion  measurement  approaches.  By  its  nature,  motion 
measurement  hardware  can  provide  information  about  platform 
motion  only.  Unfortunately,  there  are  many  other  sources  of 
phase  corruption  in  the  underwater  environment,  such  as  medium 
instabilities,  multipath  propagation,  and  medium 
inhomogeneities.  Phase  errors  derived  from  analysis  of  sonar 
echoes  include  the  effects  of  all  of  these  sources  of  phase 
corruption,  and  can  therefore  correct  all  phase  errors 
simultaneously. 


2.2  RMA  Image  Formation

Several  SAR  image  formation  algorithms  exist  in  the  literature. 
Examples are direct matched filtering, the chirp scaling algorithm,
and the range migration algorithm (RMA) [2][3]. Because of its
speed and full 2-dimensional treatment of the problem, we have
chosen RMA for the image formation stage of our SAS processor,
and  the  images  shown  in  the  next  section  were  all  focused  using 
RMA.  Developed  for  work  in  the  geophysics  community,  RMA 
implements  an  exact  solution  of  the  synthetic  aperture  problem 
using  fast  Fourier  transforms  to  efficiently  apply  the  matched 
filter  in  the  wavenumber-frequency  domain.  With  this  technique, 
both  speed  and  theoretical  performance  are  achieved. 

The RMA formalism explicitly treats only single-element
systems, where the transmitter and receiver use the same antenna.
However, a typical SAS system uses a single transmitter with an
array  of  receivers,  and  in  addition,  the  transmitter  may  be 
physically  separated  from  the  receiver  array  by  a  significant 
distance.  Therefore,  pre-processing,  in  the  form  of  a  phase 
correction,  is  required  to  correct  for  the  physical  separation 
between  each  receiver  and  the  transmitter.  After  this  correction, 
the  receiver  data  is  equivalent  to  samples  along  the  synthetic 
aperture  corresponding  to  a  single  transmitter/receiver  antenna. 
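The text does not specify how this correction is computed. One common choice, assumed here purely for illustration, is to replace each transmitter/receiver pair by a single phase centre midway between them and to remove the phase of the path-length difference at a reference point. The function name, coordinates, and wavelength below are hypothetical.

```python
import math

def phase_centre_correction(tx, rx, target, wavelength):
    """Return (phase centre, phase correction in radians) for one receiver.

    The correction is the phase of the difference between the true
    transmitter-target-receiver path and the two-way path from the
    midpoint phase centre to the same target.
    """
    centre = tuple((t + r) / 2.0 for t, r in zip(tx, rx))
    bistatic = math.dist(tx, target) + math.dist(target, rx)
    monostatic = 2.0 * math.dist(centre, target)
    return centre, 2.0 * math.pi * (bistatic - monostatic) / wavelength

# Hypothetical geometry in metres: (along-track, cross-track, height).
tx = (0.0, 0.0, 0.8)
rx = (1.5, 0.0, 0.0)
target = (0.75, 190.0, -10.0)
centre, phase = phase_centre_correction(tx, rx, target, 0.03)
```

The path difference shrinks with range, so the correction matters most for the near part of the swath and for receivers far from the transmitter.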

2.3  Phase  Gradient  Autofocus

To  remove  residual  uncompensated  motion  from  the  focused 
images,  we  employ  the  phase  gradient  autofocus  (PGA) 
algorithm [4][5]. The PGA algorithm selects candidate point-like
targets  in  the  synthetic  aperture  image,  estimates  the  residual 
phase  error  at  those  points,  and  combines  them  into  an  optimal 
estimate  of  the  phase  error.  This  phase  error  is  removed  from  the 
image  and  the  process  iterated  until  convergence  is  obtained. 

To estimate the phase error, only strong scatterers are used. Once
detected, they are windowed and shifted to occupy the same location
in the synthetic aperture (Doppler history). The phase gradient is
estimated by

$$ \hat{\dot{\phi}}(u) = \frac{\sum \mathrm{Im}\{ G^{*}(u)\, \dot{G}(u) \}}{\sum |G(u)|^{2}} , $$

where G(u) is the signal along the synthetic aperture, u, the dot
denotes differentiation with respect to u, and the sums run over
the selected scatterers. Other
phase gradient estimators exist, but the above estimator is the
linear unbiased minimum variance estimator. Integrating the
phase gradient then gives an estimate of the phase error. After
correcting  the  data  for  the  estimated  phase  error,  the  algorithm 
can  be  iterated. 
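A minimal sketch of the gradient estimate and its integration follows. It assumes the strong scatterers have already been selected, windowed, and aligned, that all of them share the same phase error, and that the derivative can be replaced by a first difference between adjacent aperture samples; the function names and the synthetic data are hypothetical.

```python
import cmath
import math

def phase_gradient(signals):
    """Estimate the phase increment between adjacent aperture samples.

    signals: one complex sequence per selected scatterer, all of equal
    length, sampled along the synthetic aperture.
    """
    length = len(signals[0])
    gradient = []
    for k in range(length - 1):
        num = sum((g[k].conjugate() * (g[k + 1] - g[k])).imag for g in signals)
        den = sum(abs(g[k]) ** 2 for g in signals)
        gradient.append(num / den)
    return gradient

def integrate(gradient):
    """Cumulative sum of the gradient, starting from zero phase."""
    phase = [0.0]
    for step in gradient:
        phase.append(phase[-1] + step)
    return phase

# Hypothetical scene: three scatterers sharing one slowly varying phase error.
true_phase = [0.5 * math.sin(2.0 * math.pi * k / 64) for k in range(64)]
signals = [[amp * cmath.exp(1j * p) for p in true_phase]
           for amp in (1.0, 0.6, 0.3)]
estimate = integrate(phase_gradient(signals))
worst = max(abs(e - t) for e, t in zip(estimate, true_phase))
```

The first-difference form returns the sine of each phase step rather than the step itself, so it is accurate only when the phase error changes slowly between samples; iterating the correction reduces the residual.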

3.  EXPERIMENTAL  RESULTS 
3.1  DARPA  SAS  System 

The  DARPA  SAS  system  (Figure  1),  built  and  operated  by 
Raytheon,  is  a  heavy-tow  body  with  a  one-sided,  keel-mounted 
hydrophone  array  consisting  of  32  elements.  The  contiguous 
array  elements  each  have  a  width  of  10.9  cm,  and  were  originally 
configured  with  half  of  the  elements  forming  a  16-element  array, 
allowing for a final cross-range resolution of 5.5 cm. Later, they
were wired in pairs to produce a 16-element array of 21.8 cm
elements,  supporting  a  resolution  of  just  under  11  cm.  The 
projector  array  is  located  on  an  adjustable  “wing”  approximately 
0.8  m  above  the  receiving  array.  This  design  makes  it  possible  to 
transmit  a  narrow  vertical  beam  to  reduce  the  effects  of  multipath 
and  concentrate  transmitted  power  on  long-range  targets.  The 
projector  operates  on  a  center  frequency  of  50  kHz,  and  can 
transmit  in  either  of  two  modes:  a  coherent  burst  consisting  of 
6  cycles  of  the  carrier  frequency,  or  a  linear  FM  (LFM)  chirp  of 
programmable  duration  and  bandwidth.  With  a  bandwidth  of 
10  kHz,  the  LFM  mode  yields  a  theoretical  range  resolution  of 
7.5  cm;  the  6-cycle  tone  achieves  9  cm. 
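The quoted resolution figures follow from simple arithmetic, sketched below under the assumption of a nominal sound speed of 1500 m/s (a value not stated in this section). The function names are illustrative.

```python
SOUND_SPEED = 1500.0  # m/s; assumed nominal value for water

def chirp_range_resolution(bandwidth_hz, c=SOUND_SPEED):
    """Range resolution of a pulse-compressed chirp: c / (2 B)."""
    return c / (2.0 * bandwidth_hz)

def tone_range_resolution(cycles, carrier_hz, c=SOUND_SPEED):
    """Range resolution of an uncompressed tone burst: c * T / 2."""
    return c * (cycles / carrier_hz) / 2.0

def cross_range_resolution(element_width_m):
    """Synthetic aperture cross-range limit: half the element width."""
    return element_width_m / 2.0

lfm = chirp_range_resolution(10.0e3)        # about 0.075 m
tone = tone_range_resolution(6, 50.0e3)     # about 0.09 m
single = cross_range_resolution(0.109)      # about 0.0545 m
paired = cross_range_resolution(0.218)      # about 0.109 m
```

These reproduce the 7.5 cm and 9 cm range resolutions and the 5.5 cm and "just under 11 cm" cross-range limits given for the two array configurations.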


Figure  1.  Raytheon-DARPA  SAS  towbody.  Keel- 
mounted  hydrophone  array  is  3.2  m  long.  Projector  array 
is  located  in  adjustable  wing  and  provides  narrow 
vertical  beam  for  long-range  operation. 

A  PVC  frame  supporting  a  variety  of  known  scientific  targets 
was  deployed  during  several  sea  tests  (Figure  2).  Once  installed, 
the  rectangular  frame  lies  flat  on  the  sea  bottom,  and  has  a  series 
of  posts  protruding  upward  to  support  targets.  Posts  running 
down  the  center  of  the  frame  supported  two  groups  of  five  air- 
filled  steel  spheres:  one  group  consisted  of  10-cm  diameter 
spheres,  the  other  20-cm.  In  each  group,  the  targets  were 
separated  by  1,  2,  4,  and  8  diameters.  The  spheres  should  behave 
as  ideal,  but  low  cross-section,  point  scatterers  and  thus  test  the 
resolution  of  the  SAS  system.  Two  posts  along  the  farthest  edge 
of the frame supported triplane corner reflectors (30- and 60-cm
diameter),  while  two  posts  on  the  closest  edge  were  empty. 

Figure  2  shows  the  final  SAS  image  obtained  from  data  collected 
at  a  range  of  about  190  m  in  Lake  Washington,  with  the  array 
configured  with  10.9  cm  receiver  elements.  This  image  was 
generated  using  RPC,  RMA,  and  PGA,  as  described  in  the  last 
section.  Visible  in  the  center  of  the  SAS  image  are  the  spheres 
(20 cm on the right, 10 cm on the left), as well as a corner of the
support frame at the extreme right. The artifacts appearing as
double-images in range of the spheres are consistent with what
would be expected from a single multipath bounce off the bottom
of the lake. At the far edge (bottom), the two strong corner
reflectors are evident, as is another corner of the frame. The
empty posts along the near edge (top) are also visible. The
support frame itself is visible as a faint line connecting the target
images.  The  3-dB  down-point  resolution  achieved  in  this  image, 
evaluated  by  taking  a  series  of  intensity  cuts  through  the  images 
of  the  10-  and  20-cm  spheres,  varies  slightly  within  the  image, 
with  the  best  cross-range  resolution  being  approximately  7  cm, 
compared  to  5  cm  theoretical. 


Figure  2.  Target  frame  (top)  with  10-  and  20-cm 
spherical targets centrally mounted. Conventional
beamformed image (lower left) shows much less detail
than  SAS  image  (lower  right). 

A  popular  sonar  target  in  Lake  Washington  is  a  sunken  PB4Y-2 
navy  patrol  aircraft  resting  in  50  m  of  water.  A  SAS  image  of  this 
target  obtained  from  a  range  of  350  m  and  processed  with 
prominent  point  motion  estimation  is  shown  in  Figure  3.  The 
most  striking  feature  compared  with  other  published  sonar 
images  of  this  target  is  that  the  aircraft  has  a  ghostly  appearance, 
as  if  its  skin  has  been  removed.  Subsequent  analysis  has 
determined  that  the  thin  aluminum  skin,  immersed  in  water  on 
both  sides,  is  nearly  transparent  to  the  50  kHz  sonar  signals  used 
by  this  sonar,  with  less  than  2  dB  of  transmission  loss  through 
the  skin  decreasing  to  virtually  no  loss  at  grazing  incidence. 
Comparisons  to  published  plans  of  the  PB4Y-2  confirm  that  the 
periodic  features  seen  along  the  fuselage  in  the  SAS  image  are 
interior  structural  elements  of  the  airplane.  The  wing  and  tail 
spars  are  visible,  as  are  some  of  the  ribs  that  support  the  aft 
fuselage. The locations of highlights in the tail portion of the
image  agree  with  the  blueprints  showing  the  locations  of  frames 
in  the  PB4Y-2  to  within  3%,  thus  supporting  the  conclusion  that 
the  50  kHz  sonar  is  imaging  the  interior  of  the  water-filled  target. 


Figure  3.  PB4Y-2  airplane  from  a  range  of 
approximately  350  m.  The  SAS  image  shows  details  of 
the plane’s internal structure, resolved to approximately
15 cm.

A second run past the airplane collected data from a range of
980 m. A SAS image produced from this data is shown in
Figure 4. Although the long-range image is not illuminated as well
as the airplane seen from closer range, the resolution is
approximately the same in the two images. Sound velocity
profiles in the lake indicated that the medium was
downward-refracting, and ray tracing codes indicated that there were no
direct  acoustic  paths  past  ranges  of  about  500  m,  where  the  last 
ray  hit  the  bottom.  The  airplane  was  therefore  imaged  with  sound 
that  had  suffered  two  bottom  bounces.  In  fact,  the  airplane  is 
situated  such  that  the  illuminating  ray  apparently  came  from  a 
side-lobe  of  our  narrow-vertical-beam-width  transmitter  and  even 
then  it  almost  missed  the  aircraft. 


Figure  4.  SAS  image  of  PB4Y-2  airplane  at  980  m 
(above).  Sound  velocity  profile  and  ray  trace  results  for 
Lake  Washington. 


3.2  CSS  High  and  Low  Frequency  SAS  Results 

Coastal Systems Station (CSS) in Panama City, Florida collected
SAS data using an array built by Northrop Grumman. This array
contained 14 high frequency (180 kHz) elements and 7 low
frequency (20 kHz) elements. The length of the high frequency
elements was 5.08 cm, giving a theoretical cross-range resolution
of 2.54 cm, and the low frequency elements were 3.81 cm.
However,  the  cross-range  resolution  for  the  low  frequency  is 
limited  by  the  wavelength  (7.5  cm)  rather  than  the  element 
length.  For  these  experiments,  a  series  of  objects,  such  as  a  large 
cylinder,  a  ladder,  and  a  truncated  cone,  were  placed  on  a  sandy 
bottom.  Figure  5  shows  the  focused  image  for  the  high  frequency 
data,  and  Figure  6  is  the  resulting  image  for  the  low  frequency.  In 
the  case  of  the  high  frequency  data,  the  image  was  formed  using 
RPC  and  RMA,  but  an  autofocus  step  was  not  required.  The  low 
frequency  image  was  formed  using  all  three  steps. 

Some  interesting  differences  between  the  two  images  indicate 
how  target  response  is  frequency  dependent.  For  example,  the 
cylinder object (upper right corner) in the high frequency image
is bright along its whole length, and a distinct shadow is visible.
However, only its ends are bright in the low frequency image,
and  no  shadow  is  present.  Similarly,  the  long  shadow  behind  the 
truncated  cone  (just  below  the  cylinder)  also  disappears  for  the 
low frequency image.

[Figure 5 panel title: R = 50 m, SAS-processed CSS data, 0.55 m array, 180 kHz]

Figure 5. High frequency (180 kHz) SAS image. A
cylinder in the upper right gives a clear shadow.

Figure 6. Low frequency (20 kHz) SAS image. Objects
and background reverberation have different image
characteristics at 20 kHz than at 180 kHz.

3.3  ONR  SAS  Results

The low-frequency sonar system used in the SWAC tests consists
of two separately towed components, a 900 m neutrally-buoyant
flexible line, the central 256 m of which are populated by 256
hydrophones, and a heavy-towed transmitter vehicle with a 10 m
vertical projector array. The hydrophones are not uniformly
spaced along the array, so we resample the data onto a uniform
grid prior to our SAS processing. The system operates at a
600 Hz center frequency, and is capable of ensonifying targets at
very long ranges. Two-vehicle systems such as this are less than
ideal for SAS, because the transmitter and receiver positions are
not constrained relative to one another as they are in a
single-vehicle sonar system. As a result, the two-vehicle configuration
adds another degree of freedom to the motion compensation
problem. An additional issue with this data set had to do with
speed of advance. To ensure proper spatial sampling of the
synthetic aperture, the array can advance no farther than one-half
of its length on every ping. During the SWAC trials, the speed of
advance exceeded this SAS speed limit by approximately 30%.
In spite of these issues, we were able to produce focused SAS
images from these data, albeit with somewhat elevated side-lobe
levels due to the tow speed excess.

The SWAC data set contained a strong return at a range well
beyond the test area. Later investigation revealed the presence of
an oil rig in this approximate location, which we now believe to
be the source of these long-range returns. Figure 7 shows our
SAS image of this target. The SAS image was generated after
prominent point corrections from a nearby return were applied to
the data; autofocusing was also applied to the SAS result. Visible
in this image are distinct highlights (probably echoes from the oil
rig’s support structure) resolved to approximately 5 m. In
contrast, a conventional beamformed image using the real
aperture of the flexible towed array would have a cross-range
resolution of about 250 m. Because the sonar was towed at a
speed higher than the SAS limit, cross-range grating sidelobes
are also clearly visible in this image.

Figure 7. SAS image generated from the same data
(right) shows 5 m resolution. Also visible in the image
are azimuthal sidelobes, a consequence of the sonar
advancing faster than the SAS speed limit.

4.  CONCLUSIONS

We  have  demonstrated  that  synthetic  aperture  imaging  is  capable 
of  providing  order-of-magnitude  improvements  in  cross-range 
resolution  over  conventional  sonar  beamforming  techniques. 
SAS techniques are applicable to sonar systems of widely
varying characteristics, and appear to be robust in the face of
multipath acoustic environments. Table 2 summarizes the SAS
resolution  results  for  the  three  systems  presented  in  this  paper. 


           wavelength   resolution limit   processor resolution   measured resolution
DARPA      3 cm         5.0 cm             7.0 cm                 7.0 cm
           3 cm         10.0 cm            15.0 cm                17 cm
CSS high   0.83 cm      2.54 cm            4.3 cm                 5.3 cm
CSS low    7.5 cm       7.62 cm            11.4 cm                15 cm
ONR        2.5 m        5.0 m              5.0 m                  ~11 m

Table 2. Summary of resolution results for several
experimental SAS systems.

ABSTRACT

One  of  the  most  important  applications  of  nonlinear  dynam¬ 
ics  is  the  estimation  of  empirical  dynamical  models  from 
data,  in  order  to  explain  time  series  derived  from  physical 
processes.  Such  derived  models  can  then  be  used  for  a  vari¬ 
ety  of  data  processing  applications,  in  particular  for  detec¬ 
tion  and  classification  problems.  Typically,  the  parameters 
of  such  dynamical  models  are  estimated  directly  from  the 
time  series  by  minimizing  a  cost  function  with  least  squares. 
In  this  paper  we  discuss  the  theory  and  applications  of  an 
alternate  approach  for  estimation  of  such  nonlinear  dynam¬ 
ical  models  and  the  use  of  these  models  for  detection  and 
classification  of  seismic  and  acoustic  data.  We  apply  these 
ideas  to  real  data  derived  from  seismic  station  recordings 
in  the  region  of  the  Panama  Canal.  Finally  we  compare 
our results with those previously achieved by the method of
Master-event  correlations,  and  find  improved  performance. 
This  indicates  that  a  dynamical  model  approach  incorpo¬ 
rates  additional  signal  information  in  this  example. 

1.  INTRODUCTION 

In  recent  years  several  attempts  have  been  made  to  in¬ 
corporate  advances  in  nonlinear  dynamical  systems  theory 
into  robust  signal  processing  tools  for  analyzing  compli¬ 
cated  time  series.  In  this  paper  we  develop  a  method  for 
the estimation of delay differential equation signal models,
motivated by the Yule-Walker equations of autoregressive
modeling theory [1]. Estimating the model in this
way  involves  estimation  of  both  higher  order  statistical  mo¬ 
ments  and  dynamical  moments.  The  dynamical  moments 
are  also  of  higher  order,  but  involve  the  derivative  of  the 
signal  [2].  We  demonstrate  that  these  models  can  simul¬ 
taneously  incorporate  standard  linear  and  nonlinear  signal 
measures.  In  addition  they  also  express  information  directly 
related  to  low-dimensional  deterministic  signal  evolution. 
Our  method  applies  to  very  general  classes  of  dynamical 
models  which  may  incorporate  vector  data  streams  of  sev¬ 
eral  physical  observables,  multiple  time-delayed  variables, 
and  even  explicitly  non-stationary  models.  Here,  however, 
for  clarity  and  because  of  space  limitations,  we  will  sum¬ 
marize  the  basic  technique  on  a  sub-class  of  models  consist¬ 
ing  of  delay-differential  equations  (DDEs)  in  a  single  scalar 
variable. DDEs are known to succinctly describe a wide
variety of physical processes, and their estimation has been
of considerable interest recently [3][4].

Using  this  theory  we  outline  the  design  for  practical  de¬ 
tection  and  classification  algorithms  for  acoustic  and  seismic 
data  analysis  applications.  We  define  a  feature  space  from 
the  model  coefficients,  and  implement  both  Mahalanobis 
decision  criteria  and  a  neural  network  algorithm  to  generate 
signal  class  hypothesis  testing.  We  discuss  the  scaling  prop¬ 
erties  of  this  detector  and  compare  it  to  a  standard  energy 
detector.  We  show  that  the  dynamical  detector  scales  with 
the  sampling  rate  as  a  matched  filter  detector  even  though 
no  exact  signal  template  is  used.  Finally,  we  discuss  the 
utility  of  the  higher-order  dynamical  moments  as  classifica¬ 
tion  features  and  compare  our  results  to  other  classification 
algorithms  which  use  higher-order  moments  for  classifica¬ 
tion. 

2.  ESTIMATION  OF  NONLINEAR 
DYNAMICAL  MODELS 

As  a  general  description,  we  first  assume  that  we  observe  a 
continuous  scalar  data  stream  x(t )  generated  by  measure¬ 
ment  of  some  accessible  observable  of  a  physical  process. 
We  hypothesize  that  the  process  evolution  itself  can  be  ap¬ 
proximated  by  a  deterministic,  relatively  low-dimensional 
dynamics,  but  can  include  purely  stochastic  elements  (i.e. 
noise) as well. We will also utilize up to $D$ time-delayed
copies of $x(t)$, written $x(t - d\tau)$ with $1 \le d \le D$. Hence our
general model form is

\[ \dot{x}(t) = F\left[ x(t), x(t - \tau), \ldots, x(t - D\tau) \right] . \tag{1} \]

The  function  F  is  often  expanded  in  terms  of  some  basis 
functions.  For  our  analysis  we  will  restrict  our  attention  to 
two-delay  second  order  models  of  type 

\[ \dot{x} = a_1 x_{\tau_1} + a_2 x_{\tau_2} + a_3 x_{\tau_1} x_{\tau_2} \tag{2} \]

where we introduced the shorthand notations $x \equiv x(t)$ and
$x_\tau \equiv x(t - \tau)$. This model has been used successfully to
model and detect quadratic phase couplings [1].

Here  we  diverge  from  an  exact  modeling  approach  which 
is  often  employed  in  nonlinear  dynamics  theory.  For  model¬ 
ing  purposes  determination  of  a  correct  functional  form  F  is 
necessary  to  recover  exact  dynamical  information  about  the 
original system, which is typically problematic. However,
for classification purposes, we postulate that it is only necessary
to incorporate sufficient model dimensionality and
nonlinearity  to  distinguish  the  required  signal  classes,  re¬ 
gardless  of  the  exact  form  of  the  original  dynamical  gener¬ 
ators.  Typically,  we  find  that  dimension  D  and  the  order 
of  nonlinearity  can  be  small  (e.g.  2  or  3),  since  signal  power 
is  usually  distributed  mostly  in  the  lowest  orders  of  nonlin¬ 
earity  and  dimension  (in  analogy  with  spectral  expansions). 
More  importantly,  we  find  it  crucial  to  define  a  standard 
model  form  for  the  dynamical  model  (i.e.  a  fixed  “dynami¬ 
cal  filter”)  in  a  particular  application,  otherwise  comparison 
between  different  signal  classes  becomes  impossible.  The 
unknown model coefficients $a_1$, $a_2$, and $a_3$ are estimated
for  each  data  window,  and  will  comprise  our  classification 
feature  space.  This  estimation  must  be  numerically  robust 
and  hopefully  explicitly  preserve  some  of  the  nonlinear  cor¬ 
relations  possibly  present  in  the  original  signal. 

Next  we  present  a  method  which  can  accomplish  both 
of  these  goals,  and  make  explicit  connection  to  higher-order 
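spectral theory. Briefly, we multiply Eq. (2) by each basis
term $x_{\tau_1}$, $x_{\tau_2}$, and $x_{\tau_1} x_{\tau_2}$, and average over an observation
window of length $T$; the model coefficients are then
computed by solving the following linear equation:

\[ R A = B \tag{3} \]

where

\[
R = \begin{pmatrix}
\langle x_{\tau_1}^2 \rangle & \langle x_{\tau_1} x_{\tau_2} \rangle & \langle x_{\tau_1}^2 x_{\tau_2} \rangle \\
\langle x_{\tau_1} x_{\tau_2} \rangle & \langle x_{\tau_2}^2 \rangle & \langle x_{\tau_1} x_{\tau_2}^2 \rangle \\
\langle x_{\tau_1}^2 x_{\tau_2} \rangle & \langle x_{\tau_1} x_{\tau_2}^2 \rangle & \langle x_{\tau_1}^2 x_{\tau_2}^2 \rangle
\end{pmatrix}, \quad
A = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \quad
B = \begin{pmatrix}
\langle \dot{x} x_{\tau_1} \rangle \\
\langle \dot{x} x_{\tau_2} \rangle \\
\langle \dot{x} x_{\tau_1} x_{\tau_2} \rangle
\end{pmatrix}
\]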

where $\langle \cdot \rangle$ stands for the expectation value. The linear system
(3) constitutes the normal equations for Eq. (2), similar to
the Yule-Walker equations of parametric modeling
theory [9], where the model coefficients are expressed by the
correlation matrix. Note that the correlations involving the
signal derivative can be calculated from the derivative of
the correlation function, i.e.

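\[
\langle \dot{x} x_{\tau_1} \rangle = \frac{\partial}{\partial \tau_1} \langle x x_{\tau_1} \rangle
\quad \text{and} \quad
\langle \dot{x} x_{\tau_1} x_{\tau_2} \rangle = \left( \frac{\partial}{\partial \tau_1} + \frac{\partial}{\partial \tau_2} \right) \langle x x_{\tau_1} x_{\tau_2} \rangle . \tag{4}
\]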

These  formulas  are  valid  in  the  long  window  limit  for  a 
bounded  stationary  signal  x(t).  The  main  practical  ad¬ 
vantage  of  using  Eq.  (3)  instead  of  solving  Eq.  (2)  in  a 
least  squares  sense  is  that  we  can  avoid  computing  the 
signal  derivatives,  which  is  the  main  difficulty  for  noisy 
signals. The expectation values on the left hand side of
Eq. (3) can be expressed as standard higher-order data moment
functions [5]. For example, $\langle x_{\tau_1} x_{\tau_2}^2 \rangle = m_{xxx}(\tau_2 - \tau_1, \tau_2 - \tau_1)$,
where the third-order moment function is defined
as $m_{xxx}(\tau_1, \tau_2) = \langle x(t) x(t - \tau_1) x(t - \tau_2) \rangle$ and describes
bicorrelations. We also note that the dynamical moments
involving $\dot{x}$ arise exactly because of the dynamical representation
and express information not utilized in standard
higher-order methods.

Figure 1: Numerically estimated detection probabilities $P_d$
vs. signal to noise ratio (SNR) for the harmonic signal
for the dynamical detector (solid line) and energy detector
(dashed line).
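The estimation procedure of Eqs. (2)-(4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `moment` and `estimate_dde`, the use of NumPy, and the central-difference approximation of the delay derivatives of the moments are all choices made here for the example.

```python
import numpy as np

def moment(x, powers, delays):
    """Window average of prod_k x(n - delays[k]) ** powers[k].

    Only samples n >= max(delays) are used, so every delayed index is valid.
    """
    m = max(delays)
    n = np.arange(m, len(x))
    prod = np.ones(len(n))
    for p, d in zip(powers, delays):
        prod *= x[n - d] ** p
    return prod.mean()

def estimate_dde(x, i, j, dt):
    """Estimate (a1, a2, a3) of the two-delay model, Eq. (2).

    i, j : delays tau1, tau2 in samples (both >= 1); dt : sampling interval.
    The right-hand side B is obtained from delay derivatives of the moments
    (assumed here: central differences with a one-sample step), so the
    signal derivative itself is never computed.
    """
    basis = [(1, 0), (0, 1), (1, 1)]   # powers of (x_tau1, x_tau2) per basis term
    R = np.array([[moment(x, (0, p[0] + q[0], p[1] + q[1]), (0, i, j))
                   for q in basis] for p in basis])

    def shifted(p, s):
        # <x * x_tau1^p0 * x_tau2^p1> with both delays shifted by s samples
        return moment(x, (1, p[0], p[1]), (0, i + s, j + s))

    B = np.array([(shifted(p, 1) - shifted(p, -1)) / (2.0 * dt) for p in basis])
    return np.linalg.solve(R, B)
```

For a noise-free sinusoid sampled at 64 points per cycle with delays of 16 and 8 samples, the sketch should return coefficients close to $(-\omega, 0, 0)$, since the quarter-cycle delayed copy alone reproduces the derivative.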

We  have  applied  the  above  classification  methods  to  a 
variety  of  real  world  data  sets,  including  stationary  and 
transient  sonar  data  [1]  and  dolphin  echo-location  data  [6]. 
In  this  paper  we  apply  these  ideas  to  the  automated  clas¬ 
sification  of  real-world  seismic  data  derived  from  seismic 
station  recordings  in  the  region  of  the  Panama  Canal  [7]  [8]. 

3.  SIGNAL  DETECTION 

Let us consider the simple example of a harmonic signal
$x(t) = \sin(\omega t)$. It is easy to show that this signal can be
represented exactly by a single-delay, first-order DDE:

\[ \dot{x} = a_1 x + a_2 x_\tau . \tag{5} \]

If $\tau$ is chosen such that $m_{xx}(\tau) = 0$, we find that the signal
is represented by the reduced coefficients $a_1 = 0$ and
$a_2 = -\omega$. For our numerical analysis we generated a harmonic
signal sampled at 64 points per cycle. The window
length used is 10 cycles, and the time delay is $\tau = 16$ samples
(a quarter cycle). To
train and test the detector we used 400 non-overlapping observation
windows. The detector is based on the features $a_1$
and $a_2$ calculated with the correlation method presented in
Section 2. Fig. 1 shows the receiver operating characteristic
(ROC) curve for both the dynamical and a full bandwidth
energy detector for a false alarm probability of $P_{fa} = 0.1$.
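The statement that a pure harmonic satisfies Eq. (5) with $a_1 = 0$ and $a_2 = -\omega$ can be checked directly. The short derivation below uses only the definitions above, with the second-order moment $m_{xx}(\tau) = \langle x(t) x(t - \tau) \rangle$ assumed to be averaged over an integer number of cycles.

```latex
\begin{align*}
x(t) &= \sin(\omega t), \qquad \dot{x}(t) = \omega \cos(\omega t), \\
m_{xx}(\tau) &= \langle \sin(\omega t)\, \sin(\omega t - \omega\tau) \rangle
              = \tfrac{1}{2} \cos(\omega\tau), \\
m_{xx}(\tau) = 0 \;&\Rightarrow\; \omega\tau = \tfrac{\pi}{2}
              \quad \text{(smallest positive root)}, \\
x_\tau(t) &= \sin\!\left(\omega t - \tfrac{\pi}{2}\right) = -\cos(\omega t), \\
\dot{x}(t) &= -\omega\, x_\tau(t) \;\Rightarrow\; a_1 = 0, \quad a_2 = -\omega .
\end{align*}
```

At 64 points per cycle the quarter-cycle delay corresponds to 16 samples, which matches the delay used in the numerical experiment.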
Figure 2: SNR at $P_d = 0.5$ as a function of the sampling
rate for the Rössler signal for the dynamical detector (o)
and energy detector (*). The sampling rate is expressed in
units of points per characteristic cycle. The dashed line
has a slope of -1.5 dB per factor of 2 increase
in sampling rate. The solid line has a slope of -3.0 dB per
factor of 2 increase in sampling rate.


We  note  that  in  addition  to  the  harmonic  signal,  an  ex¬ 
ponentially  damped  harmonic  can  also  be  represented  ex¬ 
actly  by  the  linear  DDE  model  Eq.  (5).  In  this  way  one  can 
study  the  detection  of  transient  pulses  in  the  same  frame¬ 
work. 

The detection performance shown in Fig. 1 can be improved
by increasing the data sampling rate, which holds
true over a very broad range of signal classes [1]. This effect
is tied to model specification, due to the estimate of the signal
derivative. To demonstrate this we computed the ROC
curves for increasing sampling rates for the x-component
of the broadband, nonlinear Rössler signal [1]. The model
coefficients are computed using the correlation method and
a quadratic single-delay DDE [1] [2].

In Fig. 2 we plot the SNR corresponding to $P_d = 0.5$
and $P_{fa} = 0.1$ as a function of the logarithm of the sampling
rate of the Rössler signal for both the dynamical and the
energy detector. We observe a linear dependence for
both detectors; however, the slopes of these curves are not
the same. The difference in slope accounts for a relative
gain  of  the  dynamical  detector  over  the  energy  detector  of 
about  1.5  dB  per  doubling  of  sampling  rate,  which  is  close 
to  the  theoretical  gain  expected  from  a  matched  filter. 
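The two reference slopes quoted for Fig. 2 follow from simple power-law scaling. The sketch below assumes the textbook low-SNR behavior, in which the input SNR required for a fixed detection performance falls as $1/N$ for a matched filter and as $1/\sqrt{N}$ for an energy detector, with $N$ the number of samples per window. The function name `db_per_doubling` is hypothetical.

```python
import math

def db_per_doubling(exponent):
    """Change (in dB) of the required SNR when the number of samples doubles,
    assuming required SNR is proportional to N ** (-exponent)."""
    return -10.0 * exponent * math.log10(2.0)

matched_filter_slope = db_per_doubling(1.0)    # about -3.0 dB per doubling
energy_detector_slope = db_per_doubling(0.5)   # about -1.5 dB per doubling
relative_gain = energy_detector_slope - matched_filter_slope  # about 1.5 dB
```

Under these assumptions the difference between the two slopes is about 1.5 dB per doubling of the sampling rate, which is the relative gain reported above.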

These  experiments  show  that  the  dynamical  models  can 
attain  performance  scaling  close  to  that  of  a  matched  filter, 
while  not  requiring  the  original  signal  template.  We  postu¬ 
late  that  this  property  is  derived  from  the  preservation  of 
nonlinear  phase  relationships  by  the  dynamical  model. 


Figure 3: Typical seismic recording from an earthquake
used in our data analysis (horizontal axis: time in seconds).
The data can be divided into
three parts of interest: 1) preceding noise, 2) the P-wave,
and 3) the S-wave.


4.  CLASSIFICATION  OF  SEISMIC  DATA 

Man-made  explosions  around  the  pre-stressed  area  of  the 
Panama  Canal  are  similar  in  their  seismic  energy  content 
to  shallow  earthquakes,  making  conventional  discrimination 
methods difficult to use. The estimation of many seismological
quantities, such as seismic hazard, seismicity, and
energy release distribution, is impossible with a database
polluted  by  man-made  explosions.  Therefore,  it  is  impor¬ 
tant  to  discriminate  between  explosions  and  earthquakes 
on  a  routine  basis.  The  mechanical  properties  of  the  rocks 
that  seismic  waves  propagate  through  quickly  organize  the 
waves  into  two  types.  These  are  compressional  waves,  also 
known  as  primary  or  P-waves,  which  travel  quickly,  and 
shear  waves,  also  known  as  secondary  or  S-waves,  which 
travel usually at 60 to 70 percent of the speed of the P-waves.
Examples of earthquake and explosion time series
are shown in Figs. 3 and 4, respectively.

4.1.  The  Data  Set 

The library of data consists of 20 seismic events (12 earthquakes
and 8 explosions), recorded with 3-axis sensors in the
north, east, and vertical directions. The data can be divided
into three parts of interest (see Fig. 3), namely preceding
noise, the P-wave, and the S-wave. The sampling rate is 40
Hz.


4.2.  Master  Event  Correlations 

Figure 4: Typical seismic recording from an explosion used
in our data analysis.

First, we give a brief description of the master event correlation
analysis carried out by Persson and Boutet [8],
in order to compare our results with theirs. The analysis
is performed by using a library of known events to select
unknown events using second-, third- and fourth-order
cross-correlation functions. The functions are all non-redundant
cross-correlation combinations for each data window
between the library events and the unknown events. The
data windows are fitted to the phase of interest (i.e. noise,
P-wave, or S-wave), and the lag $\tau$ is chosen corresponding
to the maximum correlation value. The second-order
cross-correlation between the unknown event $x(t)$ and one of the
library events $y(t)$ is defined as

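\[ m_{xy}(\tau) = \frac{1}{N} \sum_{t=\alpha}^{N+\alpha} x(t)\, y(t + \tau) \tag{6} \]

where $N$ and $\alpha$ have to be matched to the phase of interest.
The third-order cross-correlation is defined as

\[ m_{xxy}(\tau_1, \tau_2) = \frac{1}{N} \sum_{t=\alpha}^{N+\alpha} x(t)\, x(t + \tau_1)\, y(t + \tau_2) \tag{7} \]

The fourth-order cross-correlation is defined as

\[ m_{xxxy}(\tau_1, \tau_2, \tau_3) = \frac{1}{N} \sum_{t=\alpha}^{N+\alpha} x(t)\, x(t + \tau_1)\, x(t + \tau_2)\, y(t + \tau_3) \tag{8} \]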
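The second- and third-order cross-correlations can be estimated along the following lines. This is a sketch rather than the procedure of [8]: the function names are hypothetical, the summation runs from $\alpha$ to $N + \alpha$ with a $1/N$ normalization as in Eq. (6), and the same limits are assumed for the third-order case.

```python
import numpy as np

def cross_corr2(x, y, tau, alpha, N):
    """Second-order cross-correlation m_xy(tau) over t = alpha, ..., N + alpha."""
    t = np.arange(alpha, alpha + N + 1)
    return np.sum(x[t] * y[t + tau]) / N

def cross_corr3(x, y, tau1, tau2, alpha, N):
    """Third-order cross-correlation m_xxy(tau1, tau2) over the same window."""
    t = np.arange(alpha, alpha + N + 1)
    return np.sum(x[t] * x[t + tau1] * y[t + tau2]) / N

def best_lag(x, y, lags, alpha, N):
    """Lag from `lags` that maximizes the second-order cross-correlation."""
    return max(lags, key=lambda tau: cross_corr2(x, y, tau, alpha, N))
```

The caller is responsible for choosing `alpha`, `N` and the lags so that every index stays inside the recordings, which corresponds to matching the window to the phase of interest.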

The  classification  procedure  is  as  follows:  calculate  the  two 
master  clusters  composed  of  library  events  only.  To  classify 
an unknown event, all cross-correlations in Eq. (6) to Eq. (8)
(including correlations like $m_{xyy}$, $m_{xxyy}$, and $m_{xyyy}$) are
estimated for all the library events $y(t)$ and the unknown
event $x(t)$, which forms the unknown cluster. The next
step  is  to  calculate  the  squared  Mahalanobis-distance  be¬ 
tween  the  unknown  cluster  and  the  two  master  clusters. 
The  shortest  squared  Mahalanobis-distance  classifies  the 
unknown  event  as  an  explosion  or  an  earthquake.  If  the 
clusters  overlap  by  more  than  50  percent,  the  method  is 
considered  to  have  failed  and  the  cross  correlations  are  not 
used  in  the  analysis. 
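The distance-based decision step can be sketched as below. The text does not spell out how the distance between two clusters is computed, so this example assumes the squared Mahalanobis distance is measured from the centroid of the unknown cluster to each master cluster, using that master cluster's own covariance. The names `mahalanobis_sq` and `classify` are hypothetical.

```python
import numpy as np

def mahalanobis_sq(point, cluster):
    """Squared Mahalanobis distance from `point` to the cluster (rows = samples)."""
    mu = cluster.mean(axis=0)
    cov = np.cov(cluster, rowvar=False)
    diff = np.asarray(point, dtype=float) - mu
    return float(diff @ np.linalg.solve(cov, diff))

def classify(unknown_cluster, master_clusters):
    """Label of the master cluster closest to the unknown cluster's centroid.

    master_clusters : dict mapping a label to an array of feature vectors.
    """
    centroid = np.asarray(unknown_cluster, dtype=float).mean(axis=0)
    return min(master_clusters,
               key=lambda label: mahalanobis_sq(centroid, master_clusters[label]))
```

Each master cluster needs more samples than feature dimensions, otherwise its covariance matrix is singular and the linear solve fails.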

To evaluate the classification performance, each of the
master events is tested against the library events. The
second  order  method  discriminates  40  percent  of  the  library 
events  correctly,  while  the  third  and  fourth  order  methods 
succeed  for  75  percent  and  80  percent  of  the  library  events 
respectively. 


Figure 5: Parameter distribution for the library events, calculated
from the P-wave in the north direction. Note that
in the two figures containing $a_2$ the two classes are linearly
separable. The processing parameters are $i = 25$ and
$j = 19$.


4.3.  Nonlinear  Dynamical  Models 

We  now  describe  classification  of  the  seismic  data  utilizing 
the  two-delay  second  order  DDE  model  given  by  Eq.  (2). 
The three parameters $a_1$, $a_2$, and $a_3$ are estimated with
Eq. (3). To solve Eq. (3) all the moments in the matrix
equation have to be estimated, and we use an unbiased
estimate defined as

\[ \langle x^a(n)\, x^b(n - i)\, x^c(n - j) \rangle = \frac{1}{N - m} \sum_{n=m}^{N-1} x^a(n)\, x^b(n - i)\, x^c(n - j) \tag{9} \]

where $m$ is equal to the largest of $i$ and $j$; $N$ is the window
length, and $i$ and $j$ are the discrete delays corresponding
to $\tau_1$ and $\tau_2$ respectively; the powers $a$, $b$ and $c$ are set
to 0, 1 or 2 corresponding to the moment that has to be
calculated. We use the window length $N = 128$ samples,
and the two delays $i$ and $j$ are set to 25 and 19 respectively,
which give the largest possible $a_3$ coefficient. The feature
space  spanned  by  the  three  model  coefficients  is  shown  in 
Fig.  5.  One  can  observe  that  the  explosions  and  earthquakes 
are  linearly  separable.  Moreover,  the  signal  separation  is 
significant  from  noise  signals  as  well  (not  shown  here). 
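The moment estimate of Eq. (9) translates almost line by line into code. The sketch below is illustrative only; the function name `moment_estimate` and the use of NumPy are assumptions.

```python
import numpy as np

def moment_estimate(x, a, b, c, i, j):
    """Estimate <x^a(n) x^b(n-i) x^c(n-j)> over one window, following Eq. (9).

    x : one data window of length N; i, j : non-negative delays in samples.
    The sum starts at m = max(i, j) and is normalized by N - m.
    """
    N = len(x)
    m = max(i, j)
    n = np.arange(m, N)
    return np.sum(x[n] ** a * x[n - i] ** b * x[n - j] ** c) / (N - m)
```

With $N = 128$, $i = 25$ and $j = 19$ as in the text, each moment is averaged over the 103 samples for which both delayed values exist.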

To discriminate between the earthquake and
explosion recordings, we develop classifiers based on the
above theory that successfully separate earthquake and
explosion time series. For a quantitative analysis of the classification
performance we apply both Mahalanobis-distance
decision  criteria  and  a  neural  network  algorithm  to  discrim¬ 
inate  between  the  library  events  based  on  their  model  rep¬ 
resentation  in  the  feature  space.  In  order  to  compare  the 
dynamical  model  approach  to  the  method  of  master-event 
correlation  we  implement  a  Mahalanobis-distance  classifier 
of  the  same  type  as  the  classifier  used  by  Persson  and  Boutet 
[8]. The classification performance achieved with the classifier
based on the dynamical model combined with Mahalanobis-distance
decision criteria for various wave types (S or
P) and directions (North, East, Vertical) is presented in
Table 1. The best overall result, 96 percent correct classification,
is achieved for the P-wave in the north direction.

Direction                            North       Vertical    East
Wave                                 S     P     S     P     S     P
Correct Earthquake Classification    75    92    67    67    92    83
Correct Explosion Classification     75    100   88    88    75    75

Table 1: Classification of the library events with the dynamical
model combined with Mahalanobis-distance decision
criteria. All values in the table are given in percent.

Next we build a neural net classifier where the model parameters
are used as input to a learning vector quantization
(LVQ) neural net. LVQ is a method for training competitive
layers in a supervised manner. To evaluate the classification
ability we train the LVQ neural net to discriminate
between earthquakes and explosions on all but one of the library
events, and then use the trained neural net to declare
the removed event as an explosion or an earthquake. This is
performed for all library events, in all possible combinations
in terms of phase and direction. The classification ability of
an LVQ neural net can be slightly different between trainings,
even if the training is repeated on the exact same data set.
In order to reduce this type of fluctuation the training
and classification procedure is repeated 10 times and then
the average classification performance is calculated. The results
for various wave types (S or P) and directions (North,
East, Vertical) are summarized in Table 2. The best overall
result, 93 percent correct classification, is achieved for the
P-wave in the north direction.

Direction                            North       Vertical    East
Wave                                 S     P     S     P     S     P
Correct Earthquake Classification    69    88    91    88    100   96
Correct Explosion Classification     78    98    75    70    86    74

Table 2: Classification of the library events with the dynamical
model combined with an LVQ neural net. All values
in the table are given in percent.
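The leave-one-out protocol with repeated training can be sketched as follows. The LVQ configuration used for Table 2 is not specified in the text, so this example assumes a basic LVQ1 rule with one prototype per class, initialized at the class mean. The function names, learning rate and number of epochs are hypothetical choices.

```python
import numpy as np

def train_lvq1(X, y, epochs=20, lr=0.1, rng=None):
    """Basic LVQ1: one prototype per class, started at the class mean."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.unique(y)
    protos = np.array([X[y == c].mean(axis=0) for c in labels])
    for _ in range(epochs):
        for k in rng.permutation(len(X)):
            w = np.argmin(np.linalg.norm(protos - X[k], axis=1))
            # Attract the winning prototype if its class matches, else repel it.
            sign = 1.0 if labels[w] == y[k] else -1.0
            protos[w] += sign * lr * (X[k] - protos[w])
    return protos, labels

def predict_lvq(protos, labels, x):
    """Class label of the prototype nearest to x."""
    return labels[np.argmin(np.linalg.norm(protos - x, axis=1))]

def leave_one_out_accuracy(X, y, repeats=10, seed=0):
    """Average leave-one-out accuracy over `repeats` independent trainings."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(repeats):
        for k in range(len(X)):
            keep = np.arange(len(X)) != k
            protos, labels = train_lvq1(X[keep], y[keep], rng=rng)
            hits += int(predict_lvq(protos, labels, X[k]) == y[k])
    return hits / (repeats * len(X))
```

Here `X` would hold one feature vector $(a_1, a_2, a_3)$ per library event and `y` the known class labels. Averaging over repeated trainings smooths out the dependence on the random presentation order.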

5.  CONCLUSIONS 

We have developed a method for the estimation of DDE
models motivated by the Yule-Walker equations, which provides
computational speed, numerical stability and noise
robustness. With numerical experiments we showed that a
detector based on these dynamical models can achieve scaling
close to that of a matched filter, without requiring the
original signal template.

Nonlinear  dynamical  signal  models  can  be  utilized  for 
detection  and  classification  of  a  wide  range  of  signals.  In 
this  work  we  analyzed  real-world  seismic  signals  derived 
from  seismic  station  recordings  in  the  region  of  the  Panama 
Canal,  which  are  of  a  transient  nature.  We  implemented 
two  classifiers  based  on  the  two-delay  second  order  DDE 
model given by Eq. (2). The classifier that uses the Mahalanobis-distance
decision criteria results in 96 percent correct
classification of the library events. The classifier built with
an LVQ neural net results in 93 percent correct classification.

We  compare  our  results  with  a  previous  classification 
method  used  on  the  same  seismic  recording  database,  which 
used  time-domain  master-event  correlations  of  second,  third, 
and fourth order. The best classification performance achieved
with the master-event correlations is 80 percent for the
fourth order. Hence, we find improved performance over
even the highest-order master-event correlations method,
which indicates that a dynamical model approach incorporates
additional signal information in this example. In summary,
this example is important because it indicates that
dynamical modeling and classification methods can provide
additional performance gain in a real-world setting.

REFERENCES 

[1]  Kadtke  J.,  Pentek  A.,  Automated  signal  classification 
using  dynamical  signal  models  and  generalized  higher- 
order  data  correlations  (U),  USN  Journal  of  Underwater 
Acoustics,  in  press  (2000). 

[2]  Kadtke  J.,  Kremliovsky  M.,  Estimating  dynamical  mod¬ 
els  using  generalized  moment  functions,  Physics  Letters 
A  260,  203  (1999). 

[3]  Hale,  J.K.,  Lunel,  S.M.V.,  Introduction  to  functional 
differential equations, Springer-Verlag, 1993.

[4] Voss, H., Kurths, J., Reconstruction of non-linear time
delay  models  from  data  by  the  use  of  optimal  transfor¬ 
mations,  Physics  Letters  A  234,  336-44,  1997. 

[5]  Boashash,  B.,  Edward  J.P.,  Abdelhak,  M.Z.,  eds., 
Higher-order  statistical  signal  processing,  Longman  and 
Wiley  Press,  Melbourne,  1995. 

[6]  Kremliovsky,  M.,  Kadtke,  J.,  Inchiosa,  M.,  Moore,  P., 
Characterization  of  dolphin  acoustic  echo-location  data 
using  a  dynamical  classification  method,  International 
Journal  of  Bifurcation  and  Chaos  8,  813-23,  1998. 

[7]  Lennartsson  R.  K.,  Classification  with  dynamical  mod¬ 
els  estimated  with  higher  order  statistical  moments, 
FOA  Report,  FOA-R-99-01292-313-SE,  National  De¬ 
fence  Research  Establishment,  Sweden,  November  1999. 

[8]  Persson  L.,  Boutet,  J.,  Discrimination  of  local  seismic 
events in Panama by means of higher-order statistics,
Proceedings  of  IEEE  Signal  Processing  Workshop  on 
Higher-Order  Statistics,  Banff,  Alberta,  Canada,  July 
21-23,  1997,  pp.  14-19. 

[9]  Marple,  S.L.,  Digital  spectral  analysis,  Englewood  Cliffs, 
NJ,  Prentice  Hall,  1987. 




THE  PERFORMANCE  OF  SPARSE  TIME-REVERSAL  MIRRORS  IN  THE 
CONTEXT  OF  UNDERWATER  COMMUNICATIONS 


Joao  Gomes  Victor  Barroso 


Instituto Superior Técnico - Instituto de Sistemas e Robótica

Av. Rovisco Pais, 1049-001 Lisboa, Portugal
{jpg,vab}@isr.ist.utl.pt


ABSTRACT 

Recently,  wave  focusing  using  a  uniform  time-reversal 
array  has  been  demonstrated  in  the  ocean  with  very 
encouraging  results.  This  technique  may  be  used  to 
regenerate  a  mildly  distorted  signal  at  the  input  of  a 
digital  underwater  acoustic  receiver,  hence  reducing  its 
equalization  requirements  at  the  expense  of  additional 
complexity  at  the  transmitter.  This  work  investigates 
the  performance  improvements  that  become  possible 
when  sparse  and  non-uniform  arrays  are  used.  Re¬ 
sults  from  the  theory  of  randomly-spaced  arrays  are 
extended  to  a  simplified  ocean  waveguide,  revealing 
that  familiar  relations  between  the  sensor  placement 
density  function  and  the  directional  characteristics  of 
the  generated  acoustic  field  are  still  valid  in  the  large- 
scale.  Simulation  results  confirm  the  validity  of  these 
derivations. 

1.  INTRODUCTION 

Underwater  acoustic  propagation  is  a  waveguide  phe¬ 
nomenon  where  pressure  waves  are  repeatedly  reflected 
by the sea surface and bottom, and undergo time-varying
refraction and scattering by inhomogeneities in the
medium  [1].  When  acoustic  waves  are  used  to  transmit 
digitally  modulated  signals,  these  physical  processes  in¬ 
duce  temporal  spreading  of  the  signaling  waveforms, 
resulting  in  significant  intersymbol  interference  (ISI) 
upon reception. Reliable decoding of (relatively) high-speed
phase-coherent modulations under such conditions
requires the use of spatial diversity and powerful,
computationally  intensive,  equalization  algorithms  [2]. 

A  wave-focusing  approach  is  used  in  this  work  to 
mitigate  the  effects  of  ISI,  so  that  the  receiver  may  be 
simplified.  Focusing  waves  in  inhomogeneous  media  is 
a  difficult  problem  that  usually  requires  detailed  physi¬ 
cal  knowledge  of  the  environment.  Severe  performance 
degradation may occur if the assumed propagation conditions
do not match the actual ones to a high degree
of accuracy that is unattainable under most realistic
conditions in the ocean. As an alternative to open-loop
operation, wave focusing through phase conjugation
may be used whenever the desired focal point has
the  ability  to  generate  energy,  either  by  active  means, 
or  by  reflection  of  incoming  waves. 

Phase  conjugation  in  underwater  acoustics  is  imple¬ 
mented  through  a  time-reversal  mirror  (TRM),  i.e.,  an 
array  of  transducers  that  record  sound,  store  it,  and 
later  reproduce  it  backwards  in  time  [3,  4].  The  gen¬ 
erated  waves  propagate  in  a  manner  reciprocal  to  the 
original  field,  such  that  energy  is  automatically  redi¬ 
rected  towards  the  focus  and  concentrated  there  even 
when  poorly  characterized  regions  are  crossed.  An  ap¬ 
plication  of  phase  conjugation  to  underwater  commu¬ 
nication  requires  the  receiver  to  first  transmit  the  ba¬ 
sic  waveforms  of  the  signal  constellation,  so  that  their 
distorted  replicas  are  stored  at  the  mirror.  These  are 
subsequently  used  to  modulate  a  message,  regenerating 
a  nearly  multipath-free  signal  with  the  desired  pulse 
shape  at  the  focus,  where  the  receiver  is  located  [5,  6]. 

In [7] it was shown experimentally that a time-reversal
mirror may still perform adequately even when
the  sensors  are  spaced  by  about  ten  wavelengths.  More¬ 
over,  it  is  known  that  nonuniform  sensor  separation 
may  dramatically  improve  the  capacity  of  large  linear 
arrays  for  direction  of  arrival  estimation  [8,  9].  Moti¬ 
vated  by  these  results,  the  goal  of  the  present  paper  is 
to  study  sparsely-populated  mirrors  and  sensor  alloca¬ 
tion  strategies  that  make  efficient  use  of  the  available 
degrees  of  freedom. 

2.  DATA  MODEL 

For the sake of analytical tractability, the ocean waveguide
is modeled as a range-independent cross-section
with depth $H$ and constant sound speed $c$. The ocean
surface is an ideal pressure-release surface, while the
bottom is rigid and lossy. Their (constant) reflection
coefficients are $-1$ and $0 < \alpha_b < 1$, respectively.

From a linear systems perspective, the normalized
transfer function (Green's function) at frequency $\omega$ from
range/depth $\mathbf{r}' = (R', z')$ to point $\mathbf{r}$ is obtained by solving
the wave equation for a time-harmonic point source

\[ \left[ \nabla^2 + k^2(\mathbf{r}) \right] G_\omega(\mathbf{r}, \mathbf{r}') = -\delta(\mathbf{r} - \mathbf{r}') , \tag{1} \]

where $k(\mathbf{r}) = \omega / c(\mathbf{r})$ is the wavenumber. If the medium
is bounded, $G_\omega$ must satisfy appropriate boundary conditions.
In this work it is assumed that frequencies of
several kHz are used, in which case ray theory provides
adequate modeling of the acoustic propagation.
The  solution  of  (1)  may  then  be  approximated  by  a 
series  of  eigenray  contributions 

\[ G_\omega(\mathbf{r}, \mathbf{r}') \approx \sum_i (-1)^{n_{s,i}}\, \alpha_b^{n_{b,i}} \left\{ d_i \exp(-j\omega\tau_i) \right\} , \tag{2} \]

where $n_{s,i}$, $n_{b,i}$ are the numbers of surface and bottom
reflections of the $i$-th eigenray linking $\mathbf{r}'$ and $\mathbf{r}$. Acoustic
rays travel in straight lines under the assumed isovelocity
conditions, in which case the term inside $\{\cdot\}$ in
(2) may be expressed as a free-space Green's function
linking $\mathbf{r}'$ and a point $\mathbf{r}^{(i)}$ whose distance from $\mathbf{r}'$ equals
the length of the $i$-th eigenray


n>  (r(i)  _  exp(-jfc|rW  -  r'|) 

4ir|r(*)  —  r'| 


(3) 


Reciprocity  of  the  medium  allows  the  spatial  arguments 
of  Gw(-,  •)  and  G^(-,  •)  to  be  interchanged. 

Propagation  from  a  source  to  each  transducer  of  the 
TRM  and  back  generates  an  intricate  pattern  of  eigen- 
rays.  Under  homogeneous  conditions,  decomposition  of 
(2)  as  a  weighted  sum  of  free-space  terms  leads  to  the 
image  method  [1],  where  points  in  the  water  column  are 
expanded  into  a  series  of  surface-  and  bottom-reflected 
images.  The  original  waveguide  problem  is  thus  trans¬ 
formed  into  one  of  (coupled)  free-space  propagation  be¬ 
tween  the  source  and  the  image  arrays  (fig.  1),  which 
provides  more  insight  into  the  operation  of  the  TRM. 
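The image expansion described above can be illustrated with a short numerical sketch. It assumes the isovelocity waveguide of this section (surface at depth 0 with reflection coefficient -1, bottom at depth $H$ with a constant coefficient, written `alpha_b` here), free-space terms of the form $\exp(-jkd)/(4\pi d)$, and truncation after a chosen number of bottom reflections. The function names and the grouping of eigenrays into four families per order are choices made for this example.

```python
import numpy as np

def eigenrays(R, z_src, z_rcv, H, max_bottom):
    """List (path length, surface bounces, bottom bounces) for each image path."""
    rays = []
    for m in range(max_bottom + 1):
        families = (
            (2 * H * m + z_rcv - z_src, m, m),            # down-going start, or direct path for m = 0
            (2 * H * m + z_src + z_rcv, m + 1, m),        # first bounce at the surface
            (2 * H * (m + 1) - z_src - z_rcv, m, m + 1),  # first bounce at the bottom
            (2 * H * (m + 1) + z_src - z_rcv, m + 1, m + 1),
        )
        for dz, n_s, n_b in families:
            if n_b <= max_bottom:
                rays.append((np.hypot(R, dz), n_s, n_b))
    return rays

def green_waveguide(omega, c, R, z_src, z_rcv, H, alpha_b, max_bottom=3):
    """Sum of eigenray contributions, each weighted by (-1)^n_s * alpha_b^n_b."""
    k = omega / c
    return sum((-1) ** n_s * alpha_b ** n_b * np.exp(-1j * k * d) / (4 * np.pi * d)
               for d, n_s, n_b in eigenrays(R, z_src, z_rcv, H, max_bottom))
```

A simple consistency check is that the field vanishes for a receiver on the pressure-release surface, because every image path is paired with another of equal length and opposite sign.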

Let  ra  =  (Ra,  za)  be  a  convenient  reference  point 
for  the  array.  Henceforth,  unless  otherwise  noted,  dis¬ 
placements  will  be  relative  to  ra.  Additional  coordinate 
systems will be placed at all $(p, m)$ images of $\mathbf{r}_a$ (see fig.
1), and their $z$ axis oriented so that the coordinates of
the associated virtual sensors are independent of $(p, m)$.
When expressed in frame $(p, m)$, the displacement to
an arbitrary field point $\mathbf{r}$ is denoted by $\mathbf{r}^{(p,m)}$. It is then
possible to write (2) in vector form, discarding contributions
from rays that suffer more than $N_b$ bottom
reflections


Gu{  r,r') 


a 

H 

'  G>J°\r,v')  ' 

-a 

.  G^(1)(r,r') 

(4) 


Figure 1: Field computation by the image method. (a)
Waveguide propagation. (b) Image expansion.

3. TIME-REVERSAL MIRROR

Time  reversal  of  a  real  signal  is  equivalent  to  conju¬ 
gation  of  its  Fourier  transform.  Then,  the  normalized 
field  produced  by  a  TRM  due  to  a  point  source  at  rs  is 
obtained  by  summing  the  sensor  contributions  as  fol¬ 
lows 

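\[
\begin{aligned}
P_\omega(\mathbf{r}, \mathbf{r}_s) &= \sum_m G_\omega^*(\mathbf{r}_m, \mathbf{r}_s)\, G_\omega(\mathbf{r}, \mathbf{r}_m) \\
&= \mathbf{a}^H \left[ B_\omega^{(0,0)}(\mathbf{r}, \mathbf{r}_s) + B_\omega^{(1,1)}(\mathbf{r}, \mathbf{r}_s) - 2\,\mathrm{Re}\{ B_\omega^{(0,1)}(\mathbf{r}, \mathbf{r}_s) \} \right] \mathbf{a} ,
\end{aligned} \tag{5}
\]

where, by reciprocity, the beamforming matrices $B_\omega$
are given by

\[ B_\omega^{(p,q)}(\mathbf{r}, \mathbf{r}_s) = \sum_m \mathbf{G}_\omega^{(p)}(\mathbf{r}_m, \mathbf{r})\, \mathbf{G}_\omega^{(q)H}(\mathbf{r}_m, \mathbf{r}_s) . \tag{6} \]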

Now  the  source  is  assumed  to  be  in  the  far-field  of  each 
array  image,  so  that  a  plane  wave  approximation  for 
the  free-space  Green’s  function  (3)  may  be  used 

GL(r>r')  «  ~~XPj exp(jfc(r',er))  ,  |r'|  <  |r| , 
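where $r = |\mathbf{r}|$, $\mathbf{e}_r = \mathbf{r}/r$, and $\langle \cdot, \cdot \rangle$ denotes the inner
product of two vectors. Each element of the beamforming
matrices (6) has the form

\[ \left[ B_\omega^{(p,q)} \right]_{m,n} = C_\omega\!\left( r^{(p,m)}, r_s^{(q,n)} \right) D_\omega\!\left( \mathbf{e}^{(p,m)} - \mathbf{e}_s^{(q,n)} \right) \tag{7} \]

\[ C_\omega(r, r') = \frac{\exp(-jk(r - r'))}{(4\pi)^2\, r r'} \tag{8} \]

\[ D_\omega(\mathbf{e}) = \sum_m \exp\left( jk \langle \mathbf{r}_m, \mathbf{e} \rangle \right) . \tag{9} \]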


In (9), $D_\omega$ is recognized as an array directivity function.
When $\mathbf{e}^{(p,m)} = \mathbf{e}_s^{(q,n)}$ each term in the sum is
equal to unity, and the contributions from all elements
add in phase in this direction. In other directions, the
contributions are not in phase, and the field is smaller.

In (5), both $B_\omega^{(0,1)}$ and the off-diagonal terms of
$B_\omega^{(0,0)}$ and $B_\omega^{(1,1)}$ account for the influence at $\mathbf{r}^{(p,m)}$ of
a beampattern steered toward $\mathbf{r}_s^{(q,n)}$. Field calculations
then require evaluating the influence of each virtual array
on all images of the target ocean section. The total
field $P_\omega$ is obtained by a (weighted) sum over array
images and target images. If the aperture and sensor
density are large enough so that $D_\omega$ is narrow and has
a single main lobe, image arrays only contribute significantly
in the main (0,0) cross-section. In that case,
the mirror operates in retrodirective mode by sending
acoustic beams in the same directions where it receives
energy from the source during the first transmission.
The large-scale shape of the focal region is mostly determined
by the beampattern $D_\omega$, while the fine-scale
structure results from the interference of beams, and is
heavily influenced by $C_\omega$.

4.  RANDOMLY-SPACED  SENSORS 

The theory of randomly-spaced arrays in free space shows that the beamwidth depends mainly on the aperture dimension, while the directive gain and sidelobe level are directly related to the number of elements used if the average spacing is large [8]. These results will now be extended to the case where multiple array images exist.

The array is formed by $M$ vertically placed sensors at $\mathbf{r}_m = (0, z_m)$ relative to the reference point $\mathbf{r}_a$. The depths $\{z_m\}$ are assumed to be i.i.d. random variables with a common probability density function that is greater than zero only in an interval of length $L$. According to (9), the radiation pattern is

$$D(u) = \sum_{m=1}^{M} \exp(j u x_m), \qquad (10)$$

where $x_m = z_m/L$ is the normalized sensor depth and $u = kL\langle (0,1), \mathbf{e} \rangle$ may be interpreted as a scaled direction cosine. The pdf of $x$ will be denoted by $g(x)$, with characteristic function $\phi(u) = E\{\exp(jux)\}$. In the assumed vertical array geometry the argument $u$ already incorporates the acoustic frequency through the wavenumber $k$, hence the explicit dependence of $D$ on $\omega$ is dropped in (10).

It follows from the definition of the characteristic function that the mean beampattern in (10) is

$$E\{D(u)\} = \sum_{m=1}^{M} E\{\exp(jux)\} = M\phi(u). \qquad (11)$$

Since the phase-conjugated field (5) only depends on the sensor positions through $D$, it is clear that the mean acoustic pressure $\bar{P}_\omega(\mathbf{r}, \mathbf{r}_s)$ is given by a similar expression, with $D$ replaced by $M\phi$ in the beamforming matrices, as shown by

$$\bar{B}^{(p,q)}_{\omega\,m,n} = C_\omega(r^{(p,m)}, r_s^{(q,n)}) \cdot M\phi(u) \qquad (12)$$

$$u = kL\,(\sin\theta^{(p,m)} - \sin\theta_s^{(q,n)}), \qquad (13)$$

where $\theta^{(p,m)}$ and $\theta_s^{(q,n)}$ are the bearing angles to the field and source images, respectively. As in the free-space case, the mean field is identical to the one that would be created by a continuous aperture with excitation $g(x)$.
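The mean beampattern result (11) lends itself to a quick Monte Carlo check. The sketch below assumes a uniform density for the normalized depth on [-1/2, 1/2], whose characteristic function is sin(u/2)/(u/2); the centering of the interval, the trial count and all function names are assumptions made for illustration.

```python
import cmath
import math
import random

def beampattern(u, x):
    """D(u) = sum_m exp(j u x_m) for normalized sensor depths x."""
    return sum(cmath.exp(1j * u * xm) for xm in x)

def phi_uniform(u):
    """Characteristic function of a uniform density on [-1/2, 1/2] (assumed pdf)."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)

def mean_beampattern(u, num_sensors, trials, rng):
    """Average D(u) over independent random array realizations."""
    total = 0j
    for _ in range(trials):
        x = [rng.uniform(-0.5, 0.5) for _ in range(num_sensors)]
        total += beampattern(u, x)
    return total / trials

rng = random.Random(0)
M, trials = 20, 4000
for u in (0.0, 2.0, 5.0):
    estimate = mean_beampattern(u, M, trials, rng)
    # Compare the sample mean with the prediction M * phi(u) from (11).
    print(u, round(estimate.real, 2), round(M * phi_uniform(u), 2))
```

With a symmetric density the characteristic function is real, so only the real part of the estimate is printed.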

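The variance of the time-reversed field is denoted by $\sigma^2(\mathbf{r}, \mathbf{r}_s) = E\{|P_\omega(\mathbf{r}, \mathbf{r}_s) - \bar{P}_\omega(\mathbf{r}, \mathbf{r}_s)|^2\}$, and involves evaluating the sum

$$\sigma^2(\mathbf{r}, \mathbf{r}_s) = \sum_{i,l,m,n,p,q,u,v} (-1)^{(p-q)-(u-v)}\, \alpha_i \alpha_l \alpha_m \alpha_n\, E\{\Delta B^{(p,q)}_{\omega\,m,n}\, \Delta B^{(u,v)*}_{\omega\,i,l}\}$$

$$\Delta B^{(p,q)}_{\omega\,m,n} = B^{(p,q)}_{\omega\,m,n} - \bar{B}^{(p,q)}_{\omega\,m,n} = C_\omega(r^{(p,m)}, r_s^{(q,n)})\,(D(u) - M\phi(u)), \qquad (14)$$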

where the argument $u$ is defined in (13). Given the i.i.d. assumption on sensor positions, the free-space covariance function satisfies [8]

$$E\{(D(u) - M\phi(u))(D(v) - M\phi(v))^*\} = M(\phi(u - v) - \phi(u)\phi^*(v)) \approx M\phi(u - v), \qquad (15)$$

for sufficiently large $u$, $v$. Using (8), (14) and (15), the terms of $\sigma^2(\mathbf{r}, \mathbf{r}_s)$ are seen to depend mostly on argument differences

$$E\{\Delta B^{(p,q)}_{\omega\,m,n}\, \Delta B^{(u,v)*}_{\omega\,i,l}\} = \frac{\exp\big(-jk((r^{(p,m)} - r_s^{(q,n)}) - (r^{(u,i)} - r_s^{(v,l)}))\big)}{(4\pi)^4\, r^{(p,m)}\, r_s^{(q,n)}\, r^{(u,i)}\, r_s^{(v,l)}} \times M\phi\big(kL((\sin\theta^{(p,m)} - \sin\theta_s^{(q,n)}) - (\sin\theta^{(u,i)} - \sin\theta_s^{(v,l)}))\big). \qquad (16)$$

Figure 2: ULVA performance

More importantly, both (16) and $\sigma^2(\mathbf{r}, \mathbf{r}_s)$ depend on the number of array sensors only through the linear term $M$. This shows that, for a given physical configuration and placement pdf, the variance of the normalized beampattern decreases to zero with $1/M$, as in free space. For large $M$, individual mirror responses will therefore be close to the average pressure with high probability.
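Setting $u = v$ in (15) gives $E\{|D(u) - M\phi(u)|^2\} = M(1 - |\phi(u)|^2)$, so the normalized beampattern $D(u)/M$ has variance $(1 - |\phi(u)|^2)/M$. The sketch below checks this free-space scaling by simulation, again assuming a uniform density on [-1/2, 1/2]; the chosen value of u, the trial count and the function names are illustrative assumptions.

```python
import cmath
import math
import random

def phi_uniform(u):
    """Characteristic function of a uniform density on [-1/2, 1/2] (assumed pdf)."""
    return 1.0 if u == 0.0 else math.sin(u / 2.0) / (u / 2.0)

def normalized_variance(u, num_sensors, trials, rng):
    """Sample estimate of E{|D(u)/M - phi(u)|^2} over random arrays."""
    mean = phi_uniform(u)
    acc = 0.0
    for _ in range(trials):
        d = sum(cmath.exp(1j * u * rng.uniform(-0.5, 0.5))
                for _ in range(num_sensors)) / num_sensors
        acc += abs(d - mean) ** 2
    return acc / trials

rng = random.Random(1)
u = 8.0
for M in (10, 40, 160):
    estimate = normalized_variance(u, M, 2000, rng)
    predicted = (1.0 - phi_uniform(u) ** 2) / M   # from (15) with u = v
    print(M, round(estimate, 5), round(predicted, 5))
```

Each fourfold increase in M should reduce both columns by roughly a factor of four.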

5.  SIMULATION  RESULTS 

The parameters that were used in the simulations are listed in table 1. Both uniform and square-cosine densities were considered as sensor placement strategies [8], with nonzero support in the depth interval [1, 99] m. The expressions for their densities and characteristic functions are shown in table 2. Figure 2 shows the expected mirror performance evaluated using (5) for uniform linear vertical arrays (ULVA) with $M$ = 5, 10, 15, 20, 30, 50, 70, and 100 sensors, evenly spaced between 1 m and 99 m. The focusing effect is still clearly visible when 50 sensors are used ($5.3\lambda$ spacing), but becomes severely degraded for lower values of $M$. Figure 3 shows the average time-reversed acoustic field for the densities of table 2, evaluated using (5) with asymptotic beamforming matrices (12). The corresponding free-space responses $\phi(u)$ are also superimposed on these plots. Several curves are shown for different values of $M$ to simplify the comparison with Monte Carlo simulations, although it is clear from the previous discussion that $M$ only introduces a gain in the mean field. These results confirm that the large-scale evolution of the field is determined by $\phi(u)$, although the detailed behavior depends on the interference pattern between array images. In particular, the acoustic field between the pressure nulls at 77 m and 83 m is almost identical in figures 2 and 3.

Table 1: Simulation parameters

Bottom depth              $H = 100$ m
Source range              $R = 1500$ m
Source depth              $z_s = 80$ m
Frequency                 $f = 4$ kHz
Sound speed               $c = 1500$ m s$^{-1}$
Bottom reflectivity       0.3
Image truncation limit    $N_b = 10$

Figure 3: Expected field with random spacing, plotted against depth z (m). (a) Uniform pdf (b) Square-cosine pdf
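The spacing quoted for the 50-sensor ULVA follows directly from the sound speed and frequency in table 1. A short sketch of the arithmetic is given below; the function name is an illustrative choice.

```python
SOUND_SPEED = 1500.0       # m/s, from table 1
FREQUENCY = 4000.0         # Hz, from table 1
TOP, BOTTOM = 1.0, 99.0    # m, depth interval spanned by the array

wavelength = SOUND_SPEED / FREQUENCY   # 0.375 m

def spacing_in_wavelengths(num_sensors):
    """Inter-sensor spacing of an evenly spaced array, in units of wavelength."""
    spacing = (BOTTOM - TOP) / (num_sensors - 1)
    return spacing / wavelength

for M in (5, 10, 15, 20, 30, 50, 70, 100):
    print(M, round(spacing_in_wavelengths(M), 2))
```

For M = 50 the spacing is 2 m, or about 5.3 wavelengths, well above the half-wavelength spacing at which grating lobes are avoided in a uniform array.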

The time-reversed field of figure 3b seems to be more suitable for coherent communication applications, since it creates a broader region of high acoustic energy around the focus. As shown in [5], the extent of the low ISI zone may be estimated by considering the joint evolution of the acoustic field for the higher and lower frequencies in the PAM signaling pulses. From that perspective, concentrating energy in a broad main lobe maximizes the region where the monochromatic field components within the signal bandwidth behave coherently, leading to low spectral pulse distortion and mild ISI.

Figure 4 shows the average acoustic fields that were obtained for the previously considered values of $M$ in 500 Monte Carlo simulations. The results are in good agreement with the theoretical mean values of figure 3, even for the lowest values of $M$. The difference in residual sidelobe level may be attributed to model discrepancies, since the ideal responses of figure 3 were based on a plane wave approximation. Naturally, individual time-reversed fields vary considerably, especially when few sensors are used, but beampatterns with globally desirable features are obtained with reasonably high probability for $M > 30$. Closer examination of the field covariance is deferred to future work.

Table 2: Densities and characteristic functions

Figure 4: Average field in Monte Carlo simulations (a) Uniform pdf (b) Square-cosine pdf

Figure 5: Average field evolution with the number of sensors (a) Uniform pdf (b) Square-cosine pdf

The  same  results  of  figure  4  are  represented  in  figure 
5  using  a  linear  scale,  showing  that  the  mean  field  does 
indeed  increase  linearly  with  M. 

6.  CONCLUSION 

Non-uniform sensor placement strategies for linear time-reversal arrays were investigated as a means of (i) reducing the required number of elements relative to uniform geometries, and (ii) assessing the sensitivity of focusing power to sensor locations.

Using a ray propagation model, results from the theory of antenna arrays with randomly-spaced elements in free space were extended to the ocean waveguide, which is crucial for effective operation of a time-reversal mirror. Simulations have confirmed the validity of theoretical predictions for two distinct sensor placement distributions. The results indicate that focusing performance is affected mainly by the total array length, rather than inter-sensor separation or precise sensor locations. Good results are obtained even when the average spacing of sensors is significantly larger than half a wavelength, since grating lobes are not coherently combined. These conclusions are supported by the work of other authors using uniform arrays [10].

While these preliminary studies indicate that some reduction in the number of sensors is possible relative to uniform arrays without incurring significant performance losses, practical considerations prevent randomly-spaced mirrors from attaining the spectacular savings envisaged in [8] for arrays with several thousand elements.
ABSTRACT

A vector hydrophone is composed of two or three spatially collocated but orthogonally oriented velocity hydrophones plus an optional collocated pressure hydrophone. A vector hydrophone may form azimuth-elevation beams that are frequency invariant, bandwidth invariant, and the same in the near field as in the far field. This paper characterizes the maximum-SINR beam pattern and the matched-filter beam pattern associated with a single underwater acoustic vector hydrophone.


1.  VECTOR  HYDROPHONE 

A vector hydrophone consists of two or three orthogonally oriented velocity hydrophones plus an optional pressure hydrophone, all spatially collocated in a point-like geometry. Each velocity hydrophone has an intrinsic directional response to the incident underwater acoustic particle velocity wavefield, measuring one Cartesian component of the three-dimensional particle velocity vector of the incident wavefield. A pressure hydrophone, on the other hand, measures the acoustical pressure as a scalar entity. A single vector hydrophone thus has an intrinsic two-dimensional azimuth-elevation directivity that is independent of signal frequency, signal bandwidth, and whether the source lies in the near field or the far field. In contrast, the directivity obtainable from an array of spatially displaced pressure hydrophones is based on the frequency-dependent inter-hydrophone spatial phase factor, and the beam pattern consequently depends on the signal frequency and the signal bandwidth.

Velocity hydrophone technology has been used in underwater acoustics for some time [1] and currently attracts renewed attention [18]. Many different types of vector hydrophones have been implemented (see the references cited in [13]). The Swallow floats [7], a freely drifting array of vector hydrophones, are neutrally buoyant and may be ballasted to any desired depth in the ocean. The DIFAR array [9] is a uniform vertical array with an acoustic band limit of 270 Hz for linearly constrained minimum variance beamforming with given angles and with flux gate compasses to measure the orientation of the horizontal velocity hydrophones. D’Spain, Hodgkiss & Edmonds [8] developed a vector hydrophone for infrasonic frequencies from 1 to 20 Hz.

Nehorai & Paldi [13] first developed the measurement model of the vector hydrophone and introduced it to the signal processing research community. The use of the vector hydrophone in sensor array direction finding has been investigated in [13-21, 25, 27]. Vector-hydrophone Capon spectrum estimation along pre-determined spatial directions has been investigated by D’Spain, Hodgkiss et al. [9] and Hawkes & Nehorai [23], and a few very compact expressions of the vector hydrophone matched-filter beam pattern have been derived in [23]. However, no detailed analysis is provided therein, and vector hydrophone beam pattern analysis remains largely overlooked in the open literature.

* This research work was supported by the Hong Kong Research Grant Council’s Mainline Research Grant no. 44M5010 and Direct Grant no. 2050187.

The present work characterizes and contrasts in detail the maximum signal-to-interference-and-noise-ratio (SINR) beam patterns and the matched-filter beam patterns for each of the following possible vector hydrophone constructions.


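Construction #1: Three orthogonally oriented velocity hydrophones plus a pressure hydrophone, in spatial collocation, give a 4 x 1 array manifold [13]:

$$\mathbf{a}_k = \mathbf{a}_k^{(3+1)} \overset{\mathrm{def}}{=} \begin{bmatrix} \sin\theta_k \cos\phi_k \\ \sin\theta_k \sin\phi_k \\ \cos\theta_k \\ 1 \end{bmatrix} = \begin{bmatrix} u(\theta_k, \phi_k) \\ v(\theta_k, \phi_k) \\ w(\theta_k) \\ 1 \end{bmatrix} \qquad (1)$$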

The first, second and third components above correspond respectively to the velocity hydrophones aligned along the x-axis, the y-axis and the z-axis; these first three components of $\mathbf{a}_k$ give the three Cartesian direction cosines. The last component corresponds to the pressure hydrophone. The Frobenius norm of the first three components of any source’s array manifold always equals unity, regardless of source parameters. With this four-component vector-hydrophone construction, sources may be located to either side of the array; that is, $\theta_k$ may range over $0 \le \theta_k \le \pi$ instead of $0 \le \theta_k \le \pi/2$. The presence of the pressure hydrophone helps to distinguish between acoustic compressions and dilations. This is important because acoustic particle motion sensors (such as a velocity hydrophone), by themselves, suffer a 180° ambiguity, with their plane-wave response given by the “figure 8” curve. The addition of a pressure hydrophone breaks this ambiguity.


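Construction #2: Three orthogonally oriented velocity hydrophones give a 3 x 1 array manifold:

$$\mathbf{a}_k = \mathbf{a}_k^{(3+0)} \overset{\mathrm{def}}{=} \begin{bmatrix} \sin\theta_k \cos\phi_k \\ \sin\theta_k \sin\phi_k \\ \cos\theta_k \end{bmatrix} \qquad (2)$$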


Construction #3: Two orthogonally and horizontally oriented velocity hydrophones plus a pressure hydrophone give a 3 x 1 array manifold:

$$\mathbf{a}_k = \mathbf{a}_k^{(2+1)} \overset{\mathrm{def}}{=} \begin{bmatrix} \sin\theta_k \cos\phi_k \\ \sin\theta_k \sin\phi_k \\ 1 \end{bmatrix} \qquad (3)$$

This suffices to completely characterize the underwater acoustical velocity field, despite the absence of the z-axis velocity hydrophone. The omission of the vertical velocity hydrophone avoids direct measurement of the vertical component of the underwater acoustical particle motion, thereby allowing actual ocean acoustics to be better modeled as rectilinear. This matters because particle motion may be circularly or elliptically polarized and need not be rectilinear. Even if the source initially generates a single plane wave, the multipath propagation properties of the ocean environment typically lead to elliptically polarized particle motion. The absence of a vertically oriented velocity hydrophone means that non-rectilinear motion will affect the measured data minimally, and the rectilinear data model will better fit generally non-rectilinear ocean acoustics. An example of construction #3 is the cardioid [11].


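Construction #4: Two orthogonally and horizontally oriented velocity hydrophones give a 2 x 1 array manifold:

$$\mathbf{a}_k = \mathbf{a}_k^{(2+0)} \overset{\mathrm{def}}{=} \begin{bmatrix} \sin\theta_k \cos\phi_k \\ \sin\theta_k \sin\phi_k \end{bmatrix} \qquad (4)$$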
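The four array manifolds can be collected in one small helper. The sketch below is illustrative only: the function names and the string labels for the constructions are assumptions, and the 2+1 form is inferred from the squared norm $1 + \sin^2\theta$ quoted for that construction in Section 2.2.

```python
import math

def direction_cosines(theta, phi):
    """Cartesian direction cosines (u, v, w) for elevation theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def manifold(theta, phi, construction):
    """Array manifold for one of the four constructions (labels are hypothetical)."""
    u, v, w = direction_cosines(theta, phi)
    if construction == "3+1":
        return [u, v, w, 1.0]   # three velocity channels plus pressure
    if construction == "3+0":
        return [u, v, w]        # three velocity channels
    if construction == "2+1":
        return [u, v, 1.0]      # horizontal velocity channels plus pressure
    if construction == "2+0":
        return [u, v]           # horizontal velocity channels only
    raise ValueError("unknown construction: " + construction)

def squared_norm(a):
    return sum(c * c for c in a)

theta, phi = math.radians(60.0), math.radians(30.0)
for name in ("3+1", "3+0", "2+1", "2+0"):
    print(name, round(squared_norm(manifold(theta, phi, name)), 4))
```

The squared norms are 2, 1, $1 + \sin^2\theta$ and $\sin^2\theta$ respectively, so only the constructions with a vertical velocity channel have a norm that is independent of the arrival direction.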

2.  VECTOR  HYDROPHONE  BEAMFORMING 

A transmitting sensor array beamformer focuses its transmission energy towards targeted azimuth-elevation angular sectors, whereas a receiving sensor array beamformer performs a spatial filtering operation to separate the desired signals from interference and noise based on their different arrival angles. A receiving beamformer may be classified as data independent or statistically optimal [6]. The former effects an a priori specified spatial angular beamformer response independent of the incoming data, whereas the latter adaptively optimizes a statistical criterion defined with respect to the collected data. A matched-filter beamformer is an example of the former, and a maximum-signal-to-interference-plus-noise-ratio (maximum-SINR) beamformer is an example of the latter. To facilitate subsequent discussion, define $\psi$ as the angle between the two steering vectors $\mathbf{a}_s^{(3+0)}$ and $\mathbf{a}_i^{(3+0)}$ (each of which contains as components the Cartesian direction cosines):

$$\cos\psi \overset{\mathrm{def}}{=} (\mathbf{a}_s^{(3+0)})^H \mathbf{a}_i^{(3+0)} = \sin\theta_s \sin\theta_i \cos(\phi_s - \phi_i) + \cos\theta_s \cos\theta_i$$


2.1.  Matched-Filter  Beamforming 

Matched-filter beamforming forms a data-independent beamforming weight vector $\mathbf{w}_{MF}$ to match the desired signal’s steering vector $\mathbf{a}(\theta_s, \phi_s)$; that is, $\mathbf{w}_{MF} = \mathbf{a}(\theta_s, \phi_s)$. With the array nominally pointing towards $(\theta_s, \phi_s)$, the beam pattern becomes

$$b_{MF}(\theta_s, \phi_s, \theta, \phi) = \mathbf{a}(\theta_s, \phi_s)^H\, \mathbf{a}(\theta, \phi) \qquad (5)$$

Matched-filter beamforming passes a desired signal arriving from an a priori known spatial angle but rejects interference or noise from all other possible angles. A narrow mainlobe and low sidelobes are desirable in the beamformer output.


Construction #1:

$$b_{MF} = b_{MF}^{(3+1)} = (1 + \cos\psi)/2$$

which is independent of $(\theta_s, \phi_s)$ and $(\theta_i, \phi_i)$ per se, except through $\psi$. This constitutes a rotational invariance in the spherical coordinates; only the angular separation between the desired source’s and the interference’s arrival angles, not their absolute values, affects $b_{MF}^{(3+1)}$.

Construction #2:

$$b_{MF} = b_{MF}^{(3+0)} = \cos\psi$$

which is also independent of $(\theta_s, \phi_s)$ and $(\theta_i, \phi_i)$ per se, except through $\psi$. $b_{MF}^{(3+1)}$ and $b_{MF}^{(3+0)}$ are plotted in Figure 1. The former has a single peak at $\psi = 0$, as expected; however, the latter suffers a $\pi$-ambiguity because interference may come through at a spurious spatial peak at $\psi = \pi$.

Construction #3:

$$b_{MF} = b_{MF}^{(2+1)} = \frac{1}{2} + \frac{\cos\psi - \cos\theta_s \cos\theta}{2} = \frac{1}{2} + \frac{\sin\theta_s \sin\theta \cos(\phi_s - \phi)}{2}$$

Unlike $b_{MF}^{(3+1)}$ and $b_{MF}^{(3+0)}$, $b_{MF}^{(2+1)}$ (plotted in Figure 2) is a function of $\theta_s$ and $\theta$, in addition to $\psi$. There exists a rotational invariance only with respect to the z-axis (i.e., the absolute values of the desired source’s and interference’s azimuth angles do not matter, only their difference). As $\theta_s, \theta \in [0, \pi]$, all trigonometric terms above are positive; thus, $b_{MF}^{(2+1)}$ has a minimum spatial response equal to 0.5. This means that interference may pass through from all angles regardless of the nominal arrival direction towards which $b_{MF}^{(2+1)}$ is pointed. Moreover, $\cos(\phi_s - \phi)$ implies the existence of a spurious peak and thus a $\pi$-ambiguity in the azimuth angle. These properties render $b_{MF}^{(2+1)}$ a most unattractive matched-filter beamformer.


Construction #4:

$$b_{MF} = b_{MF}^{(2+0)} = \cos\psi - \cos\theta_s \cos\theta = \sin\theta_s \sin\theta \cos(\phi_s - \phi)$$

Again, $b_{MF}^{(2+0)}$ (plotted in Figure 3) depends on $\theta_s$ and $\theta$ in addition to $\psi$; and $b_{MF}^{(2+0)}$ is always positive. An incident signal with $\psi = 0$ may be rejected when $\theta_s$ or $\theta$ approaches 0 or $\pi$. Like $b_{MF}^{(2+1)}$, $b_{MF}^{(2+0)}$ offers little elevation maneuverability. $b_{MF}^{(2+0)}$ may be useful in selective interference rejection only if all sources are known to impinge from $\theta \approx 0$. A $\pi$-ambiguity also exists; it arises from the $\cos(\phi_s - \phi)$ in $b_{MF}^{(2+0)}$.


Conclusion: The z-axis velocity hydrophone offers beamforming maneuverability in elevation. $b_{MF}^{(2+1)}$ is useless as a matched-filter beamformer. Among the four vector-hydrophone constructions above, only the four-component vector hydrophone suffers no spurious peaks. The mainlobe in $b_{MF}^{(3+1)}(\psi)$ may be sharpened by deploying multiple four-component vector hydrophones in a spatially displaced array. The overall vector hydrophone array’s spatial angular response equals the product of (1) the individual four-component vector hydrophone’s spatial angular response, and (2) the spatial angular response of an array of omnidirectional sensors spaced in such a geometry [22].
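The matched-filter closed forms above can be compared against the inner product of the two steering vectors. The following sketch does this for an arbitrary pair of directions; the halving for constructions with a pressure channel is an assumption made so that the result matches the quoted expressions, and the function names and construction labels are hypothetical.

```python
import math

def manifold(theta, phi, construction):
    """Steering vector for one of the four constructions (labels are hypothetical)."""
    u = math.sin(theta) * math.cos(phi)
    v = math.sin(theta) * math.sin(phi)
    w = math.cos(theta)
    return {"3+1": [u, v, w, 1.0], "3+0": [u, v, w],
            "2+1": [u, v, 1.0], "2+0": [u, v]}[construction]

def matched_filter(theta_s, phi_s, theta, phi, construction):
    """Inner product of steering and look-direction manifolds."""
    a_s = manifold(theta_s, phi_s, construction)
    a = manifold(theta, phi, construction)
    inner = sum(x * y for x, y in zip(a_s, a))
    # Assumed scaling: halve when a pressure channel is present,
    # which reproduces the closed forms quoted in the text.
    return inner / 2.0 if construction.endswith("+1") else inner

def closed_form(theta_s, phi_s, theta, phi, construction):
    """Closed-form matched-filter responses as quoted for each construction."""
    horizontal = math.sin(theta_s) * math.sin(theta) * math.cos(phi_s - phi)
    cos_psi = horizontal + math.cos(theta_s) * math.cos(theta)
    return {"3+1": (1.0 + cos_psi) / 2.0, "3+0": cos_psi,
            "2+1": (1.0 + horizontal) / 2.0, "2+0": horizontal}[construction]

steer = (math.radians(70.0), math.radians(20.0))
look = (math.radians(40.0), math.radians(130.0))
for name in ("3+1", "3+0", "2+1", "2+0"):
    print(name, round(matched_filter(*steer, *look, name), 4),
          round(closed_form(*steer, *look, name), 4))

# Diametrically opposite look direction (psi = pi).
opposite = (math.pi - steer[0], steer[1] + math.pi)
null_31 = matched_filter(*steer, *opposite, "3+1")
peak_30 = abs(matched_filter(*steer, *opposite, "3+0"))
print(round(abs(null_31), 6), round(peak_30, 6))
```

The last line illustrates the ambiguity discussed for Construction #2: at $\psi = \pi$ the four-component response vanishes, while the three-velocity-channel response has unit magnitude.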
2.2.  Maximum-SINR  Statistically  Optimal  Beamforming

The maximum-SINR beamformer aims to maximize the ratio of the desired signal’s power over the combined power of all interference and noise. A high SINR is desirable over the widest range of azimuth and elevation angles. If (1) the additive noise is uncorrelated temporally and across hydrophones, and if (2) all signals, interferers and noise have zero cross-correlation, the data autocorrelation matrix $\mathbf{R}_z$ of the input to the beamformer may be modeled as:

$$\mathbf{R}_z = \mathcal{P}_s \mathbf{a}_s \mathbf{a}_s^H + \mathcal{P}_n \mathbf{I} + \sum_{k=1}^{K} \mathcal{P}_k \mathbf{a}_k \mathbf{a}_k^H$$

where $\mathcal{P}_s$ denotes the desired signal’s power, $\mathcal{P}_n$ symbolizes the noise power, $\mathcal{P}_k$ refers to the $k$th interferer’s power, $\mathbf{a}_s$ represents the desired signal’s steering vector, $\mathbf{a}_k$ symbolizes the $k$th interferer’s steering vector, and $K$ denotes the total number of interferers. The SINR equals:

$$\mathrm{SINR} = \frac{\mathbf{w}^H (\mathcal{P}_s \mathbf{a}_s \mathbf{a}_s^H) \mathbf{w}}{\mathbf{w}^H (\mathbf{R}_z - \mathcal{P}_s \mathbf{a}_s \mathbf{a}_s^H) \mathbf{w}}$$

where $\mathbf{w}$ refers to the beamforming weight vector. The $\mathbf{w}$ that maximizes the SINR equals:


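$$\mathbf{w}^o_{SINR} \overset{\mathrm{def}}{=} \arg\max_{\mathbf{w}}\ \mathrm{SINR}$$

Unlike $\mathbf{w}_{MF}$, $\mathbf{w}^o_{SINR}$ is a function of the collected data through $\mathbf{R}_z$. With $K = 1$, define $\mathcal{P}_k = \mathcal{P}_i$ and $\mathbf{a}_k = \mathbf{a}_i$. Using the relation

$$(\mathbf{A} + \mathbf{B}\mathbf{C}^H)^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}(\mathbf{I} + \mathbf{C}^H \mathbf{A}^{-1} \mathbf{B})^{-1} \mathbf{C}^H \mathbf{A}^{-1}$$

and setting $\mathbf{A} = \mathcal{P}_n \mathbf{I}$ and $\mathbf{B} = \mathbf{C} = \sqrt{\mathcal{P}_i}\, \mathbf{a}_i$, the maximum SINR (SINR$^o$) becomes

$$\mathrm{SINR}^o = \frac{\mathcal{P}_s}{\mathcal{P}_n} \left[ \|\mathbf{a}_s\|^2 - \frac{|\mathbf{a}_s^H \mathbf{a}_i|^2}{\|\mathbf{a}_i\|^2 + \mathcal{P}_n/\mathcal{P}_i} \right] \qquad (6)$$

Construction #1: $\|\mathbf{a}_s^{(3+1)}\|^2 = \|\mathbf{a}_i^{(3+1)}\|^2 = 2$, and $|(\mathbf{a}_s^{(3+1)})^H \mathbf{a}_i^{(3+1)}| = 1 + \cos\psi$. Hence,

$$\mathrm{SINR}^{o(3+1)} = \frac{\mathcal{P}_s}{\mathcal{P}_n} \left[ 2 - \frac{(1 + \cos\psi)^2}{2 + \mathcal{P}_n/\mathcal{P}_i} \right]$$

which is independent of $(\theta_s, \phi_s)$ and $(\theta_i, \phi_i)$ per se, except through $\psi$. This constitutes a rotational invariance in the spherical coordinates. Only the angular separation between the desired source’s and the interference’s arrival angles, not their absolute values, affects $\mathrm{SINR}^{o(3+1)}$. $\mathrm{SINR}^{o(3+1)} / (\mathcal{P}_s/\mathcal{P}_n)$ is plotted in Figure 4. Note that $\mathrm{SINR}^{o(3+1)}$ has a unique minimum at $\psi = 0$ (when the desired source and the interference impinge from the same direction of arrival). The amplitude response’s flat plateau peaks at $\psi = \pi$, when the desired source and the interference impinge from diametrically opposite directions of arrival.

Construction #2: $\|\mathbf{a}_s^{(3+0)}\|^2 = \|\mathbf{a}_i^{(3+0)}\|^2 = 1$, and $|(\mathbf{a}_s^{(3+0)})^H \mathbf{a}_i^{(3+0)}| = \cos\psi$. Hence,

$$\mathrm{SINR}^{o(3+0)} = \frac{\mathcal{P}_s}{\mathcal{P}_n} \left[ 1 - \frac{\cos^2\psi}{1 + \mathcal{P}_n/\mathcal{P}_i} \right]$$

which is also independent of $(\theta_s, \phi_s)$ and $(\theta_i, \phi_i)$ per se, except through $\psi$. $\mathrm{SINR}^{o(3+0)} / (\mathcal{P}_s/\mathcal{P}_n)$ is plotted in Figure 5. $\mathrm{SINR}^{o(3+0)}$ has two minima at $\psi = 0$ and $\psi = \pi$; that is, when the desired source and the interference impinge from the same or from diametrically opposite directions of arrival. $\mathrm{SINR}^{o(3+0)}$ is maximum at $\psi = \pi/2$ and $\psi = 3\pi/2$, when the desired source and the interference impinge from perpendicular directions of arrival. The double maxima and double minima mean that this vector-hydrophone construction, without a pressure hydrophone, suffers a 180° hemispherical ambiguity in $\psi$. This is because the absence of the pressure hydrophone means that acoustical dilation cannot be distinguished from acoustical compression.

Construction #3: $\|\mathbf{a}_s^{(2+1)}\|^2 = 1 + \sin^2\theta_s$, $\|\mathbf{a}_i^{(2+1)}\|^2 = 1 + \sin^2\theta_i$, and $|(\mathbf{a}_s^{(2+1)})^H \mathbf{a}_i^{(2+1)}| = 1 + \sin\theta_s \sin\theta_i \cos(\phi_s - \phi_i) = 1 + \cos\psi - \cos\theta_s \cos\theta_i$. Hence,

$$\mathrm{SINR}^{o(2+1)} = \frac{\mathcal{P}_s}{\mathcal{P}_n} \left[ (1 + \sin^2\theta_s) - \frac{(1 + \cos\psi - \cos\theta_s \cos\theta_i)^2}{(1 + \sin^2\theta_i) + (\mathcal{P}_n/\mathcal{P}_i)} \right] = \frac{\mathcal{P}_s}{\mathcal{P}_n} \left[ (1 + \sin^2\theta_s) - \frac{(1 + \sin\theta_s \sin\theta_i \cos(\phi_s - \phi_i))^2}{(1 + \sin^2\theta_i) + (\mathcal{P}_n/\mathcal{P}_i)} \right]$$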


Unlike $\mathrm{SINR}^{o(3+1)}$ and $\mathrm{SINR}^{o(3+0)}$, $\mathrm{SINR}^{o(2+1)}$ depends on $\theta_s$ and $\theta_i$, in addition to $\psi$. There exists a rotational invariance only with respect to the z-axis (i.e., the absolute values of the desired source’s and interference’s azimuth angles do not matter, only their difference). $\mathrm{SINR}^{o(2+1)}$ has the following properties:

(1) Nulls exist at $\theta_s \approx 0$ and $\theta_s \approx \pi$, for all $\theta_i$, $\phi_s - \phi_i$ and $\mathcal{P}_i/\mathcal{P}_n$. Hence, $\mathrm{SINR}^{o(2+1)}$ is useful only if the interference is a priori known to impinge from near-horizontal arrival angles.

(2) When $\mathcal{P}_i/\mathcal{P}_n$ is less than or roughly equal to 0.1, $\mathrm{SINR}^{o(2+1)} / (\mathcal{P}_s/\mathcal{P}_n)$ depends only on $\sin^2\theta_s$ and roughly equals $1 + \sin^2\theta_s$, which always exceeds or equals unity. Figure 6 shows that for $\mathcal{P}_i/\mathcal{P}_n < 0.1$, the SINR pattern is largely invariant with respect to $\theta_i$ and $|\phi_s - \phi_i|$.

(3) For $\mathcal{P}_i/\mathcal{P}_n$ equal to or exceeding unity and small $|\phi_s - \phi_i|$, two additional nulls appear at $\theta_s \approx \theta_i$ and $\theta_s \approx \pi - \theta_i$. These two nulls, unlike those in (1), are no longer $\pi$ radians apart. Hence, even if all sources are a priori known to impinge from one hemispherical side of the vector hydrophone, interference may still pass through unhindered when $\mathcal{P}_i/\mathcal{P}_n$ exceeds about 0.2.

The above properties imply that $\mathrm{SINR}^{o(2+1)}$ works well only if the additive noise dominates the interference and only if the desired signal is known to impinge near-horizontally from one particular hemispherical side of the vector hydrophone. Note that when $\theta_s = \theta_i = \pi/2$, $\mathrm{SINR}^{o(2+1)}$ is equivalent to $\mathrm{SINR}^{o(3+1)}$. That is, $\mathrm{SINR}^{o(2+1)}$ and $\mathrm{SINR}^{o(3+1)}$ have the same response on the x-y plane.
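The x-y plane equivalence noted above can be checked numerically. The sketch below evaluates the single-interferer maximum SINR from the steering-vector norms and inner product, in the form $(\mathcal{P}_s/\mathcal{P}_n)[\|\mathbf{a}_s\|^2 - |\mathbf{a}_s^H \mathbf{a}_i|^2 / (\|\mathbf{a}_i\|^2 + \mathcal{P}_n/\mathcal{P}_i)]$, which is the expression the per-construction results are consistent with. Function names, construction labels and the power ratios are illustrative assumptions.

```python
import math

def manifold(theta, phi, construction):
    """Steering vector for one of the four constructions (labels are hypothetical)."""
    u = math.sin(theta) * math.cos(phi)
    v = math.sin(theta) * math.sin(phi)
    w = math.cos(theta)
    return {"3+1": [u, v, w, 1.0], "3+0": [u, v, w],
            "2+1": [u, v, 1.0], "2+0": [u, v]}[construction]

def max_sinr(a_s, a_i, ps_over_pn, pi_over_pn):
    """Single-interferer maximum SINR from norms and inner product (assumed form)."""
    norm_s = sum(x * x for x in a_s)
    norm_i = sum(x * x for x in a_i)
    inner = sum(x * y for x, y in zip(a_s, a_i))
    return ps_over_pn * (norm_s - inner ** 2 / (norm_i + 1.0 / pi_over_pn))

# Both sources on the x-y plane (theta = pi/2), varying azimuth separation.
horizontal = math.pi / 2.0
for separation_deg in (0.0, 45.0, 90.0, 180.0):
    phi_i = math.radians(separation_deg)
    with_z = max_sinr(manifold(horizontal, 0.0, "3+1"),
                      manifold(horizontal, phi_i, "3+1"), 1.0, 2.0)
    without_z = max_sinr(manifold(horizontal, 0.0, "2+1"),
                         manifold(horizontal, phi_i, "2+1"), 1.0, 2.0)
    print(separation_deg, round(with_z, 6), round(without_z, 6))
```

On the x-y plane the vertical direction cosine vanishes, so the z-axis channel contributes nothing and the two columns coincide.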
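Construction #4: $\|\mathbf{a}_s^{(2+0)}\|^2 = \sin^2\theta_s$, $\|\mathbf{a}_i^{(2+0)}\|^2 = \sin^2\theta_i$ and $|(\mathbf{a}_s^{(2+0)})^H \mathbf{a}_i^{(2+0)}| = \sin\theta_s \sin\theta_i \cos(\phi_s - \phi_i) = \cos\psi - \cos\theta_s \cos\theta_i$. Hence,

$$\mathrm{SINR}^{o(2+0)} = \frac{\mathcal{P}_s}{\mathcal{P}_n} \left[ \sin^2\theta_s - \frac{(\cos\psi - \cos\theta_s \cos\theta_i)^2}{\sin^2\theta_i + \mathcal{P}_n/\mathcal{P}_i} \right]$$

$\mathrm{SINR}^{o(2+0)}$ relates to $\theta_s$ as a linear function of $\sin^2\theta_s$, but depends non-linearly on $\frac{\mathcal{P}_i}{\mathcal{P}_n} \sin^2\theta_i$ and $\cos^2(\phi_s - \phi_i)$. The non-linear dependencies are plotted in Figure 6, where the z-axis gives $\mathrm{SINR}^{o(2+0)} / (\frac{\mathcal{P}_s}{\mathcal{P}_n} \sin^2\theta_s)$. $\mathrm{SINR}^{o(2+0)}$ is still characterized by an azimuth rotational invariance. The nulls of $\mathrm{SINR}^{o(2+0)}$ lie at $\phi_s = \phi_i$ and $\phi_s = \phi_i + \pi$, when $\frac{\mathcal{P}_i}{\mathcal{P}_n} \sin^2\theta_i \gg 0$ (i.e., when the x-y plane component of the interferer’s power greatly exceeds the noise power). The extra null region at $\phi_s = \phi_i + \pi$ arises from the absence of the pressure hydrophone, as is the case with $\mathrm{SINR}^{o(3+0)}$, allowing interference to pass through unhindered. When $\theta_s = \theta_i = \pi/2$, $\mathrm{SINR}^{o(2+0)}$ is equivalent to $\mathrm{SINR}^{o(3+0)}$. That is, $\mathrm{SINR}^{o(2+0)}$ and $\mathrm{SINR}^{o(3+0)}$ have the same response on the x-y plane.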
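The four per-construction results share a common origin in the inversion relation of this section. The following worked steps are a sketch of that derivation for a single interferer, under the stated choice $\mathbf{A} = \mathcal{P}_n \mathbf{I}$ and with $\mathbf{B} = \mathbf{C}$ taken so that $\mathbf{B}\mathbf{C}^H = \mathcal{P}_i \mathbf{a}_i \mathbf{a}_i^H$; the symbol $\mathbf{R}_{in}$ for the interference-plus-noise matrix is introduced here for illustration only.

```latex
\begin{align*}
% Interference-plus-noise covariance for K = 1
\mathbf{R}_{in} &= \mathbf{R}_z - \mathcal{P}_s \mathbf{a}_s \mathbf{a}_s^H
  = \mathcal{P}_n \mathbf{I} + \mathcal{P}_i \mathbf{a}_i \mathbf{a}_i^H \\
% Inversion relation with A = P_n I and B C^H = P_i a_i a_i^H
\mathbf{R}_{in}^{-1} &= \frac{1}{\mathcal{P}_n}
  \left[ \mathbf{I} - \frac{\mathbf{a}_i \mathbf{a}_i^H}
  {\|\mathbf{a}_i\|^2 + \mathcal{P}_n / \mathcal{P}_i} \right] \\
% The ratio is maximized by w proportional to R_in^{-1} a_s
\mathrm{SINR}^o &= \mathcal{P}_s\, \mathbf{a}_s^H \mathbf{R}_{in}^{-1} \mathbf{a}_s
  = \frac{\mathcal{P}_s}{\mathcal{P}_n}
  \left[ \|\mathbf{a}_s\|^2 - \frac{|\mathbf{a}_s^H \mathbf{a}_i|^2}
  {\|\mathbf{a}_i\|^2 + \mathcal{P}_n / \mathcal{P}_i} \right]
\end{align*}
```

Substituting the squared norms and inner product listed for each construction then gives the corresponding closed form; for example, $\|\mathbf{a}_s\|^2 = \sin^2\theta_s$, $\|\mathbf{a}_i\|^2 = \sin^2\theta_i$ and $\mathbf{a}_s^H \mathbf{a}_i = \cos\psi - \cos\theta_s \cos\theta_i$ for Construction #4.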


Conclusion: Only $\mathrm{SINR}^{o(3+1)}$ suffers no spurious null. If interference is a priori known to impinge from one particular hemispherical side of the vector hydrophone, $\mathrm{SINR}^{o(3+0)}$ will be as usable as $\mathrm{SINR}^{o(3+1)}$. Without the z-axis velocity hydrophone, $\mathrm{SINR}^{o(2+1)}$ and $\mathrm{SINR}^{o(2+0)}$ cannot reject near-vertical interference. The rather irregular beam pattern of $\mathrm{SINR}^{o(2+1)}$ renders it relatively useless.
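As a cross-check on the closed-form maximum SINR, the ratio can also be evaluated directly with weights obtained by solving $\mathbf{R}_{in} \mathbf{w} = \mathbf{a}_s$, where $\mathbf{R}_{in} = \mathcal{P}_n \mathbf{I} + \mathcal{P}_i \mathbf{a}_i \mathbf{a}_i^H$. This is the standard maximum-SINR weight up to a scale factor; that the paper uses exactly this weight is an assumption. The sketch below is self-contained, with hypothetical function names, construction labels and power values.

```python
import math

def manifold(theta, phi, construction):
    """Steering vector for one of the four constructions (labels are hypothetical)."""
    u = math.sin(theta) * math.cos(phi)
    v = math.sin(theta) * math.sin(phi)
    w = math.cos(theta)
    return {"3+1": [u, v, w, 1.0], "3+0": [u, v, w],
            "2+1": [u, v, 1.0], "2+0": [u, v]}[construction]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def solve(matrix, rhs):
    """Solve a small real linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[r]] for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def sinr_direct(a_s, a_i, p_s, p_i, p_n):
    """SINR ratio evaluated with w = R_in^{-1} a_s (manifolds here are real-valued)."""
    n = len(a_s)
    r_in = [[p_i * a_i[r] * a_i[c] + (p_n if r == c else 0.0) for c in range(n)]
            for r in range(n)]
    w = solve(r_in, a_s)
    signal = p_s * dot(w, a_s) ** 2
    interference_noise = sum(w[r] * r_in[r][c] * w[c]
                             for r in range(n) for c in range(n))
    return signal / interference_noise

def sinr_closed(a_s, a_i, p_s, p_i, p_n):
    """Closed form from the norms and inner product of the steering vectors."""
    return (p_s / p_n) * (dot(a_s, a_s) - dot(a_s, a_i) ** 2 / (dot(a_i, a_i) + p_n / p_i))

source = (math.radians(70.0), math.radians(20.0))
interferer = (math.radians(40.0), math.radians(130.0))
for name in ("3+1", "3+0", "2+1", "2+0"):
    a_s, a_i = manifold(*source, name), manifold(*interferer, name)
    print(name, round(sinr_direct(a_s, a_i, 2.0, 5.0, 0.5), 6),
          round(sinr_closed(a_s, a_i, 2.0, 5.0, 0.5), 6))
```

Agreement between the two columns for every construction supports reading the per-construction expressions as special cases of one formula.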

3.  OVERALL  CONCLUSIONS 

The four-component vector hydrophone offers a unimodal mainlobe in matched-filter beamforming and the broadest peak in maximum-SINR beamforming, with full maneuverability in elevation in addition to azimuth. However, if the incident sources are known to impinge from only one particular hemispherical side of the vector hydrophone, the (2+0) and (3+0) constructions also offer a unimodal mainlobe in matched-filter beamforming and a broad peak in maximum-SINR beamforming. The (2+0) construction is especially useful for the case where the vertical oceanic acoustics need to be overlooked in order for the rectilinear model to be valid. In contrast, the (2+1) construction produces an essentially useless matched-filter beamforming pattern and an unreliable maximum-SINR beamforming pattern.